This application searches for semantic relationships in a text by analysing the statistical properties of words.
It is not based on rules: it relies on the probability that two words appear in the same phrase by chance, without being related.
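To make the idea of "appearing together by chance" concrete, here is a minimal sketch that compares how often two words really share a phrase with how often they would if they were independent. It uses pointwise mutual information (PMI), which is a stand-in chosen for illustration and not necessarily the measure used by this application. The function name `pmi_scores` and the toy phrases are assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(phrases):
    """Return the PMI of every word pair that shares at least one phrase.

    Illustrative only: PMI is an assumed measure, not the application's own.
    """
    n = len(phrases)
    word_count = Counter()   # number of phrases containing each word
    pair_count = Counter()   # number of phrases containing both words
    for phrase in phrases:
        words = sorted(set(phrase.lower().split()))
        word_count.update(words)
        pair_count.update(combinations(words, 2))

    scores = {}
    for (a, b), together in pair_count.items():
        p_together = together / n
        # Probability of sharing a phrase if the two words were unrelated
        p_chance = (word_count[a] / n) * (word_count[b] / n)
        scores[(a, b)] = math.log2(p_together / p_chance)
    return scores


# Toy corpus, invented for this example
phrases = [
    "vacuum plant",
    "vacuum plant paint",
    "paint finish",
    "metal finish",
    "vacuum metal",
    "paint metal",
]
scores = pmi_scores(phrases)
```

A positive score means the pair shares a phrase more often than chance would predict, and a negative score means less often. Pairs are stored in alphabetical order, so the pair to look up is `("plant", "vacuum")`.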
This technology is an integral part of a larger project that extracts translated terminology from the web.
For example, if you want to find the English translation of "Metallizzazione" on the web, it will be difficult to find bilingual sites from which the information can be extracted. But on Google you will find 45,600 Italian pages that talk about "Metallizzazione". From these pages, you will discover that "Metallizzazione" has semantic relationships with "vuoto", "impianto", "vernice", "finitura" and "metallo", for which English translations can easily be found. At this point, you can search for what the words "vacuum", "plant", "paint", "finish" and "metal" have in common, and the answer will be "Metallization", the translation you were looking for.
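The last step of this example, finding what several words have in common, can be sketched as a simple vote. The code below assumes you already have a list of related terms for each translated word. The `related_en` data and the function name `common_associate` are hypothetical and only illustrate the idea. The real application may rank candidates differently.

```python
from collections import Counter


def common_associate(seed_words, related):
    """Rank candidate terms by how many seed words list them as related."""
    votes = Counter()
    for seed in seed_words:
        for candidate in related.get(seed, ()):
            # A seed word cannot be its own answer
            if candidate not in seed_words:
                votes[candidate] += 1
    return votes.most_common()


# Hypothetical related-term lists, invented for this example
related_en = {
    "vacuum": {"metallization", "pump", "chamber"},
    "plant": {"metallization", "factory", "paint"},
    "paint": {"metallization", "colour", "finish"},
    "metal": {"metallization", "steel", "finish"},
}

ranking = common_associate(["vacuum", "plant", "paint", "metal"], related_en)
best_term, best_votes = ranking[0]
```

With this toy data, `best_term` is the only candidate related to all four seed words.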
The application creates an n-dimensional representation of each word (probabilistic latent semantic analysis, PLSA), using the statistical properties of the words that appear next to it as coordinates. This demo uses European Parliament debates as its corpus.
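As a rough illustration of using neighbouring words as coordinates, the sketch below builds a plain co-occurrence count vector for each word and compares two words with cosine similarity. This is deliberately simpler than PLSA, which would also fit latent topics on top of such counts. The window size, the function names and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words found within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy sentences, invented for this example
sentences = [
    "the council adopted the proposal",
    "the parliament adopted the proposal",
    "the council rejected the amendment",
    "the parliament rejected the amendment",
    "fishermen need bigger quotas",
]
vectors = cooccurrence_vectors(sentences)
sim_close = cosine(vectors["council"], vectors["parliament"])
sim_far = cosine(vectors["council"], vectors["quotas"])
```

Words that keep the same company, such as "council" and "parliament" here, end up close together in this space, while words with no shared neighbours score zero.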
If this technology interests you, please read more on Translated Labs and our services for natural language processing.
If you think you could improve these applications, or if you are passionate about information retrieval, natural language processing, machine learning or artificial intelligence in general, you have come to the right place. Send us your CV.