Terminology is the set of terms that characterize a specific topic. Terminology extraction is the process of identifying those terms in a text.
The extractor compares the frequency of each word in a given document with its frequency in a generic corpus of 100 million words per language, using Poisson statistics, Maximum Likelihood Estimation and Inverse Document Frequency. A probabilistic part-of-speech tagger takes into account the probability that a particular sequence of words could be a term. Multi-word candidates (n-grams) are created by minimizing the relative entropy.
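The frequency comparison can be illustrated with a minimal Python sketch. This is not Translated's implementation: the function names `poisson_surprise` and `rank_terms`, the add-one smoothing, and the use of the Poisson point probability (rather than a tail probability) are all assumptions made for illustration. The idea is to estimate each word's rate from the generic corpus by maximum likelihood, then measure how unlikely its count in the document is under a Poisson model with that rate.

```python
import math
from collections import Counter


def poisson_surprise(count, doc_len, ref_count, ref_len):
    """Score how unexpectedly frequent a word is in a document.

    Hypothetical helper: the per-token rate is the maximum likelihood
    estimate from the reference corpus, with add-one smoothing (an
    assumption) so that unseen words do not get a zero rate.
    """
    rate = (ref_count + 1) / (ref_len + 1)
    expected = rate * doc_len  # Poisson mean for a document of this length
    if count <= expected:
        return 0.0  # not over-represented, so not a term candidate
    # Negative log of the Poisson probability of seeing exactly `count`.
    log_pmf = count * math.log(expected) - expected - math.lgamma(count + 1)
    return -log_pmf


def rank_terms(doc_tokens, ref_counts, ref_len):
    """Return the document's words, most surprising first."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = {
        word: poisson_surprise(c, doc_len, ref_counts.get(word, 0), ref_len)
        for word, c in counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy reference counts, invented for this example only.
    ref_counts = {"the": 60000, "and": 25000, "scores": 100,
                  "labels": 50, "token": 20, "tagger": 5}
    doc = "the tagger labels the token and the tagger scores the token".split()
    print(rank_terms(doc, ref_counts, ref_len=1_000_000))
```

With these toy counts, words that are rare in the reference corpus but repeated in the document, such as "tagger", rise to the top, while common words such as "and" fall to the bottom. A fuller system would also weight candidates by Inverse Document Frequency and filter them with part-of-speech patterns, which this sketch omits.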
Translated has developed this technology to make its translators aware of the difficulties in a document and to simplify the creation of glossaries.
We also use it to improve results in traditional search engines (e.g. Google) by better estimating how relevant a keyword is to a document.