Word length and sentence length influence how easily a text can be read and understood. Short words are usually more common (Zipf's law), and short sentences require less capacity for abstraction to understand. Readability analysis can therefore help improve a text and make it more accessible.
The readability index tells us how easy a given text is to understand. A well-written text is effective, easy to understand and quick to read. The index helps us gauge a text's complexity so that we can better schedule the work of translators and revisers. More than ever, written information, especially on the Internet, must be direct and well structured, and this analysis can help achieve both goals.
Readability is calculated using the Gulpease index. The full analysis has been implemented for Italian, English and French; for German and Spanish, only the readability index is available. If your language is not yet supported and you are interested in this technology, contact us by email.
Terminology extraction uses Poisson statistics, Maximum Likelihood Estimation and Inverse Document Frequency to compare the frequency of words in a given document with their frequency in a generic corpus of 100 million words per language. A probabilistic part-of-speech tagger estimates the probability that a particular sequence of words is a term, and word n-grams are built by minimizing relative entropy. For more information, see Terminology Extraction.
If you think you could improve these applications, and if you are passionate about information retrieval, natural language processing, machine learning or artificial intelligence in general, you have come to the right place. Send us your CV.