Automatic language identifier, 102 languages supported

Type or paste text to identify the language.

Supported languages:

  • Abkhazian
  • Afrikaans
  • Albanian
  • Arabic
  • Azerbaijani
  • Bosnian
  • Breton
  • Bulgarian
  • Chamorro
  • Chinese
  • Corsican
  • Croatian
  • Czech
  • Danish
  • Dutch
  • English
  • Esperanto
  • Estonian
  • Ewe
  • Faroese
  • Fijian
  • Finnish
  • French
  • Frisian
  • Georgian
  • German
  • Greek
  • Haitian
  • Hausa
  • Hebrew
  • Hindi
  • Hungarian
  • Icelandic
  • Igbo
  • Indonesian
  • Irish
  • Italian
  • Japanese
  • Javanese
  • Kanuri
  • Kashmiri
  • Kazakh
  • Kirghiz
  • Kongo
  • Korean
  • Kurdish
  • Latin
  • Latvian
  • Lingala
  • Lithuanian
  • Luba-Katanga
  • Macedonian
  • Malagasy
  • Malay
  • Maltese
  • Marshallese
  • Navajo
  • Neapolitan (Italian dialect)
  • Nepali
  • Norwegian
  • Persian
  • Polish
  • Portuguese
  • Pushto
  • Quechua
  • Romanian
  • Rundi
  • Russian
  • Samoan
  • Sango
  • Sanskrit
  • Sardinian
  • Serbian
  • Shona
  • Slovak
  • Slovenian
  • Somali
  • Spanish
  • Sundanese
  • Swahili
  • Swedish
  • Tagalog
  • Tahitian
  • Tamil
  • Thai
  • Turkish
  • Ukrainian
  • Urdu
  • Uzbek
  • Vietnamese
  • Walloon
  • Welsh
  • Xhosa
  • Yoruba
  • Zulu

Information on the language identifier

Introduction

A language identifier is an automatic classifier. It measures how similar an input text is to a set of reference texts supplied beforehand.

Why have we developed this?

This technology is an integral part of a web spider that extracts information useful to our translators.

Because it is a general automatic classifier, it can assign a document to a category once it has been given example documents for each category. For this reason, we also use it to classify our correspondence and to identify the topic of a text written in a language we do not understand.

Technology

The identifier builds an n-dimensional representation of the text (Vector Space Model), using the statistical properties of the byte sequences found in the text as coordinates. It performs the same operation on the previously inserted reference texts. In this n-dimensional space, the input text occupies a single point, and the reference text closest to that point is the one it most resembles.
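
The idea can be illustrated with a minimal Python sketch. It is not the implementation behind this page: the choice of overlapping 3-byte sequences as coordinates, cosine similarity as the measure of closeness, and the names byte_ngram_vector, cosine_similarity, identify and REFERENCES are all assumptions made for illustration. The reference texts are toy examples, far shorter than anything a real identifier would need.

```python
import math
from collections import Counter
from typing import Mapping


def byte_ngram_vector(text: str, n: int = 3) -> Counter:
    """Count the overlapping n-byte sequences of the UTF-8 encoded text.

    Each distinct byte sequence is one coordinate of the vector space;
    its count is the value along that coordinate.
    """
    data = text.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (1.0 = same direction)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


def identify(text: str, references: Mapping[str, str], n: int = 3) -> str:
    """Return the label of the reference text closest to `text`."""
    query = byte_ngram_vector(text, n)
    vectors = {label: byte_ngram_vector(ref, n) for label, ref in references.items()}
    return max(vectors, key=lambda label: cosine_similarity(query, vectors[label]))


# Hypothetical toy reference texts, one per label (illustrative only).
REFERENCES = {
    "English": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "French": "le renard brun saute par-dessus le chien paresseux et le chien dort au soleil",
    "German": "der schnelle braune fuchs springt über den faulen hund und der hund schläft in der sonne",
}

print(identify("the dog is sleeping", REFERENCES))
```

Because the labels are arbitrary strings, the same sketch would classify by topic instead of by language if the reference texts were example documents for each topic.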

I could do better!

If you think you could improve these applications, or if you are passionate about information retrieval, natural language processing, machine learning or artificial intelligence in general, you have come to the right place. Send us your CV.