The spoken language identifier is a service that tries to determine the language spoken in an audio recording.
The model currently supports 8 languages: English, Spanish, Italian, French, German, Portuguese, Dutch, and Russian.
Supported audio formats: WAV, FLAC, OGG.
The model combines convolutional and recurrent neural networks trained on tens of hours of speech data. It is an end-to-end model: it takes the raw waveform as input and makes no assumptions about the phonetics or grammar of the languages considered. Instead, it tries to infer all the relevant features of the audio from the data. Its output is a probability distribution over the languages the model recognizes.
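To illustrate what a probability distribution over the supported languages looks like in practice, here is a minimal Python sketch. It assumes the classifier's final layer yields one raw score per language and that a softmax turns those scores into probabilities; both the softmax step and the example scores are assumptions made for illustration and are not taken from the service itself.

```python
import math

# The eight languages listed above; the ordering here is an assumption.
LANGUAGES = ["English", "Spanish", "Italian", "French",
             "German", "Portuguese", "Dutch", "Russian"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_language(probabilities, languages=LANGUAGES):
    """Return the most likely language together with its probability."""
    if len(probabilities) != len(languages):
        raise ValueError("expected one probability per language")
    best = max(range(len(languages)), key=lambda i: probabilities[i])
    return languages[best], probabilities[best]


# Made-up scores for a single recording (hypothetical values).
scores = [0.2, 3.1, 1.4, 0.9, -0.5, 2.0, -1.2, 0.1]
probs = softmax(scores)
language, confidence = top_language(probs)
print(language, round(confidence, 3))
```

Keeping the whole distribution, rather than only the top entry, lets a caller spot uncertain predictions, for example when two languages receive similar probabilities.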
You can use it to classify recordings as short as 1 second and as long as a minute. Note that the longer the recording, the higher the accuracy of the prediction. For 20-second recordings the accuracy is about 95%, while for 5-second samples it is just over 80%.
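Before submitting a recording, it may help to check it against the limits described above. The sketch below is one way to do that in Python using only the standard library. The function names are hypothetical, the format check looks only at the file extension, and the duration helper reads WAV files only, since the standard `wave` module does not handle FLAC or OGG.

```python
import os
import wave

SUPPORTED_EXTENSIONS = {".wav", ".flac", ".ogg"}
MIN_SECONDS = 1.0   # shortest recording mentioned above
MAX_SECONDS = 60.0  # longest recording mentioned above


def has_supported_extension(path):
    """Check the file extension against the supported formats."""
    return os.path.splitext(path)[1].lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_duration_in_range(seconds):
    """True if the duration lies between 1 second and 1 minute, inclusive."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

A caller might also warn on clips shorter than about 20 seconds, since the reported accuracy drops for short samples.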
I could do better!
If you think you could improve these applications, or if you are passionate about information retrieval, natural language processing, machine learning, or artificial intelligence in general, you have come to the right place. Send us your CV.
I want it!
If this technology interests you, please have a look at our API, available on Mashape.