When a user enters a search term into a search engine, the text is thoroughly analyzed: every word, phrase, and sentence is detected and processed by dedicated software. This analysis must be completed before a search can be performed. It is also a prerequisite for other applications such as automatic summarization and translation. List’s language analysis engine, LIMA, is already widely used. Integrating deep learning modules into the software has made it even more powerful.
The latest advances in neural networks, together with a corpus of annotated texts in different languages provided by the Universal Dependencies cooperative, were used to make the software more efficient, expand the number of languages supported, and add three learning modules. The first module segments text into words and sentences; the second performs morphological, lexical, and syntactic analysis; and the third annotates the named entities it identifies.
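To make the three-stage structure concrete, here is a minimal Python sketch of such a pipeline. It is not LIMA's code or API: the function names (segment, analyze, tag_entities, run_pipeline) and the Token class are hypothetical, and simple hand-written rules stand in for the neural modules, which the article does not describe in detail.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str = "X"      # placeholder tag; a real analyzer would predict this
    entity: str = "O"   # "O" means "not part of a named entity"


def segment(text: str) -> list[list[str]]:
    """Stage 1 (toy): split text into sentences, then into words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"\w+|[^\w\s]", s) for s in sentences]


def analyze(words: list[str]) -> list[Token]:
    """Stage 2 (toy): attach a very coarse tag to each word."""
    tokens = []
    for w in words:
        if not w[0].isalnum():
            pos = "PUNCT"
        elif w.isdigit():
            pos = "NUM"
        else:
            pos = "X"  # unknown: rules cannot do real morphology or syntax
        tokens.append(Token(w, pos))
    return tokens


def tag_entities(tokens: list[Token]) -> list[Token]:
    """Stage 3 (toy): flag capitalized, non-initial words as entity candidates."""
    for i, tok in enumerate(tokens):
        if i > 0 and tok.text[0].isupper():
            tok.entity = "ENT"
    return tokens


def run_pipeline(text: str) -> list[list[Token]]:
    """Chain the three stages, sentence by sentence."""
    return [tag_entities(analyze(words)) for words in segment(text)]


if __name__ == "__main__":
    for sentence in run_pipeline("The tool was built in Paris. It supports 60 languages."):
        print([(t.text, t.pos, t.entity) for t in sentence])
```

The point of the sketch is the data flow: each stage consumes the previous stage's output, so a learned model could presumably replace any one stage without changing the others.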
The previous version of LIMA could analyze only a handful of languages (English, French, German, Spanish, Portuguese, Chinese, and Arabic). The new version, called Deep LIMA, can analyze more than 60 languages with state-of-the-art performance.
Universal Dependencies is an international cooperative project to create treebanks of the world’s languages (https://universaldependencies.org/).
Read article at http://www.cea-tech.fr/