In this project, we measure the distance between languages using the notion of perplexity. We performed an experiment comparing forty-four European languages, in which the distance between each pair of languages is computed automatically from n-gram language models learnt from text corpora.
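The idea can be illustrated with a minimal sketch. It assumes character n-gram models with add-one (Laplace) smoothing, with perplexity computed as PP = exp(-(1/N) Σ log P(c_i | previous n-1 characters)). It also assumes a symmetric distance defined as the mean of the two cross-perplexities: the model of language A scored on text of language B, and the model of B scored on text of A. The names `CharNgramModel` and `perplexity_distance`, the smoothing method and `n=3` are hypothetical choices for illustration. They are not necessarily the settings used in the published experiment.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int) -> list[str]:
    """Overlapping character n-grams; left padding means every character is predicted once."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Hypothetical character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, corpus: str, n: int = 3):
        self.n = n
        self.ngram_counts = Counter(char_ngrams(corpus, n))
        # How often each (n-1)-character history was followed by some character.
        self.context_counts: Counter = Counter()
        for gram, count in self.ngram_counts.items():
            self.context_counts[gram[:-1]] += count
        # Distinct characters seen in training, +1 to reserve mass for unseen ones.
        self.vocab_size = len(set(corpus)) + 1

    def prob(self, gram: str) -> float:
        """Smoothed P(last character | preceding n-1 characters)."""
        return (self.ngram_counts[gram] + 1) / (self.context_counts[gram[:-1]] + self.vocab_size)

    def perplexity(self, text: str) -> float:
        """exp of the mean negative log-probability per character of a non-empty text."""
        grams = char_ngrams(text, self.n)
        log_prob = sum(math.log(self.prob(g)) for g in grams)
        return math.exp(-log_prob / len(grams))


def perplexity_distance(text_a: str, text_b: str, n: int = 3) -> float:
    """Assumed symmetric distance: mean of the two cross-perplexities."""
    model_a = CharNgramModel(text_a, n)
    model_b = CharNgramModel(text_b, n)
    return (model_a.perplexity(text_b) + model_b.perplexity(text_a)) / 2


if __name__ == "__main__":
    a = "the cat sat on the mat " * 5
    b = "xyzzy qwerty plugh " * 5
    print(perplexity_distance(a, a), perplexity_distance(a, b))
```

A lower score means that each language's model predicts the other language's text well, so the two languages are closer. With real data, each model would be trained on a large corpus per language, and the scores for all pairs would fill a 44 × 44 distance matrix.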
See our map of the distances between 44 European languages, drawn with Cytoscape.
Search for the distance between a target language and each of the other European languages.
The article describing this work has been published in the journal Physica A:
Gamallo, Pablo, José Ramom Pichel, and Iñaki Alegria (2017). From language identification to language distance. Physica A, Vol. 484, pp. 162-172. DOI: 10.1016/j.physa.2017.05.011. ISSN: 0378-4371.
Draft version (PDF)
See also a press report in Galicia Confidencial.
Researchers involved in the project: