Projecto Extralex

Language Distance

Abstract

In this project, our objective is to measure the distance between languages using the notion of perplexity. An experiment to compare forty-four European languages has been performed. Distance is automatically computed on the basis of language models (n-grams) learnt from text corpora.
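To make the idea concrete, the following is a minimal sketch of a perplexity-based distance between two text samples. It is not the project's own implementation: the use of character trigrams, add-one smoothing, the symmetric average of the two perplexities, and all function names (`char_ngrams`, `train_model`, `perplexity`, `distance`) are assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the character n-grams of `text`, left-padded with spaces."""
    padded = " " * (n - 1) + text
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_model(text, n=3):
    """Count n-grams and their (n-1)-character contexts in a training text."""
    grams = char_ngrams(text, n)
    return {
        "n": n,
        "ngrams": Counter(grams),
        "contexts": Counter(g[:-1] for g in grams),
        "vocab": set(text),
    }


def perplexity(model, text):
    """Perplexity of a non-empty `text` under `model` (add-one smoothing, an assumption)."""
    grams = char_ngrams(text, model["n"])
    # Smoothing vocabulary: every character seen in training or test text.
    vocab_size = len(model["vocab"] | set(text))
    log_prob = 0.0
    for g in grams:
        numerator = model["ngrams"][g] + 1
        denominator = model["contexts"][g[:-1]] + vocab_size
        log_prob += math.log(numerator / denominator)
    return math.exp(-log_prob / len(grams))


def distance(text_a, text_b, n=3):
    """Hypothetical symmetric distance: mean of the two cross-perplexities."""
    ppl_ab = perplexity(train_model(text_a, n), text_b)
    ppl_ba = perplexity(train_model(text_b, n), text_a)
    return (ppl_ab + ppl_ba) / 2
```

With real corpora, each language would be represented by a model trained on a large text sample, and a lower value of `distance` would indicate that the two languages are closer under these assumptions.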


A Map of Distances Between European Languages

See here our Cytoscape map of the distances between 44 European languages.

Distance Explorer

Search here for the distance between a target language and the rest of the European languages.

Publications

The article describing this work has been published in the journal Physica A:

Gamallo, Pablo, José Ramom Pichel, Iñaki Alegria, 2017. From language identification to language distance, Physica A, Vol 484, pp. 162-172. DOI: 10.1016/j.physa.2017.05.011. ISSN: 0378-4371. draft version - pdf
See also a press report in Galicia Confidencial

Resources and Code

Participants

Researchers involved in the project:


Language Identification, Perplexity, N-Gram Models, Language Distance, Natural Language Processing, European Languages
