BILINGUAL DICTIONARIES FROM NON-PARALLEL, COMPARABLE CORPORA

Using an unsupervised learning strategy, we built bilingual lexicons from existing source dictionaries and non-parallel corpora. No manual correction was applied. Each lexicon entry is either a single lemma or a combination of lemmas. The lemmas belong to three categories: nouns, verbs, and adjectives.

- English-Galician dictionary: about 12,000 entries. Source bilingual dictionaries: Opentrad/Apertium (Galician-Spanish, English-Spanish) and Collins (English-Spanish).

- English-Portuguese dictionary: about 7,500 entries. Source bilingual dictionaries: Opentrad/Apertium (Portuguese-Spanish, English-Spanish) and Collins (English-Spanish).
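
Both lexicons above rely on source dictionaries that share Spanish as a common language, so candidate translation pairs can be generated by pivoting through it. The following is a minimal sketch of that idea under stated assumptions, not the procedure from the cited paper: the function name `pivot_candidates`, the dict-of-sets layout, and the toy entries are all hypothetical. Pivoting alone can yield spurious pairs when a Spanish lemma is ambiguous; filtering candidates against the comparable corpora is not shown here.

```python
from collections import defaultdict


def pivot_candidates(src_to_pivot, tgt_to_pivot):
    """Pair source and target lemmas that share at least one pivot translation.

    Both arguments map a lemma to a set of pivot-language (here Spanish) lemmas.
    This is an illustrative layout, not the format of any real dictionary file.
    """
    # Invert the target dictionary: pivot lemma -> set of target lemmas.
    pivot_to_tgt = defaultdict(set)
    for tgt, pivots in tgt_to_pivot.items():
        for pivot in pivots:
            pivot_to_tgt[pivot].add(tgt)

    # Collect every target lemma reachable from a source lemma via the pivot.
    candidates = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            candidates[src] |= pivot_to_tgt.get(pivot, set())

    # Drop source lemmas for which no target lemma was found.
    return {src: tgts for src, tgts in candidates.items() if tgts}


# Toy entries invented for illustration only.
en_es = {"house": {"casa"}, "dog": {"perro"}, "book": {"libro"}, "sky": {"cielo"}}
gl_es = {"casa": {"casa"}, "can": {"perro"}, "libro": {"libro"}}

en_gl = pivot_candidates(en_es, gl_es)
```

In this toy run, `en_gl` links "dog" to "can" through the shared Spanish lemma "perro", while "sky" is dropped because no Galician entry maps to "cielo".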

References:

Gamallo, P. (2010) "Automatic Generation of Bilingual Dictionaries Using Intermediary Languages and Comparable Corpora", Lecture Notes in Computer Science, vol. 6008, Springer-Verlag, pp. 473-483. ISSN: 0302-9743.

We also used a strategy based on cognates and comparable corpora (Wikipedia) to build bilingual terminologies:

- Spanish-Portuguese terminology: about 28,000 entries

Gamallo, P., Garcia, M. (2012) "Extraction of Bilingual Cognates from Wikipedia", in Helena Caseli, Aline Villavicencio, António Teixeira and Fernando Perdigão (eds.), PROPOR 2012, Computational Processing of the Portuguese Language, Lecture Notes in Artificial Intelligence, vol. 7243, Berlin: Springer-Verlag, pp. 63-72.
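
As a rough illustration of what a cognate test between Spanish and Portuguese terms might look like, the sketch below scores two strings by normalized Levenshtein distance. This is an assumption made for illustration: the similarity measure actually used to build the terminology is not stated here, and the names `levenshtein`, `similarity`, `is_cognate` and the 0.7 threshold are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for fully different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def is_cognate(a, b, threshold=0.7):
    """Treat two terms as cognate candidates if they are similar enough.

    The threshold value is an arbitrary choice for this sketch.
    """
    return similarity(a.lower(), b.lower()) >= threshold
```

With these definitions, `is_cognate("universidad", "universidade")` holds because the two forms differ by a single character, whereas `is_cognate("perro", "cão")` does not. A purely orthographic test like this one misses translation pairs that are not spelled alike, which is why such a filter is only suited to cognates.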