QueLingua

FREE LANGUAGE IDENTIFIER


Download a free automatic language identifier (GPL license), also available on GitHub. It can currently identify 7 languages or varieties: English (en), Spanish (es), Galician-RAG (gl), Galician-AGAL (gz), French (fr), Catalan (ca), and Portuguese (pt). The main programs are implemented in Perl. Last updated in April 2013.

Requirements
Any GNU/Linux distribution

How To Install
(1)
> tar xzvf QueLingua.tgz

(2)
> sh install-quelingua.sh

How To Use

> ./quelingua <FILE>
or
> cat <FILE> | ./quelingua

      FILE = path of the input file

Input File
The input file must be plain text encoded in UTF-8.
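Before running quelingua, it can help to confirm that a file really is UTF-8. Below is a minimal Python sketch. The script name check_utf8.py and both function names are hypothetical and not part of QueLingua. It only checks that the bytes decode as UTF-8. It cannot tell which encoding a non-UTF-8 file actually uses.

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_valid_utf8(path: str) -> bool:
    """Read a file in binary mode and check that its whole content is UTF-8."""
    with open(path, "rb") as f:
        return is_valid_utf8(f.read())


if __name__ == "__main__":
    # Example: python check_utf8.py text1.txt text2.txt
    for path in sys.argv[1:]:
        print(path, "OK" if file_is_valid_utf8(path) else "NOT UTF-8")
```

If a file turns out to be Latin-1 (ISO-8859-1), it can usually be converted on GNU/Linux with `iconv -f ISO-8859-1 -t UTF-8 in.txt > out.txt` before passing it to quelingua.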


ENHANCEMENTS

(1) LEXICON BUILDER:
Users can build lexicons for other languages and add them to the lexicon repository. The output must be saved in the ./lexicons directory.

How To Use

> ./LexiconBuilder <MAX> <FILE>
or
> cat <FILE> | ./LexiconBuilder <MAX>

        MAX = size of the lexicon
        FILE = path of the input file
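This page does not describe how LexiconBuilder chooses entries. The Python sketch below assumes that a lexicon is simply the MAX most frequent word forms of a training corpus, which is a common approach. The tokenizer (runs of Unicode letters, lowercased), the word-TAB-count output format and the script name build_lexicon.py are all assumptions. They may not match the format that quelingua expects in ./lexicons.

```python
import re
import sys
from collections import Counter

# Word tokens: runs of Unicode letters. Digits, punctuation, apostrophes and
# hyphens act as separators (an assumed tokenizer, not LexiconBuilder's).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def build_lexicon(text: str, max_size: int) -> list[tuple[str, int]]:
    """Return the max_size most frequent lowercase word forms with their counts."""
    counts = Counter(tok.lower() for tok in TOKEN_RE.findall(text))
    return counts.most_common(max_size)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Mirrors the documented CLI: build_lexicon.py <MAX> [FILE]
    # (hypothetical script). Reads stdin when FILE is omitted.
    max_size = int(sys.argv[1])
    if len(sys.argv) > 2:
        with open(sys.argv[2], encoding="utf-8") as f:
            text = f.read()
    else:
        text = sys.stdin.read()
    for word, freq in build_lexicon(text, max_size):
        print(f"{word}\t{freq}")
```

When two words have the same frequency, `Counter.most_common` keeps them in first-seen order. A real builder might also filter out very short tokens or words shared by many languages.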

(2) MORPHOLOGICAL INFORMATION
Users can edit files containing morphological information. So far, only productive suffixes can be specified, in the file './morpho/suffix.txt'. Each line of this file has 2 columns (suffix \t language). For instance:

çom     gz
ção     pt

This means that the system uses words ending in -çom to increase the weight of Galician-AGAL (gz), and words ending in -ção to increase the weight of Portuguese (pt). The two columns are separated by a tab character (\t).
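This page does not say how suffix matches are weighted against lexicon matches. The Python sketch below shows one plausible way to read suffix.txt and combine it with lexicon lookups. It is our own assumption and not QueLingua's actual scoring. The assumed parts are the +1 per lexicon hit, the `suffix_bonus` parameter, the tokenizer, and all function names.

```python
import re
import unicodedata
from collections import Counter

# Word tokens: runs of Unicode letters (an assumed tokenizer).
TOKEN_RE = re.compile(r"[^\W\d_]+")


def nfc(s: str) -> str:
    """Normalize to NFC so a decomposed 'ç' (c + combining cedilla) still matches."""
    return unicodedata.normalize("NFC", s)


def parse_suffix_file(text: str) -> list[tuple[str, str]]:
    """Parse lines of the form 'suffix<TAB>language', skipping malformed lines."""
    rules = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0].strip() and parts[1].strip():
            rules.append((nfc(parts[0].strip()), parts[1].strip()))
    return rules


def score(text, lexicons, suffix_rules, suffix_bonus=1.0):
    """Add 1 per token found in a language's lexicon and suffix_bonus per token
    ending in one of that language's suffixes (hypothetical weighting)."""
    scores = Counter()
    for tok in TOKEN_RE.findall(nfc(text).lower()):
        for lang, words in lexicons.items():
            if tok in words:
                scores[lang] += 1
        for suffix, lang in suffix_rules:
            if tok.endswith(suffix):
                scores[lang] += suffix_bonus
    return scores


def identify(text, lexicons, suffix_rules):
    """Return the best-scoring language code, or None if nothing matched."""
    scores = score(text, lexicons, suffix_rules)
    return scores.most_common(1)[0][0] if scores else None
```

Suppose the gz and pt lexicons both contain only the function words "a" and "da". With the two example rules above, "A informaçom da naçom" then scores higher for gz and "A informação da nação" scores higher for pt, so the suffixes alone break the tie. Equal final scores are resolved in favour of the language that was scored first.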

Contributors:
Thanks to Óscar Senra (Imaxin|Software) for providing the Galician-AGAL corpus we used to build the corresponding lexicon.

Contact:
pablo.gamalloATusc.es