Click here to download MultiLingua, a natural language parser implemented in Perl. Machine requirements are the following:
MultiLingua is a robust, partial parser whose implementation is based on finite-state cascades. It is a multilingual system that processes 5 languages: Galician (gl), Spanish (es), English (en), Portuguese (pt) and French (fr).
The installation includes the parameter files required by Tree-Tagger.
./parser_ml.sh <tagger> <lang> <input_file>
tagger = freeling, treetagger
lang = gl, es, en, pt, fr
Note: if Freeling is not installed, do not pass 'freeling' as the tagger argument.
The input file is just plain text. The file encoding must be ISO-8859-1.
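The command line above can be assembled and checked from a script before it is run. The following is a minimal Python sketch, not part of MultiLingua: the function name build_command and the constants are hypothetical, and only the argument order and the accepted values are taken from the usage line above.

```python
# Hypothetical helper: validates the arguments and builds the command line
# "./parser_ml.sh <tagger> <lang> <input_file>" described above.
TAGGERS = {"freeling", "treetagger"}
LANGS = {"gl", "es", "en", "pt", "fr"}


def build_command(tagger, lang, input_file, script="./parser_ml.sh"):
    """Return the parser invocation as an argument list."""
    if tagger not in TAGGERS:
        raise ValueError(f"unknown tagger: {tagger!r}")
    if lang not in LANGS:
        raise ValueError(f"unknown language: {lang!r}")
    return [script, tagger, lang, str(input_file)]
```

The returned list can be handed to subprocess.run from the Python standard library. Whether the parser writes its result to standard output or to a file is not stated here, so the sketch leaves that step to the caller.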
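If the text is stored in another encoding, it has to be converted before parsing. The sketch below shows one way to do this in Python. It assumes the source file is UTF-8, and the function names to_latin1 and convert_file are hypothetical. Characters that do not exist in ISO-8859-1 (the euro sign, for example) are replaced by '?'.

```python
def to_latin1(text, errors="replace"):
    """Encode a string as ISO-8859-1 bytes.

    With errors="replace", characters outside ISO-8859-1 become '?'.
    """
    return text.encode("iso-8859-1", errors=errors)


def convert_file(src, dst, src_encoding="utf-8", errors="replace"):
    """Read src (assumed to be UTF-8 by default) and write dst as ISO-8859-1."""
    with open(src, encoding=src_encoding) as infile:
        text = infile.read()
    with open(dst, "wb") as outfile:
        outfile.write(to_latin1(text, errors))
```

Passing errors="strict" instead raises an exception on the first character that cannot be represented, which may be preferable when silent replacement is not acceptable.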
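The system parses sentence by sentence. Each parsed sentence consists of two elements: a SENT:: line listing the tagged tokens of the sentence, followed by its dependencies, one per line, in the form: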
(relation;head_lemma;dependent_lemma)
For instance, the sentence "I am a man." gives rise to:
SENT::<I_PN_0 be_VERBFCOP_1 a_DT_2 man_NOM_3>
(Lobj;be_VERBF_1;I_PN_0)
(Spec;man_NOM_3;a_DT_2)
(Robj;be_VERBF_1;man_NOM_3)
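Output in this form is easy to read back into a program. The following Python sketch is based only on the example above: it assumes that every sentence starts with a SENT::<...> line and that every dependency is a parenthesised triple with fields separated by ';'. The function name parse_output is hypothetical, and lemmas that themselves contain ';' are not handled.

```python
import re

# Patterns inferred from the example output above (an assumption, not a spec).
SENT_RE = re.compile(r"^SENT::<(.*)>$")
TRIPLE_RE = re.compile(r"^\(([^;]+);([^;]+);([^;]+)\)$")


def parse_output(lines):
    """Group parser output lines into sentences with tokens and triples."""
    sentences = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = SENT_RE.match(line)
        if match:
            # A new sentence: tokens are separated by whitespace.
            current = {"tokens": match.group(1).split(), "triples": []}
            sentences.append(current)
            continue
        match = TRIPLE_RE.match(line)
        if match and current is not None:
            # (relation, head, dependent)
            current["triples"].append(match.groups())
    return sentences
```

Applied to the four lines of the example, this yields one sentence with four tokens and three triples, the first being ('Lobj', 'be_VERBF_1', 'I_PN_0').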
The 5 grammars share the same set of dependency relationships, which can be consulted in the file DependencySet.txt.
It is also possible to get an output file with the format defined by CoNLL-X, inspired by Lin (1998):
This format was adopted by the evaluation tasks defined in CoNLL.
To get a file in this output format, run ./scripts/saidaCoNLL.perl, taking as input the parser's standard output file.
You can use the output of the parser to build a cooccurrence file. This file contains all cooccurrences between lemmas and dependency contexts. It consists of 3 columns:
<context> <lemma> <frequency>
To generate this cooccurrence information, run ./scripts/contextsDep.perl, taking as input the parser's standard output file.
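The idea behind such a file can be illustrated with a short Python sketch that counts cooccurrences from (relation;head;dependent) triples. This is not a reimplementation of contextsDep.perl: the notation that script uses for contexts is not shown here, so the context strings built below (relation_down_head and relation_up_dependent) are hypothetical, as are the function names. The sketch also assumes that each item has the shape lemma_TAG_position, as in the example triples.

```python
from collections import Counter


def strip_tag(item):
    """Return the lemma of an item such as 'man_NOM_3' (assumed shape)."""
    return item.rsplit("_", 2)[0]


def count_cooccurrences(triples):
    """Count (context, lemma) pairs from (relation, head, dependent) triples.

    Each triple contributes two pairs: the dependent lemma seen in the
    context of its head, and the head lemma seen in the context of its
    dependent. The context labels are a hypothetical notation.
    """
    counts = Counter()
    for relation, head, dependent in triples:
        head_lemma = strip_tag(head)
        dep_lemma = strip_tag(dependent)
        counts[(f"{relation}_down_{head_lemma}", dep_lemma)] += 1
        counts[(f"{relation}_up_{dep_lemma}", head_lemma)] += 1
    return counts
```

Each (context, lemma) key together with its count corresponds to one row of the 3-column layout described above.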