Software download

Download MultiLingua, a natural language parser implemented in Perl. Machine requirements are the following:

Description

This is a robust, partial parser whose implementation is based on finite-state cascades. It is a multilingual system that processes 5 languages: Galician (gl), Spanish (es), English (en), Portuguese (pt) and French (fr).

How to install

Installation provides you with the parameter files required by Tree-Tagger.

How to use

./parser_ml.sh <tagger> <lang> <input_file>

tagger = freeling, treetagger

lang = gl, es, en, pt, fr

Note: if Freeling is not installed, do not use the 'freeling' value for <tagger>.
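
As a rough illustration of the call above, the following Python sketch checks the two option values before invoking the script. The helper names build_command and run_parser are hypothetical and not part of MultiLingua. The accepted values are taken from the lists above. The sketch assumes that parser_ml.sh is in the current directory, that it writes its result to standard output, and that the output is encoded in ISO-8859-1.

```python
import subprocess

# Values accepted by parser_ml.sh according to the usage section.
TAGGERS = ("freeling", "treetagger")
LANGS = ("gl", "es", "en", "pt", "fr")


def build_command(tagger, lang, input_file, script="./parser_ml.sh"):
    """Return the argument list for one parser run (hypothetical helper)."""
    if tagger not in TAGGERS:
        raise ValueError(f"unknown tagger: {tagger!r}")
    if lang not in LANGS:
        raise ValueError(f"unknown language: {lang!r}")
    return [script, tagger, lang, input_file]


def run_parser(tagger, lang, input_file):
    """Run the parser and return its standard output as text.

    Assumption: results go to standard output, encoded in ISO-8859-1.
    """
    result = subprocess.run(
        build_command(tagger, lang, input_file),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.decode("iso-8859-1")
```

For example, run_parser("treetagger", "en", "input.txt") would execute ./parser_ml.sh treetagger en input.txt.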

Input format

The input file is just plain text. The file encoding must be ISO-8859-1.
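
If your text is stored in another encoding, it has to be converted first. A minimal Python sketch, assuming the source file is UTF-8 (the function name to_latin1 is hypothetical):

```python
def to_latin1(src_path, dst_path, src_encoding="utf-8"):
    """Re-encode a text file as ISO-8859-1 (hypothetical helper).

    Characters that have no ISO-8859-1 equivalent raise UnicodeEncodeError,
    so data is never silently dropped.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="iso-8859-1", newline="") as dst:
        dst.write(text)
```

Failing loudly on unsupported characters is a deliberate choice here: replacing them silently would change the text the parser sees.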

Standard output format

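The system parses sentence by sentence. Each parsed sentence consists of two elements: a SENT:: line listing the tagged lemmas of the sentence with their positions, followed by one dependency triple per line in the form: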

(relation;head_lemma;dependent_lemma)

For instance, the sentence "I am a man." gives rise to:

SENT::<I_PN_0 be_VERBFCOP_1 a_DT_2 man_NOM_3>

(Lobj;be_VERBF_1;I_PN_0)

(Spec;man_NOM_3;a_DT_2)

(Robj;be_VERBF_1;man_NOM_3)

The 5 grammars share the same set of dependency relations, which is listed in the file DependencySet.txt.
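
The format above is easy to read back into a program. The following Python sketch is inferred only from the example shown here, not from the parser's source code; the function names are hypothetical. It assumes that items have the shape lemma_TAG_position and that lemmas contain no ';' character.

```python
def parse_output(text):
    """Parse standard-format output into a list of sentences.

    Each sentence is a dict with the 'tokens' from the SENT:: line and
    the 'triples' (relation, head, dependent) that follow it.
    """
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SENT::<") and line.endswith(">"):
            tokens = line[len("SENT::<"):-1].split()
            sentences.append({"tokens": tokens, "triples": []})
        elif line.startswith("(") and line.endswith(")") and sentences:
            fields = line[1:-1].split(";")
            if len(fields) == 3:  # skip anything that is not a triple
                sentences[-1]["triples"].append(tuple(fields))
    return sentences


def split_token(token):
    """Split 'man_NOM_3' into ('man', 'NOM', 3)."""
    lemma, tag, position = token.rsplit("_", 2)
    return lemma, tag, int(position)
```

Splitting from the right keeps lemmas that themselves contain an underscore in one piece.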

CoNLL output file format

It is also possible to get an output file in the format defined by CoNLL-X, inspired by Lin (1998). This format was adopted by the evaluation tasks defined in CoNLL.

  • Lin, D., 1998. Dependency-based Evaluation of MINIPAR. In Proceedings of the Workshop on the Evaluation of Parsing Systems, First International Conference on Language Resources and Evaluation. Granada, Spain.

To get this output format, run ./scripts/saidaCoNLL.perl, taking the standard output file as input.
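
To give an idea of how the triples relate to the CoNLL-X layout, the Python sketch below builds the 10 tab-separated CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) for one sentence. It is not a reimplementation of saidaCoNLL.perl, whose actual output may differ. The function name, the use of "_" for the missing word form, and the "ROOT" label for items without a head are all assumptions made for illustration.

```python
def to_conll_rows(tokens, triples):
    """Build 10-column CoNLL-X rows for one sentence (illustrative only).

    tokens  -- items such as 'man_NOM_3' from the SENT:: line
    triples -- (relation, head, dependent) tuples for the same sentence
    """
    def position(item):
        return int(item.rsplit("_", 1)[1])

    # Map each dependent position to its (head position, relation).
    heads = {position(dep): (position(head), rel) for rel, head, dep in triples}

    rows = []
    for token in tokens:
        lemma, tag, pos = token.rsplit("_", 2)
        head_pos, rel = heads.get(int(pos), (-1, "ROOT"))
        rows.append("\t".join([
            str(int(pos) + 1),   # ID (CoNLL-X counts from 1)
            "_",                 # FORM: not present in the standard output
            lemma,               # LEMMA
            tag,                 # CPOSTAG
            tag,                 # POSTAG
            "_",                 # FEATS
            str(head_pos + 1),   # HEAD (0 means no head was found)
            rel,                 # DEPREL
            "_",                 # PHEAD
            "_",                 # PDEPREL
        ]))
    return rows
```

Because the parser is partial, several items in a sentence may end up without a head; in this sketch they all receive HEAD 0.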

Extensions (cooccurrences file)

You can use the output of the parser to build a cooccurrences file. This file contains all cooccurrences between lemmas and dependency contexts. It consists of 3 columns:

<context> <lemma> <frequency>

To generate this cooccurrence information, run ./scripts/contextsDep.perl, taking the standard output file as input.
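
The counting idea can be sketched in Python as follows. This is an illustration, not the logic of contextsDep.perl: the context names used here (relation_down_head for the dependent, relation_up_dependent for the head) are invented for the example, and the real script may define and label contexts differently.

```python
from collections import Counter


def lemma_of(item):
    """Turn 'man_NOM_3' into 'man' by dropping tag and position."""
    return item.rsplit("_", 2)[0]


def count_cooccurrences(triples):
    """Count (context, lemma) pairs over (relation, head, dependent) triples."""
    counts = Counter()
    for relation, head, dependent in triples:
        head, dependent = lemma_of(head), lemma_of(dependent)
        # The dependent occurs in a context defined by relation and head ...
        counts[(f"{relation}_down_{head}", dependent)] += 1
        # ... and the head occurs in the inverse context.
        counts[(f"{relation}_up_{dependent}", head)] += 1
    return counts


def format_rows(counts):
    """Render the counts as '<context> <lemma> <frequency>' lines."""
    return [f"{ctx} {lemma} {n}" for (ctx, lemma), n in sorted(counts.items())]
```

Dropping the position suffix is what lets the same lemma and context accumulate a frequency across different sentences.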

Last modified: Friday, 9 May 2008
© Universidade de Santiago de Compostela