Analyser (-a)

With option -a, dp.sh generates a file containing a dependency-based analysis of the input. For each analysed sentence, the output consists of two elements:

1. A line containing the POS-tagged tokens of the sentence, together with their lemmas and morpho-syntactic features. This line begins with the tag SENT. The tags used here are listed in the file TagSet.txt. Each token is identified by a position number starting from 0, as in the example below.

2. The dependency triplets identified by the grammar, one per line. Each triplet has the form:

(relation;head_lemma;dependent_lemma)
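If you post-process this output, each triplet line can be split into its three parts. The sketch below is an illustration, not part of dp.sh: it assumes that head and dependent are written as lemma_TAG_position (as in the example output of this section) and that no field contains a ";". The names Node, Triplet, parse_node and parse_triplet are hypothetical.

```python
import re
from typing import NamedTuple


class Node(NamedTuple):
    lemma: str
    tag: str
    position: int


class Triplet(NamedTuple):
    relation: str
    head: Node
    dependent: Node


# Matches a whole triplet line such as "(SubjL;be_VERBF_1;I_PN_0)".
# Assumption: no field contains ";".
TRIPLET_RE = re.compile(r"^\((?P<rel>[^;]+);(?P<head>[^;]+);(?P<dep>[^;]+)\)$")


def parse_node(text: str) -> Node:
    # Split from the right so that a lemma containing "_" stays intact.
    lemma, tag, position = text.rsplit("_", 2)
    return Node(lemma, tag, int(position))


def parse_triplet(line: str) -> Triplet:
    match = TRIPLET_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a dependency triplet: {line!r}")
    return Triplet(
        match.group("rel"),
        parse_node(match.group("head")),
        parse_node(match.group("dep")),
    )


if __name__ == "__main__":
    for line in ["(SubjL;be_VERBF_1;I_PN_0)", "(DobjR;be_VERBF_1;man_NOM_3)"]:
        t = parse_triplet(line)
        print(t.relation, t.head.lemma, "->", t.dependent.lemma)
```

Splitting each node from the right keeps multiword lemmas joined with "_" (a hypothetical case here) in one piece, since only the last two fields are taken as tag and position.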

For instance, the sentence "I am a man." generates the following output:

SENT::<I_PRO_0_<number:0|lemma:I|possessor:0|case:0|genre:0|person:0|politeness:0|type:P|token:I|> 
am_VERB_1_<number:0|mode:0|lemma:be|genre:0|tense:0|person:0|type:S|token:am|> 
a_DT_2_<number:0|lemma:a|possessor:0|genre:0|person:0|type:0|token:a|> 
man_NOUN_3_<number:S|lemma:man|genre:0|person:3|type:C|token:man|> ._SENT>
(SubjL;be_VERBF_1;I_PN_0)
(SpecL;man_NOM_3;a_DT_2)
(DobjR;be_VERBF_1;man_NOM_3)
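The SENT line shown above (wrapped here over several lines) can likewise be read token by token. This is a minimal sketch under assumptions drawn only from the example: every token is written as word_TAG_position_<feature:value|...|>, feature values contain no "<", ">" or "|", and the closing "._SENT>" element, which has no position number, is simply skipped. Token, parse_features and parse_sent_line are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    word: str
    tag: str
    position: int
    features: Dict[str, str]


# One analysed token: word_TAG_position_<feature:value|...|>
TOKEN_RE = re.compile(
    r"(?P<word>\S+?)_(?P<tag>[^_\s]+)_(?P<pos>\d+)_<(?P<feats>[^<>]*)>"
)


def parse_features(raw: str) -> Dict[str, str]:
    # "number:0|lemma:I|" -> {"number": "0", "lemma": "I"}
    features = {}
    for item in raw.split("|"):
        if item:  # skip the empty piece after the trailing "|"
            name, _, value = item.partition(":")
            features[name] = value
    return features


def parse_sent_line(line: str) -> List[Token]:
    prefix = "SENT::<"
    if not line.startswith(prefix):
        raise ValueError("expected a line starting with 'SENT::<'")
    body = line[len(prefix):]
    return [
        Token(m.group("word"), m.group("tag"), int(m.group("pos")),
              parse_features(m.group("feats")))
        for m in TOKEN_RE.finditer(body)
    ]


if __name__ == "__main__":
    example = (
        "SENT::<I_PRO_0_<number:0|lemma:I|possessor:0|case:0|genre:0|person:0|politeness:0|type:P|token:I|> "
        "am_VERB_1_<number:0|mode:0|lemma:be|genre:0|tense:0|person:0|type:S|token:am|> "
        "a_DT_2_<number:0|lemma:a|possessor:0|genre:0|person:0|type:0|token:a|> "
        "man_NOUN_3_<number:S|lemma:man|genre:0|person:3|type:C|token:man|> ._SENT>"
    )
    for token in parse_sent_line(example):
        print(token.position, token.word, token.tag, token.features["lemma"])
```

Run on the example, this prints one row per word (positions 0 to 3) with its tag and lemma; the position numbers can then be matched against the numbers that appear in the dependency triplets.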

The set of dependency relations used by the five grammars can be consulted and modified in the configuration file src/dependencies.conf. Morpho-syntactic information is provided by a POS tagger: either tree-tagger or freeling.



Pablo Gamallo 2009-10-02