Changing Treetagger and Freeling PoS tags into a common tagset

The second step of the pipeline translates the PoS tags produced by Treetagger and Freeling into a common tagset that DepPattern parsers can interpret. Since we used 8 PoS taggers, we need 8 'adapters'.

To process a new language supported by either Treetagger or Freeling, we only need to write a new 'Adapter'. This is straightforward provided that the tagset of the input PoS tagger is available. We also need the tagset required by DepPattern, which is documented in 'docs/tutorialDepPattern.pdf'.

Let's see an example. The sentence 'I have a dream' is PoS tagged by 'tree-tagger-english' as follows:

I       PP      I
have    VBP     have
a       DT      a
dream   NN      dream
.       SENT    .

This tagged text is translated by AdapterTreetagger-en.perl into:

I       genre:0|lemma:I|number:0|person:0|politeness:0|possessor:0|tag:PRO|token:I|type:P|
have    genre:0|lemma:have|mode:0|number:0|person:0|tag:VERB|tense:0|token:have|type:A|
a       genre:0|lemma:a|number:0|person:0|possessor:0|tag:DT|token:a|type:0|
dream   genre:0|lemma:dream|number:S|person:3|tag:NOUN|token:dream|type:C|

This is the input format expected by any DepPattern parser.
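
The translation above can be pictured with a minimal sketch. It is written in Python for illustration only and is not the code of AdapterTreetagger-en.perl. The names TAG_MAP and adapt_line are hypothetical, the table covers only the four tags that produce output in the example, and the tab between the token and its features is an assumption. Tags missing from the table are skipped, which is a simplification.

```python
# Hypothetical mapping: Treetagger tag -> DepPattern features.
# It only covers the tags of the example sentence, with the values shown above.
TAG_MAP = {
    "PP": {"tag": "PRO", "type": "P", "genre": "0", "number": "0",
           "person": "0", "politeness": "0", "possessor": "0"},
    "VBP": {"tag": "VERB", "type": "A", "genre": "0", "mode": "0",
            "number": "0", "person": "0", "tense": "0"},
    "DT": {"tag": "DT", "type": "0", "genre": "0", "number": "0",
           "person": "0", "possessor": "0"},
    "NN": {"tag": "NOUN", "type": "C", "genre": "0", "number": "S",
           "person": "3"},
}


def adapt_line(line):
    """Turn one 'token tag lemma' line into 'token<TAB>attr:value|...'."""
    token, tag, lemma = line.split()
    features = TAG_MAP.get(tag)
    if features is None:
        return None  # tag not covered by this illustrative table
    attrs = dict(features, lemma=lemma, token=token)
    # Attributes are written in alphabetical order, each followed by '|'.
    body = "".join(f"{name}:{value}|" for name, value in sorted(attrs.items()))
    return f"{token}\t{body}"


tagged = ["I PP I", "have VBP have", "a DT a", "dream NN dream", ". SENT ."]
for line in tagged:
    adapted = adapt_line(line)
    if adapted is not None:
        print(adapted)
```

Adding a language would then mostly amount to writing a new table of this kind from the tagset of the input tagger and the DepPattern tagset.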

On the other hand, if the sentence is tagged with freeling-en ('analyzer -f en.cfg'), then we obtain:

I i NN
have have VBP
a a DT
dream dream NN
. . Fp

This tagged text is translated by AdapterFreeling-en.perl into:

I       genre:0|lemma:i|number:S|person:3|tag:NOUN|token:I|type:C|
have    genre:0|lemma:have|mode:0|number:0|person:0|tag:VERB|tense:0|token:have|type:A|
a       genre:0|lemma:a|number:0|person:0|possessor:0|tag:DT|token:a|type:0|
dream   genre:0|lemma:dream|number:S|person:3|tag:NOUN|token:dream|type:C|
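
Because both adapters emit the same attribute:value format, their outputs can be compared directly. The following Python sketch is illustrative only (parse_features is a hypothetical helper, not part of DepPattern): it reads the two lines produced for the token 'I' in the examples above and lists the attributes on which the two taggers disagree.

```python
def parse_features(line):
    """Split 'token attr:value|attr:value|...' into (token, dict of features)."""
    token, body = line.split(None, 1)
    pairs = (item.split(":", 1) for item in body.strip().split("|") if item)
    return token, dict(pairs)


# The two lines for 'I', copied from the adapter outputs shown above.
treetagger_i = ("I genre:0|lemma:I|number:0|person:0|politeness:0|"
                "possessor:0|tag:PRO|token:I|type:P|")
freeling_i = "I genre:0|lemma:i|number:S|person:3|tag:NOUN|token:I|type:C|"

_, tt = parse_features(treetagger_i)
_, fl = parse_features(freeling_i)

# Attributes present in both outputs whose values differ.
differing = sorted(name for name in tt.keys() & fl.keys() if tt[name] != fl[name])
print(differing)
```

For this sentence the disagreement comes from the input taggers themselves: Treetagger tags 'I' as PP, whereas the Freeling output above tags it as NN.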

Pablo Gamallo 2009-10-02