The second step of the pipeline translates the PoS tags produced by Treetagger and Freeling into a new tagset that DepPattern parsers can interpret. Since we used 8 PoS taggers, we need 8 'adapters'.
To process a new language supported by either Treetagger or Freeling, we only need to create a new 'Adapter'. This is straightforward provided that the tagset of the input PoS tagger is available. We also need the tagset required by DepPattern, which is available at 'docs/tutorialDepPattern.pdf'.
Let's see an example. The sentence 'I have a dream' is PoS tagged by 'tree-tagger-english' as follows:
I PP I
have VBP have
a DT a
dream NN dream
. SENT .
This tagged text is translated by AdapterTreetagger-en.perl into:
I genre:0|lemma:I|number:0|person:0|politeness:0|possessor:0|tag:PRO|token:I|type:P|
have genre:0|lemma:have|mode:0|number:0|person:0|tag:VERB|tense:0|token:have|type:A|
a genre:0|lemma:a|number:0|person:0|possessor:0|tag:DT|token:a|type:0|
dream genre:0|lemma:dream|number:S|person:3|tag:NOUN|token:dream|type:C|
This is the input format expected by any DepPattern parser.
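The mapping performed by an adapter can be sketched as follows. This is a hypothetical Python illustration, not the actual AdapterTreetagger-en.perl: the names TAG_FEATURES and adapt_line are invented here, the feature values are copied from the example output above, and only the four tags that appear in that example are covered. How the real adapter handles other tags (including SENT) is not shown in this document, so the sketch simply skips them.

```python
# Hypothetical sketch of an adapter; the real one is AdapterTreetagger-en.perl.
# Feature templates are copied from the example output and cover only the
# four Treetagger tags that appear in it.
TAG_FEATURES = {
    "PP": {"tag": "PRO", "genre": "0", "number": "0", "person": "0",
           "politeness": "0", "possessor": "0", "type": "P"},
    "VBP": {"tag": "VERB", "genre": "0", "mode": "0", "number": "0",
            "person": "0", "tense": "0", "type": "A"},
    "DT": {"tag": "DT", "genre": "0", "number": "0", "person": "0",
           "possessor": "0", "type": "0"},
    "NN": {"tag": "NOUN", "genre": "0", "number": "S", "person": "3",
           "type": "C"},
}


def adapt_line(line):
    """Convert one 'token tag lemma' line; return None for unmapped tags."""
    token, tag, lemma = line.split()
    features = TAG_FEATURES.get(tag)
    if features is None:
        return None  # assumption: this sketch skips tags it does not know
    attrs = dict(features, token=token, lemma=lemma)
    # Attributes are emitted in alphabetical order, each followed by '|'.
    body = "".join(f"{key}:{value}|" for key, value in sorted(attrs.items()))
    return f"{token} {body}"


for tagged in ["I PP I", "have VBP have", "a DT a", "dream NN dream"]:
    print(adapt_line(tagged))
```

Keeping the tag-to-feature table separate from the formatting code means that, in a sketch like this, supporting another tagset would mostly amount to writing a new table.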
On the other hand, if the sentence is tagged with freeling-en ('analyzer -f en.cfg'), then we obtain:
I i NN
have have VBP
a a DT
dream dream NN
. . Fp
This tagged text is translated by AdapterFreeling-en.perl into:
I genre:0|lemma:i|number:S|person:3|tag:NOUN|token:I|type:C|
have genre:0|lemma:have|mode:0|number:0|person:0|tag:VERB|tense:0|token:have|type:A|
a genre:0|lemma:a|number:0|person:0|possessor:0|tag:DT|token:a|type:0|
dream genre:0|lemma:dream|number:S|person:3|tag:NOUN|token:dream|type:C|
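The two adapted outputs differ only in the entry for 'I', which Freeling tags as NN in this example. To inspect such differences, an entry can be parsed into its attributes and compared. The following is a small Python sketch, assuming the entry format seen in the examples (a token, a space, then attribute:value pairs each terminated by '|'); parse_entry and differing_attributes are hypothetical helper names and are not part of DepPattern.

```python
def parse_entry(entry):
    """Split 'token attr:value|attr:value|...' into (token, dict)."""
    token, _, body = entry.partition(" ")
    attrs = {}
    for pair in body.split("|"):
        if pair:  # the trailing '|' leaves an empty final piece
            key, _, value = pair.partition(":")
            attrs[key] = value
    return token, attrs


def differing_attributes(entry_a, entry_b):
    """Return {attr: (value_a, value_b)} for attributes that differ."""
    _, a = parse_entry(entry_a)
    _, b = parse_entry(entry_b)
    return {key: (a.get(key), b.get(key))
            for key in sorted(set(a) | set(b))
            if a.get(key) != b.get(key)}


# Entries for 'I' copied from the two adapted outputs in this section.
treetagger_i = ("I genre:0|lemma:I|number:0|person:0|politeness:0|"
                "possessor:0|tag:PRO|token:I|type:P|")
freeling_i = "I genre:0|lemma:i|number:S|person:3|tag:NOUN|token:I|type:C|"

for attr, values in differing_attributes(treetagger_i, freeling_i).items():
    print(attr, values)
```

An attribute that is present in only one of the two entries is reported with None on the side where it is missing.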
Pablo Gamallo 2009-10-02