Using DepPattern to Correct the PoS Tagged Input Text
DepPattern includes tools for correcting errors in the PoS tagged input text. It allows a linguist to write syntactic rules that correct systematic mistakes made by the PoS tagger. Three new elements are provided for this purpose:
- A new type of dependency, ``Head'', which represents a unary relation (arity 1). In the default configuration file, dependencies.conf, we declared one unary relation of this type, called ``Single''.
- A new operation, ``Corr'', whose aim is to correct all the information associated with a lexical unit: the PoS tag and the morpho-syntactic features. It is similar to the operation ``Add''. The main difference is that ``Corr'' can also change the PoS tag itself.
- A new output format, obtained with flag -c. Instead of outputting the dependency triplets identified by the grammar (flag -a), flag -c rewrites the input text itself, including all the corrections made by operations such as ``Corr'', ``Inherit'', or ``Add''.
Let's see an example. Suppose that the PoS tagger systematically tags the word that as a subordinate conjunction when it follows a noun, even though in this context that is, in general, a relative pronoun. To solve the problem, we can write a rule as follows:
Single : [NOUN] CONJ<lemma:that&type:S>
Corr: tag:PRO, type:R
%
This way, the information introduced by the operation ``Corr'' is used to change the head expression of the unary relation ``Single'': tag PRO and type R replace the information contained in the head (tag CONJ and type S). More precisely, this rule identifies as head a subordinate conjunction with lemma that following a noun (its context), and transforms this head entry into a relative pronoun. Notice that there is no dependent expression involved in the rule, since the relation type of ``Single'' is Head.
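To make the effect of the rule concrete, the following Python sketch simulates it on a toy token list. It is only an illustration of the logic described above, not DepPattern's implementation: the token representation (dictionaries with the keys word, lemma, tag and type), the function name apply_single_corr and the tags used for the other words are all assumptions made for this example.

```python
def apply_single_corr(tokens):
    """Toy simulation of the rule
           Single : [NOUN] CONJ<lemma:that&type:S>
           Corr: tag:PRO, type:R
       Hypothetical token format: dicts with keys word, lemma, tag, type."""
    # Work on copies so the input list is left untouched.
    corrected = [dict(tok) for tok in tokens]
    for i in range(1, len(tokens)):
        context, head = tokens[i - 1], tokens[i]
        is_match = (
            context.get("tag") == "NOUN"       # [NOUN]: left context only
            and head.get("tag") == "CONJ"      # head expression
            and head.get("lemma") == "that"
            and head.get("type") == "S"
        )
        if is_match:
            # Corr: overwrite the tag and the feature of the head.
            corrected[i]["tag"] = "PRO"
            corrected[i]["type"] = "R"
    return corrected


# "the book that I read": "that" follows a noun, so it is corrected.
relative = [
    {"word": "the", "lemma": "the", "tag": "DET"},
    {"word": "book", "lemma": "book", "tag": "NOUN"},
    {"word": "that", "lemma": "that", "tag": "CONJ", "type": "S"},
    {"word": "I", "lemma": "I", "tag": "PRO"},
    {"word": "read", "lemma": "read", "tag": "VERB"},
]

# "said that": "that" follows a verb, so the rule does not apply.
completive = [
    {"word": "said", "lemma": "say", "tag": "VERB"},
    {"word": "that", "lemma": "that", "tag": "CONJ", "type": "S"},
]

relative_out = apply_single_corr(relative)
completive_out = apply_single_corr(completive)
```

In this sketch only the head token is modified; the noun serves purely as context, which mirrors the absence of a dependent expression in a relation of type Head.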
Pablo Gamallo
2009-09-14