Software download

Click here to download Lingua Toolkit, a natural language kit containing two tools: a dependency parser and a thesaurus generator, both implemented in Perl. The kit also includes TreeTagger, a POS tagger. The computational requirements are:

Description

This package contains two NLP tools: a multilingual parser (MultiLingua) and a thesaurus generator (AutoThesaurus). It takes a plain text file as input and produces a thesaurus in which each word is assigned its top-N most similar words. It works on 5 languages: Galician (gl), Spanish (es), English (en), Portuguese (pt), and French (fr).

How to install

These instructions install the whole kit: MultiLingua and AutoThesaurus. The installation also includes the POS tagger TreeTagger.

If you wish, you can install just one of the two tools on its own: either MultiLingua or AutoThesaurus.

How to use

./lingua.sh <tagger> <lang> <input_file> [TOP]

tagger = freeling, treetagger

lang = gl, es, en, pt, fr

TOP = 1..N

This script takes up to 4 arguments: the name of a POS tagger (either treetagger or freeling), the abbreviation of a language (en, es, gl, pt, or fr), the input file, and, optionally, the number of most similar words (top-N) to select for each word. For instance:

./lingua.sh treetagger gl input_file.txt 5

The last argument, TOP, is optional. Its default value is 10.

Note: if Freeling has not been installed, do not pass 'freeling' as the tagger argument.
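The calling convention above can also be checked before the script is launched. The following is a minimal Python sketch, not part of the kit: the helper names build_command and run_lingua are hypothetical, and the script is assumed to sit in the current directory as ./lingua.sh. It only validates the arguments against the values listed above and builds the command line.

```python
import subprocess

# Allowed values, copied from the usage description above.
TAGGERS = ("freeling", "treetagger")
LANGS = ("gl", "es", "en", "pt", "fr")


def build_command(tagger, lang, input_file, top=None, script="./lingua.sh"):
    """Return the argument list for one lingua.sh call (hypothetical helper).

    `top` may be omitted, in which case the script's own default applies.
    """
    if tagger not in TAGGERS:
        raise ValueError(f"tagger must be one of {TAGGERS}, got {tagger!r}")
    if lang not in LANGS:
        raise ValueError(f"lang must be one of {LANGS}, got {lang!r}")
    cmd = [script, tagger, lang, input_file]
    if top is not None:
        if top < 1:
            raise ValueError("TOP must be 1 or greater")
        cmd.append(str(top))
    return cmd


def run_lingua(tagger, lang, input_file, top=None):
    """Run the script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(tagger, lang, input_file, top), check=True)
```

For example, build_command("treetagger", "gl", "input_file.txt", 5) yields the same command line as the example above, while an unsupported language code is rejected before anything is executed.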

Input File

The input file is plain text. The file encoding must be ISO-8859-1.
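If your text is stored in another encoding, it has to be converted first. The following is a minimal Python sketch, assuming the source file is UTF-8; the function name to_latin1 and the file names are illustrative and not part of the kit.

```python
from pathlib import Path


def to_latin1(src, dst, src_encoding="utf-8"):
    """Re-encode the text file `src` as ISO-8859-1 and write it to `dst`."""
    text = Path(src).read_text(encoding=src_encoding)
    # Encoding is strict by default: a character that ISO-8859-1 cannot
    # represent (for example the euro sign) raises UnicodeEncodeError
    # instead of being silently replaced.
    Path(dst).write_bytes(text.encode("iso-8859-1"))
```

A call such as to_latin1("input_utf8.txt", "input_file.txt") would produce a file suitable for passing to the script. Failing on unrepresentable characters is a deliberate choice here, so that such characters can be replaced by hand before parsing.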

Output Files
Similarity Measures

Lingua Toolkit computes 11 different similarity measures, using a parsing method to build the co-occurrence file. The 11 measures are the following:

baseline, diceBin, diceMin, jaccard, cosineBin, cosine, cityblock, euclidean, js, lin, jaccardMax

This way, you can compare the results and select the best measure for your specific word-similarity task. In our previous experiments, the best measures turned out to be diceMin, jaccardMax, diceBin, jaccardBin, and cosineBin.
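To give an idea of what such measures compute, here is a small Python sketch of three textbook definitions (binary Dice, binary Jaccard, and cosine) together with a top-N selection. This is an illustration only: the formulas are the standard ones and are not taken from the toolkit's Perl code, whose exact weighting may differ. The function names and the tiny word/context data are made up for the example. Each word is assumed to be represented as a dictionary mapping a syntactic context to its co-occurrence frequency.

```python
import math


def dice_binary(a, b):
    """Dice coefficient over the sets of contexts shared by two words."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def jaccard_binary(a, b):
    """Jaccard coefficient: shared contexts divided by all contexts."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine of the angle between two frequency-weighted context vectors."""
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n(word, vectors, measure, n=10):
    """Return the n words most similar to `word`, best first."""
    scores = [(other, measure(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Made-up toy data: word -> {context: frequency}.
vectors = {
    "cat": {"obj:eat": 2, "subj:sleep": 1},
    "dog": {"obj:eat": 1, "subj:bark": 3},
    "car": {"obj:drive": 4},
}
```

With this toy data, top_n("cat", vectors, dice_binary, n=1) ranks "dog" first, because the two words share one of their two contexts each, whereas "car" shares none.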

Last modified: Friday, 9 May 2008
© Universidade de Santiago de Compostela