Projecto Extralex

CorpusPedia

CorpusPedia is a corpus generated with the CorpusPedia software. Both the software and the corpus files are freely available: you can build your own corpus with the CorpusPedia software or download the files directly. The corpus is released under the same license as Wikipedia, the Creative Commons Attribution-ShareAlike 3.0 License (CC-BY-SA).

The corpus contains all Wikipedia articles, each organized into several fields: title, plain text, text in wiki format, categories, links to other articles, related articles, and links to the same article in other languages (interlanguage links). CorpusPedia is distributed as an XML file.
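
Because the files are large, reading them as a stream is usually more practical than loading the whole XML document at once. The following is a minimal sketch in Python, not the project's own tooling. The element names (`article`, `title`, `plaintext`, `category`) and the tiny inline sample are assumptions made up for illustration; check the real file for the actual tag names before using it.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical tag names: the real CorpusPedia schema may use different ones.
ARTICLE_TAG = "article"
FIELD_TAGS = ("title", "plaintext", "category")


def iter_articles(source, article_tag=ARTICLE_TAG, field_tags=FIELD_TAGS):
    """Yield one dict per article element, reading the XML incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == article_tag:
            # Missing fields come back as empty strings.
            yield {tag: elem.findtext(tag, default="") for tag in field_tags}
            # Drop the finished article's children to keep memory use low.
            elem.clear()


# Invented sample data, used only to make the sketch runnable.
SAMPLE = b"""<corpus>
  <article><title>Example A</title><plaintext>Text A</plaintext><category>Cat 1</category></article>
  <article><title>Example B</title><plaintext>Text B</plaintext><category>Cat 2</category></article>
</corpus>"""

if __name__ == "__main__":
    for article in iter_articles(io.BytesIO(SAMPLE)):
        print(article["title"], "|", article["category"])
```

For a downloaded file, `iter_articles` can be given a file path or an open binary file instead of the in-memory sample.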

Download the CorpusPedia files by following the links below. The files are very large, so please be patient while downloading and avoid unnecessary downloads.
