Developer's Description
By Sergey Logichev

Process texts with UTF-8 encoding and create index of word elements.

WordTabulator is intended for text analysis. With it you can generate an index of word elements extracted from a defined set of texts. Word elements may be words, N-grams (of a defined size), or phrases (syntagms). The program can process texts both in ordinary single-byte (ANSI) encodings and in multibyte UTF-8 encoding. Source texts are defined as a set of flat text files or HTML/XML/SGML documents. In the latter case, the program can separate content from markup. Moreover, you can restrict processing to the content within selected paired tags, or exclude that content from processing. It includes a morphology module for Russian, three output index formats, three types of word elements (words, N-grams, and phrases), a context browser, and true alphabetical ordering.
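To illustrate what an index of word elements can look like, here is a minimal Python sketch. It is not WordTabulator's code or file format. The function names (`tokenize`, `ngrams`, `build_index`) are hypothetical, and it assumes that N-grams are sequences of N consecutive words and that the index records a frequency count for each element.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; not part of WordTabulator.
# \w matches Unicode letters, digits, and underscore in Python 3 str patterns.
WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(texts, n=1):
    """Count each word (n=1) or word N-gram (n>1) across all given texts."""
    index = Counter()
    for text in texts:
        index.update(ngrams(tokenize(text), n))
    return index


texts = ["The cat sat on the mat.", "The cat ran."]
words = build_index(texts, n=1)
bigrams = build_index(texts, n=2)

# List the word index in code-point order (assumption: a simple sort;
# true alphabetical ordering for a given language needs locale-aware collation).
for element, count in sorted(words.items()):
    print(element, count)
```

To feed UTF-8 files into such a sketch, read them with `open(path, encoding="utf-8")`. Single-byte encoded files need the matching code page name instead, for example `cp1251` for Russian text.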