| TextQuest - word list |
A word list is a list of all character strings (mostly words) that occur in a text, together with their frequencies. The strings are counted and sorted alphabetically in ascending order. Word lists are used for:
Several options allow the use of a sort order table (file sort.def), case folding, an exclusion list (also called a stop-word list), and restrictions based on the frequency and/or length of the character strings. The list can also be sorted by frequency with a sort program.
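The following is a minimal Python sketch of how such a word list can be built. It assumes that a character string is a run of word characters (`\w+`). The names `word_list`, `make_sort_key`, and their parameters are hypothetical illustrations, not TextQuest's interface. In particular, `make_sort_key` only stands in for a sort order table, since the format of sort.def is not described here. Strings are counted with a hash table (`collections.Counter`) and then sorted. Sorting by frequency is done as a second, stable sort, so strings with equal frequency stay in alphabetical order.

```python
import re
from collections import Counter

# Assumption: a "character string" is a maximal run of word characters
# (letters, digits, underscore); TextQuest's own definition may differ.
TOKEN_RE = re.compile(r"\w+")


def make_sort_key(order):
    """Hypothetical stand-in for a sort order table such as sort.def.

    Characters listed in `order` sort in that order; any other character
    sorts after them, by code point.
    """
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank.get(ch, len(rank) + ord(ch)) for ch in word]


def word_list(text, fold_case=False, stop_words=(), min_freq=1,
              min_len=1, max_len=None, sort_key=None, by_frequency=False):
    """Return (string, frequency) pairs for the character strings in text."""
    tokens = TOKEN_RE.findall(text)
    if fold_case:
        tokens = [t.lower() for t in tokens]
    # Exclusion list (stop words), folded the same way as the tokens
    stop = {w.lower() if fold_case else w for w in stop_words}
    counts = Counter(t for t in tokens if t not in stop)
    # Restrictions on frequency and length
    entries = [(w, n) for w, n in counts.items()
               if n >= min_freq and len(w) >= min_len
               and (max_len is None or len(w) <= max_len)]
    # Ascending order: by code point, or by the custom sort key if given
    entries.sort(key=lambda e: sort_key(e[0]) if sort_key else e[0])
    if by_frequency:
        # Stable sort: strings with equal frequency keep the previous order
        entries.sort(key=lambda e: e[1], reverse=True)
    return entries


sample = "The cat and the dog. The end."
print(word_list(sample, fold_case=True, stop_words=["and"]))
# [('cat', 1), ('dog', 1), ('end', 1), ('the', 3)]
print(word_list(sample, fold_case=True, stop_words=["and"], by_frequency=True))
# [('the', 3), ('cat', 1), ('dog', 1), ('end', 1)]
```

Without case folding, "The" and "the" are counted as two different strings, and with plain code-point order "The" sorts before all lowercase strings. Avoiding this is one reason a sort order table is useful.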
The algorithm TextQuest uses is very fast: building an alphabetically sorted word list for a 24 MB file containing 2.78 million character strings takes 34 seconds on an 800 MHz PC.
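For context, the timing figures above correspond to roughly the following throughput:

$$
\frac{2.78\times 10^{6}\ \text{strings}}{34\ \text{s}} \approx 8.2\times 10^{4}\ \text{strings/s},
\qquad
\frac{24\ \text{MB}}{34\ \text{s}} \approx 0.71\ \text{MB/s}
$$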
Last change to this page: May 23, 2006