TextQuest - Software


Word lists

A word list contains every character string (mostly words) that occurs in a text, together with its frequency. The strings are counted and sorted alphabetically in ascending order. Word lists are used for:

 

  • building categories for content and style analyses
  • checking orthography and transcription rules
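
As an illustration of the idea, a basic word list can be sketched in a few lines of Python. This is not TextQuest's own code: the function name word_list is hypothetical, a character string is assumed to be any run of non-whitespace characters, and the sort uses Python's default code point order rather than a sort order table.

```python
from collections import Counter


def word_list(text):
    """Return (string, frequency) pairs sorted alphabetically, ascending."""
    # Assumption: a character string is a maximal run of non-whitespace
    # characters. TextQuest's real tokenization rules may differ.
    counts = Counter(text.split())
    # Each string occurs once as a key, so sorting the pairs sorts by string.
    return sorted(counts.items())


if __name__ == "__main__":
    for string, freq in word_list("the cat saw the other cat"):
        print(f"{string}\t{freq}")
```

Because punctuation stays attached to the strings in this sketch, "cat" and "cat." would be counted separately, which is the kind of irregularity a word list makes visible when checking orthography.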

Several options control how the list is built: a sort order table (file sort.def), case folding, an exclusion list (also called a stop-word list), and restrictions based on the frequency and/or length of the character strings. With a sort program, the list can also be sorted by frequency.
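
The effect of these options can be sketched roughly as follows. All names here (filtered_word_list and its parameters) are hypothetical and do not correspond to TextQuest settings; the format of sort.def is not modeled, so the alphabetical order is Python's default code point order.

```python
from collections import Counter


def filtered_word_list(text, stop_words=(), fold_case=True,
                       min_freq=1, min_len=1, max_len=None,
                       by_frequency=False):
    """Build a word list with optional filters (illustrative sketch only)."""
    # Assumption: strings are separated by whitespace.
    tokens = text.split()
    if fold_case:
        # Case folding: "The" and "the" are counted as one string.
        tokens = [t.casefold() for t in tokens]
        stop = {s.casefold() for s in stop_words}
    else:
        stop = set(stop_words)
    # Exclusion list: stop words are never counted.
    counts = Counter(t for t in tokens if t not in stop)
    # Restrictions on frequency and length.
    rows = [(s, n) for s, n in counts.items()
            if n >= min_freq
            and len(s) >= min_len
            and (max_len is None or len(s) <= max_len)]
    if by_frequency:
        # Most frequent first; ties are broken alphabetically.
        return sorted(rows, key=lambda row: (-row[1], row[0]))
    return sorted(rows)


if __name__ == "__main__":
    sample = "The cat saw the other cat and The dog"
    print(filtered_word_list(sample, stop_words=["and"], min_freq=2))
    print(filtered_word_list(sample, stop_words=["and"], by_frequency=True))
```

Applying the exclusion list before counting keeps the counter small for large texts, while the frequency and length restrictions can only be applied once all counts are known.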

The algorithm TextQuest uses is very fast: producing an alphabetically sorted word list from a 24 MB file with 2.78 million character strings takes 34 seconds on an 800 MHz PC.