Overview of commercial and free thesauri


Free Thesauri

Many thesaurus projects are available as free software. Although most are less well known than their proprietary counterparts, they are often just as feature-rich. The aforementioned WordNet is the product of a Princeton University research project of the same name; the university has been building an English lexical database for several decades.

The database groups English nouns, verbs, adjectives, and adverbs into sets of synonyms (synsets) and links them through semantic and lexical relations. The project is widely used in linguistics research and natural language processing, and several of the programs presented here build on it.

In addition to a web interface, the WordNet database is available for various platforms (as an Ubuntu package [24], among others). The package includes both the wn command-line program and a graphical application, the WordNet Browser.

With the query in Listing 3, you can look up the synonyms and meanings of the noun "fair." In the option -synsn, -syns selects synonyms (wn also prints the hypernyms of each sense, as the output header shows), and the trailing n restricts the search to nouns. The wnb command starts the GUI program; type your search term in the box at the top left.

Listing 3

Synonyms for "fair"

$ wn fair -synsn

Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun fair

4 senses of fair

Sense 1
carnival, fair, funfair
       => show

Sense 2
fair
       => gathering, assemblage

Sense 3
fair
       => exhibition, exposition, expo

Sense 4
bazaar, fair
       => sale, cut-rate sale, sales event
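If you want to reuse wn's results in your own scripts, you can capture its text output and split it into senses. The following Python sketch assumes the output layout shown in Listing 3 (a "Sense N" line, a comma-separated synonym line, and an indented "=>" hypernym line); other wn queries print different layouts that it does not handle. The helper names wn_args, split_terms, and parse_syns are hypothetical and not part of WordNet.

```python
import re

# Sample input: the output from Listing 3, without the command line.
LISTING_3 = """\
Synonyms/Hypernyms (Ordered by Estimated Frequency) of noun fair

4 senses of fair

Sense 1
carnival, fair, funfair
       => show

Sense 2
fair
       => gathering, assemblage

Sense 3
fair
       => exhibition, exposition, expo

Sense 4
bazaar, fair
       => sale, cut-rate sale, sales event
"""

SENSE_RE = re.compile(r"Sense (\d+)$")


def wn_args(word, pos="n"):
    """Build an argument list for wn: -syns plus a part-of-speech letter.

    n = noun, v = verb, a = adjective, r = adverb (e.g. -synsn, -synsv).
    """
    if pos not in ("n", "v", "a", "r"):
        raise ValueError(f"unknown part of speech: {pos!r}")
    return ["wn", word, f"-syns{pos}"]


def split_terms(text):
    """Split a comma-separated line such as 'sale, cut-rate sale' into terms."""
    return [term.strip() for term in text.split(",") if term.strip()]


def parse_syns(text):
    """Map each sense number to its synonyms and immediate hypernyms.

    Assumes the Listing 3 layout: header lines before the first
    'Sense N' line and blank lines are skipped.
    """
    senses = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = SENSE_RE.match(line)
        if match:
            current = int(match.group(1))
            senses[current] = {"synonyms": [], "hypernyms": []}
        elif current is None or not line:
            continue
        elif line.startswith("=>"):
            senses[current]["hypernyms"] += split_terms(line[2:])
        else:
            senses[current]["synonyms"] += split_terms(line)
    return senses


if __name__ == "__main__":
    print(" ".join(wn_args("fair")))
    for number, sense in parse_syns(LISTING_3).items():
        print(number, sense["synonyms"], "->", sense["hypernyms"])
```

On a machine with the WordNet package installed, subprocess.run(wn_args("fair"), capture_output=True, text=True).stdout could supply live text in place of the sample string.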

Below the input box, four buttons appear, one for each part of speech under which the word is listed. To restrict the results to noun synonyms, click the "Noun" button and select "Synonyms, ordered by estimated frequency" from the list. The result (Figure 4) is identical to the command-line output.

Figure 4: Query for the term "fair" in the WordNet browser.

WordNet interfaces exist for several programming languages and are listed on the project website. For Perl, the best choice is the WordNet-QueryData module [25], which Ubuntu ships as the libwordnet-querydata-perl package.

For Python, the Natural Language Toolkit (NLTK) [26] is a good choice; it provides a reader class for the WordNet database.

Kthesaurus

Kthesaurus (Figure 5) provides functions for the Calligra Suite (formerly KOffice) similar to those OpenThesaurus provides for LibreOffice.

Figure 5: Kthesaurus obtains its data from WordNet, so it works only with English words.

Kthesaurus extracts its lexical information from the WordNet database, which is why it is available only in English. To use it, install the package for your distribution.

Enter the word in the box at the top left, then click the Search button to query the database. The Thesaurus tab then shows three columns: synonyms (column 1), hypernyms (column 2), and hyponyms (column 3).
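Hypernyms are more general terms and hyponyms are more specific ones; in WordNet, a hyponym link is simply a hypernym link read in the other direction. The Python sketch below is a toy model, not Kthesaurus's actual code. It stores the four noun senses of "fair" from Listing 3 as hypothetical synset records (IDs such as fair_gathering are invented labels) and derives the three columns for a search word.

```python
# Toy synset table built from Listing 3: synset ID -> (lemmas, hypernym ID).
# The IDs are invented labels. Listing 3 does not show the hypernyms of the
# more general synsets, so they are left as None here.
SYNSETS = {
    "funfair": (["carnival", "fair", "funfair"], "show"),
    "fair_gathering": (["fair"], "gathering"),
    "fair_exhibition": (["fair"], "exhibition"),
    "bazaar": (["bazaar", "fair"], "sale"),
    "show": (["show"], None),
    "gathering": (["gathering", "assemblage"], None),
    "exhibition": (["exhibition", "exposition", "expo"], None),
    "sale": (["sale", "cut-rate sale", "sales event"], None),
}


def _add(column, lemmas, word):
    """Append lemmas to a column, skipping the search word and duplicates."""
    for lemma in lemmas:
        if lemma != word and lemma not in column:
            column.append(lemma)


def thesaurus_columns(word):
    """Return (synonyms, hypernyms, hyponyms) for a word as three lists."""
    own = [sid for sid, (lemmas, _) in SYNSETS.items() if word in lemmas]
    synonyms, hypernyms, hyponyms = [], [], []
    for sid in own:
        lemmas, parent = SYNSETS[sid]
        _add(synonyms, lemmas, word)  # other words in the same synset
        if parent is not None:
            _add(hypernyms, SYNSETS[parent][0], word)  # more general terms
    for lemmas, parent in SYNSETS.values():
        if parent in own:  # the hypernym link read in reverse
            _add(hyponyms, lemmas, word)
    return synonyms, hypernyms, hyponyms


if __name__ == "__main__":
    for term in ("fair", "show"):
        print(term, thesaurus_columns(term))
```

With this small table, "fair" has synonyms and hypernyms but no hyponyms, whereas "show" has "carnival, fair, funfair" as hyponyms. A real application would read the relations from the WordNet database (for example through NLTK's WordNet reader) rather than from a hand-typed table.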

The Replace button replaces the word in your text with the selected entry (this works only if you launched Kthesaurus from within a Calligra application). Double-clicking an entry in one of the columns makes it the new search term. The tabs switch between the thesaurus results for your search and the corresponding entry in the WordNet database.

Figure 6 shows the WordNet overview for the word "help," sorted by estimated frequency. The drop-down box offers further views that mirror how the information is stored in the WordNet database (such as compound words, synonyms, antonyms, and everyday words).

Figure 6: Search results for "help" from the WordNet database using Kthesaurus.
