Controlling a computer with speech


Listen to Me

The Blather, FreeSpeech, Palaver, Simon, and Vedics speech recognition programs are ready to respond to voice commands. This sounds good in theory, but there are some pitfalls in practice.

A strong "Start browser!" belted into the microphone will start Firefox – at least, that's what the five leading free speech recognition programs (Blather, FreeSpeech, Palaver, Simon, and Vedics) promise. With that, they want to make input easier and also help disabled individuals better operate the desktop.

Four of these programs – Vedics being the exception – let you decide for yourself which command triggers an action. A "Start browser!" could conceivably open a text editor instead – confusing, but possible.

The five applications do not analyze speech patterns themselves; they leave that task to other software – as a rule, PocketSphinx [1] from Carnegie Mellon University (CMU).

The applications generally refer to such analysis assistance as back ends or engines. Blather, FreeSpeech, Palaver, and Vedics are under the GNU GPLv3 license, whereas Simon still uses the older version 2.


Blather

Blather [1] is programmed in Python. To get it working, install PocketSphinx, the Python GStreamer bindings, and the Python GTK bindings from the package manager (in Ubuntu, the python-gtk2, python-gst0.10, and pocketsphinx packages). If PocketSphinx isn't part of your distribution, follow the instructions in the "Three-Step Sphinx" box.
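On an Ubuntu system, the whole dependency set can be pulled in with one command. This is a sketch using the package names given above; availability in current repositories is not guaranteed:

```shell
# Blather dependencies on Ubuntu (package names from the text above)
sudo apt-get install python-gtk2 python-gst0.10 pocketsphinx
```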

Three-Step Sphinx

To begin, install the Bison package and, if necessary, Perl from the package manager. From the web [4], download the sphinxbase, pocketsphinx, and sphinxtrain packages. Unzip them and install each using the usual three-step procedure in Listing 1, starting with the base package.

Listing 1

Installation Steps

$ ./configure
$ make
$ sudo make install
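Applied to all three packages – starting with sphinxbase, as required above – the procedure might look like this loop, assuming the unpacked source directories sit side by side in the current directory:

```shell
# build and install the Sphinx packages in dependency order;
# stop at the first failure rather than building on a broken base
for pkg in sphinxbase pocketsphinx sphinxtrain; do
  (cd "$pkg" && ./configure && make && sudo make install) || break
done
```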

From Gitorious [2], download the current development version of Blather. After unzipping the archive, rename the file commands.tmp in commands and use a text editor to enter the desired English-language commands. Begin each line with an uppercase letter followed by a colon and the executable shell command.
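A minimal commands file might look like the following. The phrases and the programs they launch are purely hypothetical examples, and the awk line only illustrates how such an entry maps a recognized phrase to its shell command:

```shell
# create a sample Blather commands file (entries are made up)
cat > commands <<'EOF'
START BROWSER: firefox
OPEN TERMINAL: gnome-terminal
EOF

# illustrate the mapping: look up the command for one phrase
awk -F': ' '$1 == "START BROWSER" {print $2}' commands  # prints firefox
```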

Next, create the ~/.config/blather directory, copy the commands file into it, and run the main Blather script from the Blather directory. When the program seems to hang, end it with Ctrl+C. Then, upload the ~/.config/blather/sentences.corpus file to the Sphinx Knowledge Base Tools website [3].

After clicking Compile Knowledge Base on the website, save the generated file with the .lm extension under the name lm, and the file with the .dic extension under the name dic, in the ~/.config/blather/language directory. You can then start Blather from its directory by running the main script with the -i g option.
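Putting the generated language files in place boils down to a few shell commands. The download filenames here are placeholders; only the target directory and the names lm and dic come from the text above:

```shell
# place the compiled language model where Blather expects it
mkdir -p ~/.config/blather/language
mv ~/Downloads/TAR1234.lm  ~/.config/blather/language/lm
mv ~/Downloads/TAR1234.dic ~/.config/blather/language/dic
```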

The program displays a very clear main window (Figure 1). After you click Listen, it waits for a spoken command through the mic. Alternatively, you can switch to Continuous mode, in which the program listens continuously. There are no further functions. Recognition accuracy is only marginally acceptable.

Figure 1: The Blather main window provides only starting and stopping for the voice control.


FreeSpeech

Unlike the other four programs, FreeSpeech, which is also written in Python, is primarily a dictation tool. After startup, it opens a simple text editor into which all words spoken into the mic are written. Special voice commands allow subsequent editing; for example, an "editor clear" command deletes all the previously transcribed text.

A window appearing after startup shows all the available commands (Figure 2). Here you can modify a command by double-clicking it. As of version 120, FreeSpeech provides the option to control other programs with a virtual keyboard. Click the Send keys button in the text editor and speak the key combination into the mic.

Figure 2: After startup, FreeSpeech shows all the commands you can use to manipulate the dictated text and even save it as a text file.

FreeSpeech interprets English words exclusively; even so, the recognition rate is not particularly good. In our case, the PocketSphinx background process curiously interpreted a clearly spoken "Hello World" as "An over To open" (Figure 3). A second try yielded "An adult wall."

Figure 3: FreeSpeech interpreted a clearly spoken "Hello World" in a creative way.

According to the documentation, you can improve the recognition rate by correcting the misrecognized text in the editor and clicking Learn. Our test unfortunately produced a number of error messages in the process. Controlling other programs didn't work either, and input ended up garbled in the editor.

To put FreeSpeech into operation, you need to install Python-Gtk2, Python-Xlib, Python-Simplejson, Python-Gstreamer, PocketSphinx, and Sphinxbase from the package manager. In Ubuntu, these are in the python-xlib, python-simplejson, python-gtk2, python-gst0.10, python-pocketsphinx, and gstreamer0.10-pocketsphinx packages. Again, if PocketSphinx isn't in your distribution's repository, follow the instructions in the "Three-Step Sphinx" box.
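On Ubuntu, that again reduces to a single install command; this sketch uses the package names listed above:

```shell
# FreeSpeech dependencies on Ubuntu (package names from the text above)
sudo apt-get install python-xlib python-simplejson python-gtk2 \
  python-gst0.10 python-pocketsphinx gstreamer0.10-pocketsphinx
```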

Download the CMU-Cam Toolkit archive from the web [4] and unzip it on the hard drive. Open the Makefile in the CMU-Cam_Toolkit_v2/src subdirectory in a text editor and remove the hash mark (#) at the beginning of the following line:


After saving, open a terminal, change to the CMU-Cam_Toolkit_v2/src subdirectory, and execute make install. Then, copy the resulting programs to a directory included in the $PATH environment variable, such as /usr/local/bin.
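Collected as a shell session, those steps might look like this. That the toolkit drops its binaries into a bin directory one level up is an assumption about the build, not something stated in the text:

```shell
# build the toolkit, then copy the resulting binaries into the PATH
cd CMU-Cam_Toolkit_v2/src
make install
sudo cp ../bin/* /usr/local/bin/
```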

Also from the web [5], download the FreeSpeech archive (pay attention to the ReleaseDate). Unzip the archive on the hard drive and start the software with Python from the newly created directory.
