Accelerated command processing with GNU Parallel

Lead Image © Ilka Burckhardt, Fotolia.com

Multiple Personalities

With the snazzy little program GNU Parallel, you can make use of the full power of your multicore CPUs through scripts.

When you get back from vacation, you probably have tons of snapshots stored on your camera. If you want to reduce the resolution of photos so you can upload your pics to a web gallery, the following one-liner for Mogrify from the ImageMagick package is usually sufficient:

$ for i in *.tif; do mogrify -resize 50% "$i"; done

The command loops over all files with the .tif ending in the current directory (for i in *.tif) and has Mogrify shrink each image to half its width and height (mogrify -resize 50%). Because the loop processes the files sequentially, only one core of a modern multicore processor is busy while the others are basically twiddling their thumbs. It would be much faster to process multiple photos simultaneously. This is where the somewhat unjustly overlooked tool called GNU Parallel comes in.

Wrong Twin

Even though GNU Parallel has officially been part of the GNU [1] tool collection since 2010, it is seldom preinstalled and you still have to install it via the package manager. Many large distributions, such as openSUSE 12.2, are missing it altogether. To make matters worse, a program called Parallel is also part of the moreutils [2] package, but it has nothing in common with the GNU Parallel package presented here.

Thus, you need to pay special attention when grabbing the GNU Parallel package to make sure you get the correct one. In Ubuntu, the package is simply called parallel , and you can install it by entering

$ sudo apt-get update
$ sudo apt-get install parallel

from the command line.
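Because of the name clash with the moreutils program, it can be worth checking which parallel actually ended up on your system. The following sketch relies on GNU Parallel printing "GNU parallel" at the start of its --version output. The variable name variant is made up for this example.

```shell
# GNU Parallel identifies itself in its version output;
# a missing program or a different tool named parallel will not match.
if parallel --version 2>/dev/null | grep -q '^GNU parallel'; then
    variant="GNU Parallel"
else
    variant="missing or not GNU Parallel"
fi
echo "parallel on this system: $variant"
```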

If you are using something other than Ubuntu and can't find it in the package manager, you will have to build it yourself. For that you'll need Make, a C compiler, Perl (GNU Parallel itself is a Perl script), and the current source code archive [3]. After unpacking the archive, you can build and install GNU Parallel from its directory using

$ ./configure && make && sudo make install

As soon as GNU Parallel is installed, test whether gnu.org, ubuntu-user.com, and a computer on your local network (192.168.1.102 in this example) are accessible:

$ parallel ping -c2 ::: gnu.org 192.168.1.102 ubuntu-user.com

This executes the ping -c2 command three times, once for each of the three targets listed after the colons, and all three instances run simultaneously. If ubuntu-user.com responds more quickly than the GNU server, you'll see that result in the output first (Figure 1).

Figure 1: The computer 192.168.1.102 in the local net responds to the ping more quickly than its colleague gnu.org on the Internet.

The three colons in the command belong to GNU Parallel and separate the command from its parameters. By the way, in GNU Parallel parlance, commands or programs are called "executable jobs" instead.

Retention Bucket

Normally, GNU Parallel collects all output from a job and shows it only once that job has finished. This approach has the advantage that the aforementioned ping instances won't "talk over" each other, but you also won't see any intermediate results. However, you can use the -u option to show what's going on while the jobs are still running (Figure 3).

Figure 3: The -u (--ungroup) option displays a mixture of output from jobs, which can be confusing, as you can see at the beginning of the example.

The -k (--keep-order) option shows the results in the order in which the jobs were specified, regardless of which job finishes first. In the previous example, you would always see the ping -c2 gnu.org output first even if another ping completes faster. For a better overview, use the -v (verbose) option to compel GNU Parallel to write the command line of the job just before each output.
