Accelerated command processing with GNU Parallel


Caught in the Net

If you are still not convinced of how useful parallel processing is, note that GNU Parallel can also distribute jobs across networked computers. With this feature, you can get another number cruncher to resize your vacation photos. To make this work, you must be able to log in to the remote computer via SSH without entering a password (e.g., with the help of ssh-agent [5]).
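Before handing jobs to a remote machine, it can help to confirm that the login really works without a prompt. The following is a minimal sketch and not part of the original setup: the function name can_login_without_password and the address in the example call are placeholders, whereas -n, BatchMode, and ConnectTimeout are standard OpenSSH client options.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if SSH can log in to the given
# machine without asking for a password or passphrase.
can_login_without_password() {
    # -n keeps ssh from reading standard input.
    # BatchMode=yes makes ssh fail instead of prompting.
    # ConnectTimeout=5 gives up on unreachable hosts after five seconds.
    ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example call (placeholder address; replace it with your own machine):
# can_login_without_password 192.168.1.11 && echo "ready for GNU Parallel"
```

If the call fails even though the machine is reachable, the key is probably not loaded into ssh-agent or not authorized on the remote side.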

Moreover, both Rsync and GNU Parallel need to be installed on the remote computer. Rsync handles the data transfer, whereas the remote GNU Parallel determines the number of processors available there. For example,

$ parallel --sshlogin 192.168.1.11,192.168.1.12 'hostname; echo' ::: 1 2

checks whether GNU Parallel can access the computers with the IP addresses 192.168.1.11 and 192.168.1.12. GNU Parallel logs in to both computers via SSH, starts the hostname program, and returns the results (Figure 6). The echo command is a small workaround: hostname doesn't accept the parameters that you place after the three colons, so echo takes them instead.

Figure 6: The echo and the ending numbers help contact both of the specified servers.

The harmless echo simply outputs a number. The two numbers at the end are necessary so that GNU Parallel contacts both computers. With only a 1 at the end, the tool would run hostname; echo on a single computer.

You can use hostnames instead of IP addresses; in either case, separate the entries with commas. To log in as a specific user, put the login name in front of the computer name, separated by @ (e.g., hank@192.168.2.11).
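If the list of machines grows, you might prefer to assemble the comma-separated value in a small script. The following sketch only builds and prints the command; the logins are placeholders borrowed from Listing 3, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Placeholder logins in the user@host form; replace them with your own.
logins="hank@192.168.2.11 peter@192.168.2.12"

# Join the entries with commas, as --sshlogin expects.
sshlogins=""
for login in $logins; do
    sshlogins="${sshlogins:+$sshlogins,}$login"
done

# Print the resulting test command instead of running it.
echo "parallel --sshlogin $sshlogins 'hostname; echo' ::: 1 2"
```

For longer lists, GNU Parallel can also read the logins from a file, one per line, via its --sshloginfile option.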

If your connections work, you can get a colleague's computer to participate in your photo processing with the somewhat long command in Listing 3. GNU Parallel grabs the next image file and copies it via the --transfer option to a computer, where it runs the mogrify -resize 50% job. It then returns the processed file (--return {}) and deletes the copy on the remote computer with the --cleanup option.

Listing 3

Using Networked Computers

parallel --sshlogin hank@192.168.2.11,peter@192.168.2.12 --transfer --return {} --cleanup mogrify -resize 50% {} ::: *.tif

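Note that Mogrify edits images in place and GNU Parallel returns the result under the same name, so the command simply overwrites the local file. The --transfer, --return, and --cleanup options are used together so often that you can abbreviate them as --trc followed by the file to return (here, --trc {}).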
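With the abbreviation, Listing 3 becomes noticeably shorter. The sketch below wraps the short form in a shell function; the name resize_remote is hypothetical, and the logins in the example call are the ones from Listing 3.

```shell
#!/bin/sh
# Hypothetical wrapper around the short form of Listing 3.
# Usage: resize_remote LOGINS FILE...
resize_remote() {
    logins=$1
    shift
    # --trc {} stands for --transfer --return {} --cleanup.
    parallel --sshlogin "$logins" --trc {} mogrify -resize 50% {} ::: "$@"
}

# Example call (overwrites the local images with the resized versions):
# resize_remote hank@192.168.2.11,peter@192.168.2.12 *.tif
```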

This kind of distributed computing can expose some limiting bottlenecks. Remote transfers, for example, can take longer than the processing itself. Distributed processing is, therefore, best used with extensive or time-consuming processing.
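A crude per-file estimate can show whether the network is likely to be the bottleneck. All figures in the following sketch are invented for illustration; measure the file sizes, throughput, and processing time on your own setup before drawing conclusions.

```shell
#!/bin/sh
# All values are assumptions for illustration only.
input_mb=48        # size of one image sent to the remote machine
output_mb=12       # size of the resized image sent back
net_mb_per_s=10    # usable network throughput
local_s=4          # time to resize the image on the local machine

# Time spent purely on copying the file back and forth.
transfer_s=$(( (input_mb + output_mb) / net_mb_per_s ))

if [ "$transfer_s" -ge "$local_s" ]; then
    verdict="transfer alone takes about ${transfer_s} s; keep this job local"
else
    verdict="transfer takes about ${transfer_s} s; a remote machine may help"
fi
echo "$verdict"
```

The estimate ignores SSH login overhead and the load on the remote machine, so treat it as a first hint rather than a measurement.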

Conclusion

GNU Parallel comes into its own, particularly with computationally intensive processing requiring multiple, mutually independent subtasks. In Bash scripts, you can often speed up for and while loops with GNU Parallel, such as those used with the photos at the beginning of this article. A pleasant side effect is that, thanks to GNU Parallel, the commands are much easier to read.
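To illustrate the difference, the following sketch places a serial loop next to its GNU Parallel counterpart. It assumes that the loop from the beginning of the article looked roughly like the first function; both function names are made up for this comparison.

```shell
#!/bin/sh
# Serial version: one file after another.
resize_serial() {
    for file in "$@"; do
        mogrify -resize 50% "$file"
    done
}

# GNU Parallel version: one job per file; by default, as many jobs
# run at once as there are CPU cores.
resize_parallel() {
    parallel mogrify -resize 50% {} ::: "$@"
}

# Example calls (both overwrite the images in place):
# resize_serial *.tif
# resize_parallel *.tif
```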

This tool recognizes many other parameters and functions whose descriptions could fill a book. If you're working with GNU Parallel for the first time, a look at the examples in the manual [6] will help.

Incidentally, I cheated a bit in the first example. Mogrify doesn't need to be stuck in a for loop because it can process multiple files by itself. It even uses multiple processor cores. GNU Parallel is, therefore, not a magic bullet. You should always check in advance whether your command-line commands are already running on multiple cores.

Cheap Imitation

In Ubuntu and Ubuntu-based distros, you're likely to get a cryptic error message when you run the parallel command (Figure 2). GNU Parallel can behave like its same-named competitor from the moreutils package, and the Debian package builders switched on this compatibility mode by default to avoid a conflict between the two. To use GNU Parallel as described here, you either have to use the --gnu option with parallel or remove the --tollef entry from the /etc/parallel/config file. For this article, to keep parameter clutter to a minimum, I will assume you have done the latter.

Figure 2: If this error message appears, GNU Parallel is running in the wrong mode.
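To find out which case applies to your system, you could check the configuration file for the entry. This is a minimal sketch; the helper name has_tollef is hypothetical, and the path is the one mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given config file contains --tollef.
has_tollef() {
    # -e keeps grep from treating the pattern as an option.
    grep -q -e '--tollef' "$1" 2>/dev/null
}

config=/etc/parallel/config
if has_tollef "$config"; then
    echo "Compatibility mode is on: use 'parallel --gnu' or remove --tollef from $config"
else
    echo "No --tollef entry found in $config"
fi
```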
