Preparing output for further processing with Xargs

lucadp, 123RF

Bit by Bit

Special characters in filenames often cause problems for shell tools. With Xargs, you can circumvent these problems.

Complex commands on the command line often need to use the output of one command as input for another. A typical example is the find command, which recursively lists all files that match the given criteria. The command in Listing 1 finds all Ogg Vorbis files in the Podcasts folder.

Listing 1

Finding OGG Files

$ find Podcasts -name '*.ogg'
Podcasts/dh-20130204-ver-045.ogg
Podcasts/dh-20130107-ver-044.ogg

If you want to check whether all the files found really are Ogg Vorbis files, you can use the file command, which takes filenames as parameters. In this example, there are only two files, but if there were more, passing them by hand would be quite a lot of work. That is where backticks (`) come in.

The shell first runs the command enclosed in backticks and then inserts its output as parameters for the outer command. Consider Listing 2, for example.

Listing 2

Finding and Identifying OGG Files

$ file `find Podcasts -name '*.ogg'`
Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps

Stumbling Blocks

So far, so good. The shell splits the output of the inner command into separate parameters at every space, tab, and newline. But what happens when find finds a file with spaces in its name? The answer is shown in Listing 3.

Listing 3

Spaces in Filenames

$ find Music -name '*.ogg'
Music/Roxette - June Afternoon.ogg
Music/Roxette - I Don't Want to Get Hurt.ogg
$ file `find Music -name '*.ogg'`
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
^C

Several things are happening here. The shell splits each filename at the spaces into separate parameters, so even the hyphen ends up as a parameter of its own. To file, a lone hyphen means "read from standard input," so after the first error message, the command simply sits there waiting for input. The only way to stop it is to press Ctrl+C, as shown in the example.
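The splitting is easy to observe without file getting in the way. The following sketch, for POSIX sh, creates a single file with spaces in its name inside a throwaway directory from mktemp (directory and filename exist only for this demonstration), hands the backtick output to the shell's set -- builtin, and shows how many parameters actually arrive:

```shell
dir=`mktemp -d`
touch "$dir/Roxette - June Afternoon.ogg"

# set -- replaces the positional parameters with the words that the
# unquoted backtick output is split into.
set -- `find "$dir" -name '*.ogg'`
n=$#
echo "$n"                                  # 4
for arg in "$@"; do echo "[$arg]"; done    # .../Roxette, -, June, Afternoon.ogg

rm -rf "$dir"
```

The lone - among the four parameters is the one that makes file wait for standard input.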

Another stumbling block appears when you nest backticks. In practice, this happens when, for example, you first need to find all the directories you want to search. Listing 4 shows a command that searches all directories named Podcasts, using nested backticks.

Listing 4

Nested Backticks

$ file `find \`find . -name Podcasts -type d\` -name '*.ogg'`
./long/Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./long/Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./short/Podcasts/dh-20121015-sh-018.ogg:    Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps

Backticks can be nested, but the inner backticks must be escaped with backslashes so that the shell doesn't interpret them as the closing backtick of the outer command. One level deeper, things quickly get out of hand: The escaping backslashes themselves then need to be escaped with additional backslashes.

The quick solution is to use the $() construct instead of backticks. Because it has distinct opening and closing delimiters, $() can be nested without any escaping, and the parentheses make the grouping much easier to see than identical backticks. Listing 5 shows how this works. The problem with spaces and special characters, however, remains.

Listing 5

Grouping with Parentheses

$ file $(find $(find . -name Podcasts -type d) -name '*.ogg')
./long/Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./long/Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./short/Podcasts/dh-20121015-sh-018.ogg:    Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
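The difference in escaping is easiest to see side by side. This minimal sketch (POSIX sh, with echo standing in for find and file) writes the same two-level command substitution once with backticks and once with $():

```shell
# Backticks: the inner pair must be escaped so the outer pair isn't closed early.
old=`echo outer \`echo inner\``

# $(): the inner substitution nests without any escaping.
new=$(echo outer $(echo inner))

echo "$old"   # outer inner
echo "$new"   # outer inner
```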

Xargs to the Rescue!

With Xargs, passing the output of one command as parameters to another is much easier, because Xargs reads that output from standard input and builds the second command's parameter list from it (see Listing 6). By default, however, Xargs again splits its input at spaces and line breaks, and it also treats quotes specially, which leads to the errors in Listing 7.

Listing 6

Using Xargs for Redirection

$ find Podcasts -name '*.ogg' | xargs file
Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
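What Xargs does can be reproduced with echo instead of file. In this sketch (POSIX sh), sh -c 'echo $#' sh is merely a small stand-in command, not part of the original listings, that reports how many parameters it received:

```shell
# Xargs reads lines from standard input and appends them as parameters,
# calling echo only once for all three words.
joined=$(printf '%s\n' one two three | xargs echo)
echo "$joined"   # one two three

# By default, blanks separate parameters too: the line "a b" becomes two.
count=$(printf '%s\n' 'a b' c | xargs sh -c 'echo $#' sh)
echo "$count"    # 3
```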

Listing 7

Xargs Bombs

$ find Music -name '*.ogg' | xargs file
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
/dev/stdin:             empty
June:          ERROR: cannot open `June' (No such file or directory)
Afternoon.ogg: ERROR: cannot open `Afternoon.ogg' (No such file or directory)
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
/dev/stdin:             empty
I:             ERROR: cannot open `I' (No such file or directory)

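At first glance, nothing seems to have been solved. The advantage of Xargs, however, is that its -d command-line option lets you specify the delimiter. Because find outputs one filename per line, as in Listing 7, the line break is the obvious choice, and Listing 8 uses it as the delimiter. With -d, Xargs also takes quotes and backslashes literally, so the apostrophe in "Don't" no longer causes trouble.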

Listing 8

Delimiting with Line Breaks

$ find Music -name '*.ogg' | xargs -d '\n' file
Music/Roxette - June Afternoon.ogg:           Ogg data, Vorbis audio, [...]
Music/Roxette - I Don't Want to Get Hurt.ogg: Ogg data, Vorbis audio, [...]
Music/Roxette - Joyride.ogg:                  Ogg data, Vorbis audio, [...]
Music/Roxette - Crash! Boom! Bang!.ogg:       Ogg data, Vorbis audio, [...]
Music/Roxette - Almost Unreal.ogg:            Ogg data, Vorbis audio, [...]
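The effect of -d can be checked with two of the filenames from Listing 8, including the apostrophe that tripped up Listing 7. This sketch assumes GNU Xargs, because -d is a GNU extension; the temporary directory exists only for the demonstration, and sh -c 'echo $#' sh just counts parameters:

```shell
dir=$(mktemp -d)
touch "$dir/Roxette - June Afternoon.ogg" \
      "$dir/Roxette - I Don't Want to Get Hurt.ogg"

# One parameter per line; blanks and the apostrophe are taken literally.
count=$(find "$dir" -name '*.ogg' | xargs -d '\n' sh -c 'echo $#' sh)
echo "$count"   # 2

rm -rf "$dir"
```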

If filenames can contain line breaks, even that isn't enough. The solution is one of the only two characters that can never occur in a filename: the null character with ASCII code 0. (The other one, the slash, separates the directories in a path and is therefore useless as a delimiter.) Because this problem is so common, both find and xargs offer dedicated options for it (Listing 9).

Listing 9

Using Null as a Delimiter

$ find . -type f -print0 | xargs -0 file
./file
with
linebreaks.txt: ASCII text

The -print0 option tells find to end each filename with a null character, and -0 tells Xargs to split its input at exactly that character. Like -d, -0 also turns off the special treatment of quotes and backslashes. See the "Xargs Versions" box for additional information.
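A filename with embedded line breaks, like the one in Listing 9, shows why the null character is the safer choice. The sketch below (POSIX sh plus find -print0 and xargs -0, which both GNU and BSD provide) counts how many parameters each variant produces for a single file; the demo directory and filename are made up:

```shell
dir=$(mktemp -d)
# The literal newlines inside the quotes become part of the filename.
touch "$dir/file
with
linebreaks.txt"

# Default splitting breaks the one name into three parameters ...
lines=$(find "$dir" -type f | xargs sh -c 'echo $#' sh)
# ... while -print0 / -0 keeps it intact as a single parameter.
nul=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo $#' sh)
echo "$lines $nul"   # 3 1

rm -rf "$dir"
```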

Xargs Versions

Xargs is available in almost all Linux and BSD distributions, and even Mac OS X has it on board. On Linux, it usually comes from the GNU project: The program belongs to the GNU Find Utilities [1], along with find and locate.

However, not all implementations are identical. The GNU version, the BSD versions from FreeBSD [2] and NetBSD [3], and the one in Mac OS X [4] all implement the -0 option, even though it was not part of the POSIX standard [5]. The -d option, on the other hand, is found only in the GNU implementation [6].

Nesting without Nests

To nest searches with Xargs, you chain several Xargs calls with pipes. However, you can't simply append the directories to the end of the command line, because find expects the directories to search before its search criteria. Instead, use the -I option to define a placeholder string that Xargs replaces with each line of input. In the example, I chose the string HERE (Listing 10).

Listing 10

Nesting with Xargs

$ find . -name Podcasts -type d | xargs -I HERE find HERE -name '*.ogg' | xargs file
./long/Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./long/Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./short/Podcasts/dh-20121015-sh-018.ogg:    Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
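To try the nested search without real podcasts, this sketch rebuilds the directory layout of Listing 10 in a temporary directory with empty placeholder files (the names a.ogg, b.ogg, and c.ogg are made up) and lists the matches instead of running file:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/long/Podcasts" "$dir/short/Podcasts"
touch "$dir/long/Podcasts/a.ogg" "$dir/long/Podcasts/b.ogg" \
      "$dir/short/Podcasts/c.ogg"

# -I HERE: Xargs runs find once per directory name, putting the name
# where HERE appears. The cd happens inside $(), so it doesn't leak out.
found=$(cd "$dir" && find . -name Podcasts -type d |
        xargs -I HERE find HERE -name '*.ogg' | sort)
echo "$found"

rm -rf "$dir"
```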

This approach is also useful with commands such as mv or cp, which expect the target directory as the last parameter: With -I, you can place the filename in front of the target instead of having Xargs append it at the end.
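For cp, the -I placeholder simply goes in front of the target directory. A sketch under these assumptions: src and target are throwaway demo directories, and cp runs once per file:

```shell
dir=$(mktemp -d)
mkdir "$dir/src" "$dir/target"
touch "$dir/src/one.txt" "$dir/src/two words.txt"

# With -I, each input line is one item, so the blank in "two words.txt"
# survives; {} is placed before the target, which stays last for cp.
find "$dir/src" -name '*.txt' | xargs -I {} cp {} "$dir/target/"

copied=$(ls "$dir/target" | sort)
echo "$copied"   # one.txt and two words.txt

rm -rf "$dir"
```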

Linux before version 2.6.23 (October 2007) and some other Unix systems impose a fairly low limit on the total length of the parameters that can be passed to another program; newer Linux kernels raise this limit considerably but don't remove it. If the command inside backticks or $() produces more output than the limit allows, the outer command can't be started at all, and the shell reports an "Argument list too long" error. Here again, Xargs comes to the rescue: It knows about this limit and splits overlong parameter lists into chunks the operating system accepts, calling the command as often as necessary.
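Both the limit and the splitting can be observed directly. getconf ARG_MAX reports the system's limit in bytes, and the POSIX -s option caps the length of each command line Xargs builds, so a deliberately tiny value (100 here, chosen only for the demo) forces many calls. The sketch assumes seq is available, as it is on Linux and Mac OS X:

```shell
# Maximum combined length in bytes of arguments and environment for a new program.
getconf ARG_MAX

# 1000 numbers, but at most 100 characters per echo command line:
# Xargs has to call echo several times to pass them all.
calls=$(seq 1 1000 | xargs -s 100 echo | wc -l | tr -d ' ')
words=$(seq 1 1000 | xargs -s 100 echo | wc -w | tr -d ' ')
echo "$calls calls, $words numbers"
```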

Separate Treatment

Many command-line programs, GnuPG among them, process only one file per call. The Xargs -n option sets the maximum number of parameters passed per call (Listing 11). If you also add -P followed by a number, Xargs runs up to that many instances of the command in parallel.

Listing 11

Limiting the Number of Parameters

$ find . -name '*.gpg' | xargs -n 1 gpg
[...]
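Because decrypting real files would require keys, this sketch uses echo to show what -n and -P do (POSIX sh; -P is not in POSIX but is provided by both GNU and BSD Xargs):

```shell
# -n 1: echo is called once per parameter, so each letter gets its own line.
one_each=$(printf '%s\n' a b c | xargs -n 1 echo)

# -P 3: up to three echo calls may run at once; because the order of
# their output isn't guaranteed, it is sorted here.
parallel=$(printf '%s\n' c a b | xargs -n 1 -P 3 echo | sort)

echo "$one_each"
echo "$parallel"
```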

Do Nothing

Sometimes doing nothing is exactly what's required. Many programs don't like being called without parameters and report an error. With the -r option, you can tell GNU Xargs not to call the program at all if there are no parameters (Listing 12). The BSD implementation already does this by default.

Listing 12

Skipping Calls with No Parameters

$ find . -name '*.bla'
$ find . -name '*.bla' | xargs file
Usage: file [-bchikLlNnprsvz0] [--apple] [--mime-encoding] [--mime-type]
            [-e testname] [-F separator] [-f namefile] [-m magicfiles] file ...
       file -C [-m magicfiles]
       file [--help]
$ find . -name '*.bla' | xargs -r file
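The effect of -r can also be seen without file's usage message. This sketch assumes GNU Xargs; the input is deliberately empty:

```shell
# Without -r, GNU Xargs would still run "echo called" once and print "called".
# With -r, it doesn't run echo at all, so nothing is printed.
out=$(printf '' | xargs -r echo called)
echo "[$out]"   # []
```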

Teamwork

Find is not the only command that works well with Xargs; grep is another very useful partner. With the -l option, grep displays only the names of files that contain a match. In Listing 13, for example, grep searches the files in the current directory for the < character and prints the name of each file in which it finds one. Xargs then passes these names to file, which determines the file type.

Listing 13

Xargs and Grep

$ grep -l '<' * | xargs file
bar.html: HTML document, ASCII text
foo.xml:  XML document text

Since version 2.4, GNU grep even recognizes the -Z option (for zero), which ends each filename with a null character instead of a line break, with the advantages described above (Listing 14).

Listing 14

Grep with the -Z Option

$ grep -l foo * | xargs file
File:               ERROR: cannot open `File' (No such file or directory)
with:                 ERROR: cannot open `with' (No such file or directory)
linebreaks.txt: ERROR: cannot open `linebreaks.txt' (No such file or directory)
bar.html:            HTML document, ASCII text
foo.xml:             XML document text
$ grep -lZ foo * | xargs -0 file
File
with
linebreaks.txt: ASCII text
bar.html:                      HTML document, ASCII text
foo.xml:                       XML document text
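The difference between -l and -lZ shows up with any filename that contains spaces. This sketch assumes GNU grep (for -Z) and an Xargs with -0; the three demo files are made up, and sh -c 'echo $#' sh once again just counts parameters:

```shell
dir=$(mktemp -d)
printf 'foo\n' > "$dir/bar.html"
printf 'foo\n' > "$dir/file with spaces.txt"
printf 'bar\n' > "$dir/other.txt"

# grep -l prints one matching filename per line; plain Xargs then
# splits "file with spaces.txt" into three parameters: 4 in total.
plain=$(cd "$dir" && grep -l foo * | xargs sh -c 'echo $#' sh)

# grep -lZ ends each name with a null character, and xargs -0 splits on
# exactly that: 2 parameters, one per matching file.
nul=$(cd "$dir" && grep -lZ foo * | xargs -0 sh -c 'echo $#' sh)

echo "$plain $nul"   # 4 2
rm -rf "$dir"
```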

The Prips [7] program also works wonders together with Xargs. The name stands for "Print IPs" and the tool prints all the IP addresses for a given range of addresses.

Listing 15 shows an example in which I used the -n 1 option once again, because the host command, too, processes only one IP address per call.

Listing 15

Prips Processed Output

$ prips 192.33.96.0/30
192.33.96.0
192.33.96.1
192.33.96.2
192.33.96.3
$ prips 192.33.96.0/30 | xargs -n 1 host
0.96.33.192.in-addr.arpa domain name pointer phys-hpx-dock-1.ethz.ch.
1.96.33.192.in-addr.arpa domain name pointer rou-hpx-1-phys-hpx-dock-1.ethz.ch.
2.96.33.192.in-addr.arpa domain name pointer floo.ethz.ch.
3.96.33.192.in-addr.arpa domain name pointer aragog.ethz.ch.

Conclusion

Backticks on the command line and special characters in filenames hardly ever get along. In addition, nested backticks quickly become confusing. Fortunately, Xargs helps in almost all cases where backticks fail.

Acknowledgments

The author thanks Frank Hofmann and Benjamin Schieder for their advice and comments while writing this article.

The Author

Axel Beckert (http://noone.org/abe/) studied biology with a minor in computer science. He is currently working as a Linux system administrator at the Swiss Federal Institute of Technology in Zurich, is on the board of the Linux User Group Switzerland, and participates in the Debian project. He also maintains the http://planet-commandline.org website.