Preparing output for further processing with Xargs

lucadp, 123RF

Bit by Bit

Special characters in filenames often cause problems for shell tools. With Xargs, you can circumvent these problems.

Complex command lines often need the output of one command as the parameters of another command. A typical example of this is the find command, which recursively searches directories and lists all files that match the given criteria. The command in Listing 1 finds all Ogg Vorbis files (more precisely, all files ending in .ogg) in the Podcasts folder.

Listing 1

Finding OGG Files

$ find Podcasts -name '*.ogg'
Podcasts/dh-20130204-ver-045.ogg
Podcasts/dh-20130107-ver-044.ogg

If you want to make sure that all the files found really are Ogg Vorbis files, you can use the file command, which takes filenames as parameters. The example has only two files, but if there were more, typing them all in by hand would be quite a lot of work. This is where backticks (`) come in.

The shell runs the command between the backticks and substitutes its output, which then becomes the parameter list for the outer command. Consider Listing 2, for example.

Listing 2

Finding and Identifying OGG Files

$ file `find Podcasts -name '*.ogg'`
Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
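To see what the shell actually hands to file, the following sketch loads the substituted output into the shell's positional parameters and counts them. It assumes a POSIX shell and mktemp; the temporary directory and the two podcast filenames are hypothetical test data, not the files from the listings.

```shell
#!/bin/sh
# Sketch: backticks replace the inner command with its output,
# and that output becomes the parameter list of the outer command.
# The directory and filenames below are hypothetical test data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir Podcasts
touch Podcasts/ep-044.ogg Podcasts/ep-045.ogg

# Load the find output into the positional parameters, which is
# exactly what an outer command such as file would receive
set -- `find Podcasts -name '*.ogg'`
n=$#
echo "the outer command would receive $n parameters:"
printf '  %s\n' "$@"
```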

Stumbling Blocks

So far, so good. The shell splits the output of the inner command into separate parameters wherever it finds spaces, tabs, or newlines. But what happens when find finds a file with spaces in its name? The answer is shown in Listing 3.

Listing 3

Finding Spaces in Filenames

$ find Music -name '*.ogg'
Music/Roxette - June Afternoon.ogg
Music/Roxette - I Don't Want to Get Hurt.ogg
$ file `find Music -name '*.ogg'`
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
^C

Several things are happening here. The shell splits each filename at the spaces into separate parameters, so even the hyphen ends up as a parameter by itself. For file, a lone hyphen means "read from standard input," so after complaining about the nonexistent Music/Roxette, the command simply waits for input from the terminal. The only way out is Ctrl+C, as shown in the example.
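The splitting in Listing 3 becomes visible if you print each parameter in brackets instead of passing it to file. This sketch, again using a hypothetical temporary directory, creates a single file whose name contains spaces; the lone hyphen among the resulting parameters is the one file treats as standard input.

```shell
#!/bin/sh
# Sketch: unquoted command substitution splits a filename with spaces
# into several parameters. The test data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir Music
touch "Music/Roxette - June Afternoon.ogg"

set -- `find Music -name '*.ogg'`
n=$#
# Prints [Music/Roxette], [-], [June], and [Afternoon.ogg], one per line
printf '[%s]\n' "$@"
```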

Another stumbling block appears when you nest backticks. In practice, this happens when you first have to find the directories you want to search. Listing 4 shows a command that uses nested backticks to search all directories named Podcasts.

Listing 4

Nested Backticks

$ file `find \`find . -name Podcasts -type d\` -name '*.ogg'`
./lang/Podcasts/dh-20130204-ausgabe-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./lang/Podcasts/dh-20130107-ausgabe-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./kurz/Podcasts/dh-20121015-kurz-018.ogg:    Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps

Backticks can be nested, but the inner backticks must be escaped with a backslash so that the shell does not interpret them as the end of the outer substitution. One level deeper, things quickly get out of hand: The backslashes themselves then need escaping, too.

The quick solution is to use the $( ) construct instead of backticks. It nests without any escaping, and the opening and closing parentheses make it much easier to see what belongs together than undifferentiated backticks do. Listing 5 shows how it works. The problem with spaces and special characters, however, remains.

Listing 5

Grouping with Parentheses

$ file $(find $(find . -name Podcasts -type d) -name '*.ogg')
./lang/Podcasts/dh-20130204-ausgabe-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./lang/Podcasts/dh-20130107-ausgabe-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
./kurz/Podcasts/dh-20121015-kurz-018.ogg:    Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
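The escaping grows with every level. As a sketch, with plain echo commands standing in for find, the following compares three levels of nesting written with backticks and with $( ). Both assignments yield the same string, but only the backtick version needs an ever-growing number of backslashes.

```shell
#!/bin/sh
# Three nesting levels with backticks: the second level needs \`,
# the third needs \\\` (an escaped backslash plus an escaped backtick)
a=`echo \`echo \\\`echo hi\\\`\``

# The same three levels with $( ): no escaping at all
b=$(echo $(echo $(echo hi)))

printf '%s\n' "$a" "$b"
```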

Xargs to the Rescue!

Xargs makes it much easier to turn one command's output into another command's parameters, because Xargs itself reads the output from standard input and builds the command line from it. In Listing 6, Xargs takes the output of find and passes it to file as parameters. By default, however, it again splits its input at spaces and line breaks and also treats quotation marks specially, so the filenames from Listing 3 still cause trouble (Listing 7).

Listing 6

Using Xargs for Redirection

$ find Podcasts -name '*.ogg' | xargs file
Podcasts/dh-20130204-ver-045.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
Podcasts/dh-20130107-ver-044.ogg: Ogg data, Vorbis audio, stereo, 44100 Hz, ~91840 bps
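Xargs collects the items it reads and appends them to the given command as parameters. The following sketch makes this visible with a small stand-in for file: sh -c '...' sh runs a one-line script that reports how many parameters it received. The input filenames are hypothetical.

```shell
#!/bin/sh
# Sketch: xargs turns its input items into parameters for one command.
# In sh -c '...' sh, the second "sh" fills $0, and the items
# appended by xargs become $1, $2, ...
out=$(printf 'Podcasts/ep-044.ogg\nPodcasts/ep-045.ogg\n' |
  xargs sh -c 'echo "$# parameters: $*"' sh)
echo "$out"
```

Both items end up in a single invocation, just as both podcast files in Listing 6 are handed to one file command.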

Listing 7

Xargs Bombs

$ find Music -name '*.ogg' | xargs file
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
/dev/stdin:             empty
June:          ERROR: cannot open `June' (No such file or directory)
Afternoon.ogg: ERROR: cannot open `Afternoon.ogg' (No such file or directory)
Music/Roxette: ERROR: cannot open `Music/Roxette' (No such file or directory)
/dev/stdin:             empty
I:             ERROR: cannot open `I' (No such file or directory)
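The error message at the top of Listing 7 comes from the apostrophe in "Don't": By default, Xargs interprets quotation marks in its input. This minimal sketch, assuming a throwaway directory with one hypothetical filename, reproduces the problem and records the exit status of Xargs, which is non-zero after this error.

```shell
#!/bin/sh
# Sketch: default xargs treats the apostrophe as an opening quote
# and gives up with "unmatched single quote". Test data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir Music
touch "Music/Roxette - I Don't Want to Get Hurt.ogg"

status=0
find Music -name '*.ogg' | xargs printf '[%s]\n' || status=$?
echo "xargs exit status: $status"
```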

At first glance, nothing has been solved. The advantage of Xargs, however, is that its -d command-line option lets you choose the delimiter; with -d, Xargs also stops treating quotes and backslashes specially. In the find output in Listing 7, the filenames are separated by line breaks, so Listing 8 uses the line break as the delimiter.

Listing 8

Delimiting with Line Breaks

$ find Music -name '*.ogg' | xargs -d '\n' file
Music/Roxette - June Afternoon.ogg:           Ogg data, Vorbis audio, [...]
Music/Roxette - I Don't Want to Get Hurt.ogg: Ogg data, Vorbis audio, [...]
Music/Roxette - Joyride.ogg:                  Ogg data, Vorbis audio, [...]
Music/Roxette - Crash! Boom! Bang!.ogg:       Ogg data, Vorbis audio, [...]
Music/Roxette - Almost Unreal.ogg:            Ogg data, Vorbis audio, [...]
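A quick way to check that -d '\n' delivers each line as exactly one parameter is to count the parameters again. This sketch assumes GNU Xargs (-d is a GNU extension) and uses two hypothetical filenames, one with an apostrophe, which -d treats as an ordinary character.

```shell
#!/bin/sh
# Sketch (GNU xargs only): with -d '\n', every input line becomes
# exactly one parameter; spaces and quotes are taken literally.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir Music
touch "Music/Roxette - June Afternoon.ogg" \
      "Music/Roxette - I Don't Want to Get Hurt.ogg"

# Show each parameter in brackets
find Music -name '*.ogg' | xargs -d '\n' printf '[%s]\n'

# Count the parameters a single command invocation receives
n=$(find Music -name '*.ogg' | xargs -d '\n' sh -c 'echo "$#"' sh)
echo "parameters: $n"
```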

If filenames can contain line breaks, use the null character (ASCII code 0) as the delimiter instead. Along with the slash, it is one of only two characters that can never occur in a filename. Because this is such a typical case, the developers of both find and Xargs provide options for exactly this purpose (Listing 9).

Listing 9

Using Null as a Delimiter

$ find . -type f -print0 | xargs -0 file
./file
with
linebreaks.txt: ASCII text

The -print0 option tells find to terminate each filename with a null character instead of a line break, and -0 tells Xargs to split its input at exactly that character. This way, every filename arrives intact, whatever characters it contains. See the "Xargs Versions" box for additional information.
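To try the null-character approach without real test files, the following sketch creates a hypothetical filename containing two line breaks next to one containing spaces, then counts how many parameters the command receives. A bracket-printing printf stands in for file here, so the sketch does not depend on file being installed.

```shell
#!/bin/sh
# Sketch: find -print0 and xargs -0 keep even newlines inside names intact.
# The filenames are hypothetical test data.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
# $(printf ...) keeps the inner newlines; only trailing ones are stripped
touch "$(printf 'file\nwith\nlinebreaks.txt')" "Roxette - Joyride.ogg"

# Same pipeline as Listing 9, with printf in place of file
find . -type f -print0 | xargs -0 printf '<%s>\n'

# Two files, so the command receives exactly two parameters
n=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)
echo "parameters: $n"
```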

Xargs Versions

Xargs is available in almost all Linux and BSD distributions, and even Mac OS X has Xargs on board. Linux distributions usually ship the GNU implementation, which belongs to the GNU Find Utilities [1], along with find and locate.

However, not all implementations are identical. Besides the GNU version, the implementations from FreeBSD [2] and NetBSD [3] (which also applies to Mac OS X [4]) support the -0 option, even though it is not part of the POSIX standard [5]. The -d option, on the other hand, is found only in the GNU implementation [6].
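On systems whose Xargs lacks -d, such as the BSD versions and Mac OS X, one common workaround is to translate line breaks into null characters with tr and then use -0. This sketch assumes, just like -d '\n', that no filename contains a line break; the filenames are hypothetical.

```shell
#!/bin/sh
# Sketch: portable stand-in for GNU "xargs -d '\n'".
# tr turns each newline into a null character, which -0 understands.
# Assumes no filename contains a newline; test data is hypothetical.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
mkdir Music
touch "Music/Roxette - Almost Unreal.ogg" \
      "Music/Roxette - I Don't Want to Get Hurt.ogg"

find Music -name '*.ogg' | tr '\n' '\0' | xargs -0 printf '[%s]\n'
n=$(find Music -name '*.ogg' | tr '\n' '\0' | xargs -0 sh -c 'echo "$#"' sh)
echo "parameters: $n"
```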
