Writing clearer Bash scripts

Natalia Klenova, 123RF

Right Ingredients

Does the ingenious script you wrote last month look like confusing spaghetti code today? We show some proven style guidelines that can provide order and readability.

Bash scripts help rename files, sort a large address list, or execute other similarly tedious tasks. Scripts created under pressure can often look garbled (see Listing 1 for a prime example). Quick hacks may have their place, but they are often confusing and might be totally unclear a week later. For this reason, many users don't share their scripts, feeling they are not up to par. All this contributes to making such scripts hard to expand or change down the road.

Listing 1

Unformatted Script

#!/bin/sh
a="100"; b="3"; c="."; d="shot"; for i in `seq 1 $a`; \
  do import -window root $c/$d$i.png; sleep $b; done; exit

Regulator

For these reasons, it's best to structure, format, and, above all, comment your Bash scripts from the outset. Style guides can help, such as the one from Google [1] or the handy PDF booklet by Fritz Mehner [2], which you can print and keep by your computer (Figures 1 and 2). Whether you choose a commercial style guide or one from academia, the basic guidelines are the same.

Figure 1: You get details on individual guidelines at Google by clicking the triangle in front of a guideline.
Figure 2: Fritz Mehner's booklet is a handy, 13-page manual you can keep by your computer for reference.

Your script should always identify the shell it uses on the first line. If the script relies on Bash features, start it with #!/bin/bash. That way, systems that have Bash installed but use a different default shell still run the script with Bash.

Fresh Air

To make the script more readable, get used to using the proper indentation and including blank lines. The confusing elements in Listing 1 become less cryptic if presented as in Listing 2.

Listing 2

Script with Proper Indentation

#!/bin/bash
a="100"
b="3"
c="."
d="shot"
for i in `seq 1 $a`
do
  import -window root $c/$d$i.png;
  sleep $b
done;
exit

Each line holds a single instruction, and blank lines should separate blocks of related code, such as the for loop. The statements inside a block, such as the body of the loop between do and done, should be indented. Because editors display tab characters at different widths, style guides almost unanimously suggest indenting with two space characters instead of a tab.

Notice that the do command is on a line by itself. Google, however, suggests placing it at the end of the preceding for line. The same goes for the do in while loops and the then in an if block. The else, however, is best placed on its own line, as is the closing done. As for if statements, Google recommends replacing test with double square brackets:

if [[ "${filename}" == image* ]]; then ...

This notation has the advantage of allowing pattern matching (as above, where the pattern must remain unquoted) and, with the =~ operator, regular expressions. It also reduces errors, because no word splitting or filename expansion takes place between the double square brackets.
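As a minimal sketch of both kinds of comparison, the following hypothetical function classify_file tests a filename first against a shell pattern and then against a regular expression. The function name, the patterns, and the messages are assumptions chosen for illustration; they do not come from any of the style guides.

```bash
#!/bin/bash
#
# Sketch: pattern and regular expression matching inside [[ ]].
# classify_file and its patterns are hypothetical examples.

classify_file() {
  local filename="$1"
  # Kept in a variable so the regular expression needs no extra quoting
  local shot_regex='^shot[0-9]+\.png$'

  if [[ "${filename}" == image* ]]; then
    echo "pattern match"
  elif [[ "${filename}" =~ ${shot_regex} ]]; then
    echo "regex match"
  else
    echo "no match"
  fi
}

classify_file "image01.png"
classify_file "shot12.png"
```

The pattern to the right of == stays unquoted on purpose. In quotes, Bash would compare the filename with the literal text instead of treating the asterisk as a wildcard.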

In the Break

In a case statement, list the alternatives indented by two spaces. The patterns and their values should each be on their own line. Listing 3 shows an example. Link multiple commands when necessary with a pipe, such as ls -la | grep TODO.

Listing 3

Sample case Statement

case "${auto}" in
  porsche)
    motor="hum"
    echo "${motor}"
    ;;
  *)
    echo "No known car"
    ;;
esac

The pipe symbol should have a space character on both sides. If the chain consists of more than two elements, it's best to put each element on a separate line, with successive lines indented as shown below:

<command1> \
  | <command2> \
  | <command3>

Long parameter lists should follow the same principle:

./configure \
  --prefix=$HOME \
  --enable-audio \
  --enable-video

Furthermore, Google recommends wrapping lines at 80 characters, although other style guides allow for a slightly longer line length. These limits allow for easier printing and fewer scrolling hassles (Figure 3).

Figure 3: The terminal wraps lines at 80 characters. Many editors (e.g., nano in the background) truncate the line when the right edge is reached.

Storage Locations

If you name your variables with short, cryptic names, like a, what they represent can be deduced only in context, and then only with great difficulty.

Therefore, you should choose meaningful variable names that describe their contents. A variable named filename certainly can't be misinterpreted down the road. The names should be short and concise, consisting of no more than 31 characters. Take the trouble to choose suitable variable names, and it will pay off in the long run.

Most style guides suggest using lowercase variable names (count) and uppercase constants (NUMBER). The constants should be defined as a group at the beginning of the script. Avoid names that are already in use as environment variables.

You can find a list of these words on the web [3]. To be on the safe side, give constants a prefix, such as ABC_COUNT. If the name consists of multiple words, separate them with an underscore character, such as letter_greeting.

You're also advised to use singular and plural names as the context demands, which helps especially in loops:

for amount in ${prices}; do
  echo ${amount}
done
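The loop above relies on Bash splitting the content of ${prices} at whitespace. One alternative, offered here as an assumption rather than as a rule from the style guides, is to keep the plural in an array, so that each element stays intact. The helper sum_prices and its values are hypothetical.

```bash
#!/bin/bash
#
# Sketch: plural array name, singular loop variable.
# sum_prices is a hypothetical helper.

sum_prices() {
  local prices=("$@")
  local total=0
  local price

  for price in "${prices[@]}"; do
    total=$((total + price))
  done
  echo "${total}"
}

sum_prices 3 4 5
```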

Always enclose variables in curly brackets, such as ${last_name}. Enclose strings that include variables in quotes:

name="Mr. ${last_name}"

In that way, Bash correctly replaces the variable. With the new descriptive names, the script in Listing 1 begins to gain clarity, as Listing 4 shows.

Listing 4

Images for Processing

#!/bin/bash
NUMBER_IMAGES=100
WAITING_PERIOD_SECONDS=3
LOCATION="."
PREFIX_FILENAME="shot"
for i in `seq 1 ${NUMBER_IMAGES}`; do
  import -window root "${LOCATION}/${PREFIX_FILENAME}${i}.png";
  sleep ${WAITING_PERIOD_SECONDS}
done;
exit

Numbers appear in quotes only if they're part of a string. To ensure that false values don't enter calculations, always initialize variables. If the content of a variable shouldn't change, mark it explicitly with the keyword readonly:

result="$(do_something)"
readonly result

If a subsequent process changes the variable, Bash returns an error, which saves you a lengthy error search.
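The following sketch illustrates that behavior. It reuses the constant name from Listing 4; the helper try_to_change is hypothetical. The second assignment runs in a subshell so that the failure can be inspected, and the error message that Bash writes is discarded for the demonstration.

```bash
#!/bin/bash
#
# Sketch: what happens when a readonly variable is assigned again.
# try_to_change is a hypothetical helper.

WAITING_PERIOD_SECONDS=3
readonly WAITING_PERIOD_SECONDS

try_to_change() {
  # The subshell confines the failing assignment; its error message is discarded
  if ( WAITING_PERIOD_SECONDS=10 ) 2>/dev/null; then
    echo "changed"
  else
    echo "still ${WAITING_PERIOD_SECONDS}"
  fi
}

try_to_change
```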

Nicely Embedded

Functions take names in the same way as variables. Take care not to select names that are system functions or commands – especially avoid test. Developers are forever arguing over where to put the curly brackets that enclose functions. Google suggests putting the opening curly bracket at the end of the function name line and the end bracket on a separate line at the end:

function my_function() {
  <... commands ...>
}

In this example, the keyword function helps quickly identify the statement as such. However, the keyword might complicate porting to other shells, so most style guides advise against using it. Declare variables that are needed only inside a function with local. This step helps avoid flooding the global namespace or possibly changing the wrong variable elsewhere:

calculate() {
  local amount=1
...

Get into the habit of checking the return value of a function. If a function returns nothing, you should at least confirm a successful end or an error. You must also test all passed parameters, because false or omitted parameters from users can lead to undesired results. If you pass arguments on with $@ or $*, Google recommends always giving $@ precedence. Whereas a quoted $* joins all arguments into a single string, a quoted $@ leaves the arguments as they are [3].
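The difference is easiest to see by counting the arguments that arrive at the other end. In this sketch, all three function names are hypothetical; the first argument contains a space on purpose.

```bash
#!/bin/bash
#
# Sketch: how quoted $@ and quoted $* pass arguments on.
# All function names are hypothetical.

count_arguments() {
  echo "$#"
}

pass_with_at() {
  count_arguments "$@"
}

pass_with_star() {
  count_arguments "$*"
}

pass_with_at "first image.png" "second.png"
pass_with_star "first image.png" "second.png"
```

With "$@", both filenames arrive as separate arguments; with "$*", they are merged into one.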

Collect all function definitions at the beginning of the script, ideally just after the constants. To quickly find the beginning of a long script, wrap the main program in a function called main. Google's style guide recommends this for all scripts with at least one function defined. The very end of the script would then have the statement main "$@".
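A possible layout along these lines might look like the following sketch. The constant, the helper is_number, and the messages are assumptions made for illustration; only the order (constants, functions, main, and the final call) reflects the recommendation.

```bash
#!/bin/bash
#
# Sketch of a script layout: constants, then functions, then main.
# All names are hypothetical.

readonly DEFAULT_NUMBER_IMAGES=100

is_number() {
  local number_regex='^[0-9]+$'
  [[ "$1" =~ ${number_regex} ]]
}

main() {
  local number_images="${1:-${DEFAULT_NUMBER_IMAGES}}"

  # Test the passed parameter before using it
  if ! is_number "${number_images}"; then
    echo "Not a number: ${number_images}" >&2
    return 1
  fi
  echo "Would capture ${number_images} images"
}

main "$@"
```

Because main "$@" is the last command, its return value also becomes the exit status of the script.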

Going Outside

Listing 4 uses so-called backticks (left-facing single quotes) to embed the output of a command. This syntax is error prone and confusing, especially when commands are nested. Thus, you should replace the backticks with the more current notation $(<command>). In Listing 4, the corresponding line would be replaced with the following:

for i in $(seq 1 ${NUMBER_IMAGES}); do
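The benefit shows most clearly when one substitution sits inside another. The following sketch uses a hypothetical helper, parent_directory_name, and an example path that is likewise assumed.

```bash
#!/bin/bash
#
# Sketch: nested command substitution with $( ).
# parent_directory_name is a hypothetical helper.

parent_directory_name() {
  local path="$1"
  # With backticks, the inner pair would have to be escaped:
  #   `basename \`dirname "${path}"\``
  basename "$(dirname "${path}")"
}

parent_directory_name "/home/tim/shots/shot1.png"
```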

Always use full path names when processing a file, or your files might end up in the wrong directories. Using rm and other means to delete files also runs the risk of destroying needed directories.

Google gives as an example the dreaded rm -v *: If the directory contains files whose names begin with a hyphen, such as -r or -f, the wildcard turns them into options, and rm deletes subdirectories as well. The safer form is rm -v ./*, because the leading ./ keeps every filename from being mistaken for an option. Also avoid parsing the output of ls. The returned filenames might contain line breaks, and different systems have different output conventions for this command. Also stay away from eval, which can cost you control over the code that the shell executes. Additionally, you can't tell whether the command was successful or how to react if an error occurred.
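Instead of reading the output of ls, a script can let Bash expand a wildcard directly. The sketch below counts PNG files that way; count_png_files is a hypothetical helper, and the extension is only an example.

```bash
#!/bin/bash
#
# Sketch: count PNG files with a wildcard instead of parsing ls output.
# count_png_files is a hypothetical helper.

count_png_files() {
  local directory="$1"
  local count=0
  local file

  for file in "${directory}"/*.png; do
    # If nothing matches, the pattern itself remains; skip it
    [[ -e "${file}" ]] || continue
    count=$((count + 1))
  done
  echo "${count}"
}

count_png_files "."
```

Because each match arrives as a single word, filenames with spaces or line breaks are counted correctly.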

Give the built-in Bash commands precedence and don't concatenate strings, for example, with external programs. For one thing, you don't know if the program even exists on other computers. For another, updates to the program can modify its behavior. Because Bash doesn't need to start a new process, running the script with the internal commands is much faster – especially if loops come into play.
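As one illustration, Bash parameter expansion can often stand in for small external helpers when a script only needs to trim a string. The two functions below are hypothetical, as is the example path.

```bash
#!/bin/bash
#
# Sketch: string handling with parameter expansion instead of external tools.
# Both helpers are hypothetical.

strip_directory() {
  local path="$1"
  # Remove the longest leading part that ends in a slash
  echo "${path##*/}"
}

strip_extension() {
  local filename="$1"
  # Remove the shortest trailing part that starts with a dot
  echo "${filename%.*}"
}

strip_extension "$(strip_directory "/home/tim/shots/shot12.png")"
```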

If you're calling external programs in the script, use options in their long form, such as grep --version instead of grep -V. The meaning of the option is clearer in the long form. If your script provides options, they should be available in long and short forms. Suggestions for naming options are included in the Free Software Foundation's GNU Coding Standards [5], and the most commonly used options are described at the Linux Documentation Project [6].
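One way to offer both forms is a while loop around a case statement, as in the following sketch. The option names (-n/--number and -p/--pause) and the defaults are assumptions for the screenshot example; they are not taken from any of the guides.

```bash
#!/bin/bash
#
# Sketch: accept each option in a short and a long form.
# The option names and defaults are hypothetical.

parse_options() {
  local number_images=100
  local waiting_period_seconds=3

  while [[ $# -gt 0 ]]; do
    case "$1" in
      -n | --number)
        [[ $# -ge 2 ]] || { echo "Missing value for $1" >&2; return 1; }
        number_images="$2"
        shift 2
        ;;
      -p | --pause)
        [[ $# -ge 2 ]] || { echo "Missing value for $1" >&2; return 1; }
        waiting_period_seconds="$2"
        shift 2
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
  echo "${number_images} ${waiting_period_seconds}"
}

parse_options --number 5 -p 1
```

The check before each shift 2 matters: without it, an option that lacks its value would never be consumed, and the loop would not end.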

Picket Fences

Even if you're under time pressure, comment your scripts. After a week, you will still know what the script is doing and why. At the beginning of the script, briefly identify its purpose:

#!/bin/bash
#
# Captures screenshots

Some style guides recommend adding more, such as the author and version number. Follow the comment hash mark (#) with a space character before the comment text so that the comments are easier to locate.

Each function should have its own comments that briefly identify its purpose and methods, all passed arguments and returned values, and possibly which global values the function changes. Google recommends a structure like that shown in Listing 5.

Listing 5

Recommended Structure

#######################################
# Captures screenshots
# Globals:
#   LOCATION
#   FILENAME
# Arguments:
#   None
# Returns:
#   None
#######################################
screenshot() {
  ...
}

If you're tempted to write a short novel as comments, your function is probably overly complex. You might want to think about splitting the function into smaller ones.

Finally, you should include comments for code blocks that might not be self-explanatory. Good candidates are loops and if queries. Comments don't need to be too detailed, and you certainly don't need to comment each line. Some style guides suggest placing comments on a separate line above the command instead of after it on the same line:

NUMBER=100 # False
# True
NUMBER=100

Comments should never just echo the code but should describe its purpose. Instead of "for 1 through 100, calls import and waits 3 seconds," write "captures a screenshot every 3 seconds." Providing a good comment is often as hard as giving variables appropriate names.

If you've just quickly written the code with the intention of revising it soon after, you can add comments to target your further revision with a TODO, such as:

# TODO (tim@example.com)
# Calculate pseudo-random number

In the parentheses, give the contact person's name, ideally as an email address. Whoever sees the script later can then contact you. Some text editors even highlight lines beginning with #TODO so that they stand out. Apart from TODO, some style guides recommend adding FIXME for buggy sections or XXX for places needing particular attention.

Launch Pad

Remember to add the necessary rights to execute the script:

$ chmod +x <Script>.sh

Google advises omitting the .sh extension from executable scripts. Library files that other scripts source, in contrast, should keep the .sh extension and should not have the execution bits set. Filenames should all be lowercase. If you are concatenating words in filenames, separate them with underscores and not hyphens, such as take_screenshots. Remember not to grant the script any SUID or SGID privileges to avoid the risk of security breaches or misuse.

The script should tell users whether it completed successfully or was interrupted. In the latter case, it should provide a meaningful message on the STDERR channel. This approach allows all error messages to be redirected selectively later. End the script with exit 1 on error; otherwise, end it with exit 0. That way, the script can be more easily integrated into other projects.
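A minimal sketch of this pattern follows. The helpers err and check_location and the message text are hypothetical; a complete script would pass the resulting status on to exit.

```bash
#!/bin/bash
#
# Sketch: report errors on STDERR and signal the result with a status code.
# err and check_location are hypothetical helpers.

err() {
  echo "Error: $*" >&2
}

check_location() {
  local location="$1"

  if [[ ! -d "${location}" ]]; then
    err "${location} is not a directory"
    return 1
  fi
  return 0
}

if check_location "/"; then
  echo "Location is usable"
fi
```

Because err writes to STDERR only, a redirection such as 2> errors.log collects the error messages without touching the regular output.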

Listing 6 shows the cryptic script from Listing 1 in its revised and improved form. If comments are in a foreign language, it's best to translate them into English.

Listing 6

Revised and Improved Script

#!/bin/bash
#
# Captures screenshots
NUMBER_IMAGES=100
WAITING_PERIOD_SECONDS=3
LOCATION="."
PREFIX_FILENAME="shot"
# captures a screenshot every 3 seconds
# but not more than specified
for i in $(seq 1 ${NUMBER_IMAGES}); do
  import -window root "${LOCATION}/${PREFIX_FILENAME}${i}.png";
  sleep ${WAITING_PERIOD_SECONDS}
done;
exit

By the way, you can reduce the code in Listing 6 to one line with a current version of ImageMagick:

$ import -window root -snaps 100 -pause 3 shot.png

As you can see, referencing documentation or a man page can often save you from writing a script to begin with.

Conclusion

Some guidelines presented here may seem unnecessary. If you use them, however, you have a greater chance of writing clear Bash scripts that can easily be understood even at a much later date.

If you follow the guidelines, other programmers can find their way around your scripts more easily. For the sake of teamwork, agreeing on common scripting guidelines is a good idea. Additionally, adhering to these guidelines can reduce typos and errors. The standard you use is entirely up to you, as long as you stick with it.