Shell Practice: Introduction to the sed stream editor

Dmitriy Sladkov, 123RF.com

Quick Edit

With sed, you can edit text data without an interactive user interface, using pipes or input redirection. Sed lets you execute extensive editing commands on a single line.

The sed stream editor [1] automates many repetitive operations and is especially effective inside a shell script. You can use regular expressions (regex) to describe the pattern of the strings you want to match. In this article, I'll show the program's screen output for each example. If you want to follow along and practice, simply type the sample text files into an editor of your choice.

Sed Commands

You can call the program and hand it commands in several ways. You can pass commands directly or read them in from a separate file. The data can be piped, redirected, or input from a text file. The output can be sent to the screen (usually stdout), through a pipe to the next command, or redirected to a destination file. By default, sed never overwrites the input file; it only changes a file in place if you explicitly ask for it with the -i option. The "Sed Call Options" box has the details. To have the shell resolve variables inside a sed command, you need to enclose the command in double quotes (") rather than single quotes (').

Sed Call Options

Sed simply reads the text file and returns the results through stdout:

sed [COMMAND] [TEXTFILE]

Same with input redirection:

sed [COMMAND] < [TEXTFILE]

Inclusion of sed in one or more pipes:

[PROGRAM1] | sed [COMMAND] | ......

Commands stored in a separate file and read in:

... sed -f [SCRIPT] .....

Output of sed redirected to a text file, but omitting error messages:

... sed [COMMAND] > [TARGETFILE]

The same, but including error messages into the target file:

... sed [COMMAND] > [TARGETFILE] 2>&1
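
As a minimal sketch, the call variants from the box can be tried side by side. The file names demo.txt, upper.sed, and target.txt are made up for this illustration and are not part of the article's sample data; every variant should produce the same result.

```shell
#!/bin/sh
# Sketch only: demo.txt, upper.sed, and target.txt are hypothetical names.
tmp=$(mktemp -d)
printf 'alpha\nbeta\n' > "$tmp/demo.txt"
printf 's/a/A/g\n' > "$tmp/upper.sed"

# Text file as an argument
from_file=$(sed 's/a/A/g' "$tmp/demo.txt")
# Input redirection
from_redirect=$(sed 's/a/A/g' < "$tmp/demo.txt")
# sed inside a pipe
from_pipe=$(cat "$tmp/demo.txt" | sed 's/a/A/g')
# Commands read from a script file
from_script=$(sed -f "$tmp/upper.sed" "$tmp/demo.txt")
# Output redirected to a target file, error messages included
sed 's/a/A/g' "$tmp/demo.txt" > "$tmp/target.txt" 2>&1
from_target=$(cat "$tmp/target.txt")

printf '%s\n' "$from_file"
rm -r "$tmp"
```

The input file itself stays untouched in all five calls; only the output differs in where it ends up.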

Syntax

The basic syntax structure is shown in Figure 1. An address selects the lines to which an editing command applies; without an address, the command applies to every line. You can combine several addresses as long as the statement remains readable. If you want to change "everything except," you can negate the address with the ! character.

Figure 1: Sed syntax structure.

You can add multiple commands on one command line to sed as follows:

sed -e 'command1' -e 'command2' ... -e 'commandN' ....

Or, you can add these commands in a script file.

Sed Script Files

The script file should have a single line for each statement. For example:

s/Gans//
s/jo/Jo/g

The first instruction removes the word "Gans" (more precisely, it substitutes "nothing" for the word), but only for the first instance of the search string on each line. The second line substitutes "Jo" for "jo" for all instances because of the g option.
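
To see the difference between the two statements, you can run them against a single made-up input line. This is only a sketch: the script name fix.sed and the input text are assumptions chosen for the illustration.

```shell
#!/bin/sh
# Sketch only: fix.sed and the input line are hypothetical.
tmp=$(mktemp -d)
printf 's/Gans//\ns/jo/Jo/g\n' > "$tmp/fix.sed"

# Commands read from the script file
from_script=$(printf 'Gans jo Gans jo\n' | sed -f "$tmp/fix.sed")
# The same commands passed with -e on the command line
from_options=$(printf 'Gans jo Gans jo\n' | sed -e 's/Gans//' -e 's/jo/Jo/g')

printf '[%s]\n' "$from_script"
rm -r "$tmp"
```

Only the first "Gans" disappears, whereas every "jo" becomes "Jo"; the leading space left behind shows that sed removed the word but not the blank after it.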

To create an executable sed script, include the shebang (#! ) interpreter statement on the first line:

#!/bin/sed -f

If you make the script executable (e.g., chmod 700 [SCRIPTNAME] ), you can call it like any other program. You wouldn't normally use this option. Rather, you would put sed and any script file calls in a shell script. In some cases, the order of the commands matters. Test your scripts before making them "real" to avoid errors and data loss.

Sample Data

You can use the textdata.txt file in Listing 1 to exercise your sed skills. This file looks as though it has been thrown together and contains empty lines, typos, and other errors. The second sample file I'll use in this article is called testlist.txt (Listing 2) and has dates formatted in different ways as content.

Listing 1

textdata.txt

chris hemsworth - Thor 0885465468798746
Scarlett Johansson - Black Widow 08755466584
Robert Downey - Iron Man 0987654321
Mark Ruffalo - Hulk 0405458765143321
Chris Evans - Captain America 0548/9988776655
Jeremy renner - Hawkeye 555/8812470
Tom Hiddleston  - Loki 87841487014848
Samuel Jackson - Nick Fury 043/956026386
Cobie Smulders - Maria Hill 23514560145
Hugh jackman - Wolverine 801539193Paul Rudd - Ant Man 497349000

Listing 2

testlist.txt

22 April 1984
 7.04.1985
30 March 1986
19 April 1987
03.04.1988
26 March 1989
15 April 1990
31-March-1991
19 April 1992
11 April 1993
 3 April 1994
16. April 1995
 7 April 1996
30 March 1997
12 April 1998

Regular Expressions

Sed uses regular expressions to describe string patterns. The more of them you pack into a single statement, the harder that statement becomes to read; here, sed scripts can often help. Some characters are special both to the shell and in regular expressions. To use them literally, you need to "escape" them with the \ character (Table 1). The bracket expression [ABC] matches a single character that is A, B, or C, and the address /ABC/ selects lines that contain exactly that string.

Table 1

Special Characters

Character Function
( Opens statement
) Ends statement
{ Opens optional statement
} Closes optional statement
[ Opens a list of characters
] Closes a list of characters
" Masks a statement in which shell variables are resolved
' Masks a statement in which shell variables are not resolved
` Encloses a statement block
. Any character other than a newline
, Separates parameters, such as line items
: Sets labels (t and b command)
$ End of document, end of line or last line
& Placeholder for search patterns, included in the replacement statement
| Or (regex separator)
/ Separator in editing commands
^ Beginning of line, or negation in a search pattern
\ Escape character
! After an address: apply the command to all lines that do not match
* Preceding pattern zero or more times
+ Preceding pattern at least once (written \+ unless extended regexes are enabled)
= Output line number
\n Newline, line feed
\t Tab character

Options and Editing Commands

Confusingly, both sed and the editing commands have their own options. As is usual in Linux, these options are preceded with the - character. The editing statements take their options after them. Tables 2-4 provide an overview.

Table 2

Sed Options

Action Option
Execute command (can usually be omitted) -e
Disable data buffering -u
Treat files separately -s
Use extended regexes -r
Edit file in place (with a suffix, keep a backup copy) -i[SUFFIX]
Read and execute script file -f [SCRIPTFILE]
Suppress automatic output of each line -n
Show version --version

Table 3

Editing Commands

Action Command
Add lines above this one i
Add lines below this one a
Output this line p
Output this line in unambiguous form, wrapped at a given length l [LENGTH]
Replace characters one for one with others y
End sed q
Replace this line with new text c
Delete this line d
Search and replace s

Table 4

Editing Command Options

Action Option
Output line number =
All occurrences g
Outputs modified line with the s editing command p
Write the edited line to a file w [FILE]

Searching

You can use the search function, among other things, to select the text sections you want to replace. The search pattern acts as the address, and it can be a regex. Table 5 shows some of the possibilities, and Table 6 provides some examples. In Table 6, sed is used both in a data stream and with direct access to the text file. With range addressing (two addresses separated by a comma), the statement is applied to every line from the first line matching the first address up to and including the next line matching the second address.

Table 5

Patterns and Addressing

Action Pattern
All lines (null)
Line 25 25
Not line 25 25!
Lines 10 through 20 10,20
Last line $
Not pattern '/PATTERN/!'
Character at beginning of line ^CHAR
String /STRING/
Character set [CHARS]
Alphabetic character [[:alpha:]]
Lowercase [[:lower:]]
Uppercase [[:upper:]]
Alphanumeric [[:alnum:]]
Digit [[:digit:]]
Hexadecimal digit [[:xdigit:]]
Tab and space [[:blank:]]
Whitespace (space, tab, newline, etc.) [[:space:]]
Control character [[:cntrl:]]
Printable characters (no control characters) [[:print:]]
Visible characters (without spaces) [[:graph:]]
Punctuation [[:punct:]]
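
Character classes only work inside a bracket expression, which is why the class name needs its own pair of square brackets. The following sketch uses a few made-up input lines (including an empty line and a line of spaces) to show the effect.

```shell
#!/bin/sh
# Sketch only: the five input lines are made up for this illustration.
# Lines with at least one letter or digit (drops empty and blank-only lines)
non_blank=$(printf 'abc\n123\n\n   \nx9\n' | sed -n '/[[:alnum:]]/p')
# Lines with at least one digit
with_digit=$(printf 'abc\n123\n\n   \nx9\n' | sed -n '/[[:digit:]]/p')

printf '%s\n' "$non_blank"
printf '%s\n' "$with_digit"
```

With a single pair of brackets, as in /[:alnum:]/, the expression would be read as a plain list of the characters ":", "a", "l", "n", "u", and "m"; GNU sed rejects that form with an error message.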

Table 6

Sample Searches and Patterns

Search for Pattern Example Figure
Term, Name '/TERM/' cat textdata.txt | sed -n '/Meier/p' -
All lines containing "man" or "Man" '/[Mm]an/p' sed -n '/[Mm]an/p' textdata.txt Figure 2
All lines except 3 through 5 '3,5!' sed -n '3,5!'p textdata.txt Figure 3
All lines except those containing "Man" '/Man/!' sed -n '/Man/!'p textdata.txt Figure 4
Lines containing "H" or "G" '/[HG]/' sed -n '/[HG]/'p textdata.txt -
Lines not containing "H" or "G" '/[H]\|[G]/!' sed -n '/[H]\|[G]/!'p textdata.txt Figure 5
Line 3 3 cat textdata.txt | sed -n '3p' -
Last line '$p' cat textdata.txt | sed -n '$p' -
Multiple patterns (range): Do not output the lines from a line containing "R" through the next line containing "M" '/[R]./,/[M]./!' sed -n '/[R]./,/[M]./!'p textdata.txt Figure 6
All lines containing at least one alphanumeric character (no empty or whitespace-only lines) '/[[:alnum:]]/' cat textdata.txt | sed -n '/[[:alnum:]]/'p Figure 7
Figure 2: Searching for Man, jackman, etc.
Figure 3: Output of lines except 3 through 5.
Figure 4: Output of lines except those containing "Man".
Figure 5: Output of lines except those containing "H" or "G."
Figure 6: Output lines except those in the range from a line containing "R" through the next line containing "M."
Figure 7: Output all lines that contain alphanumeric characters (no empty lines or lines containing spaces, tabs, etc.).

Note that in Figure 6, the j comes before J in the text file. In the first example in Table 6, none of the lines containing H and J are output, which works because the order in the command and the order in the text file are the same. The second example with the negated H and j shows, however, that a range only opens once a line containing H has been found. That's why johann still appears in the output!
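
How a two-address range opens and closes is easiest to check on a few lines of made-up input. In this sketch, START and END are arbitrary marker words chosen for the illustration.

```shell
#!/bin/sh
# Sketch only: START and END are arbitrary markers, not part of the sample files.
input=$(printf 'one\nSTART a\nb\nEND c\nd\nEND e\n')

# Lines inside the range: from the first START line through the next END line
in_range=$(printf '%s\n' "$input" | sed -n '/START/,/END/p')
# Negated range: everything outside of it
out_of_range=$(printf '%s\n' "$input" | sed -n '/START/,/END/!p')

printf '%s\n' "$in_range"
printf '%s\n' "$out_of_range"
```

The line "END e" is part of the negated output because no range is open at that point: the second address only counts after the first address has matched.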

If you want to be sure that sed is doing what you need it to do, you can chain several calls in a pipe. The following command suppresses empty lines and lines containing "Man" (see Figure 8):

cat textdata.txt | sed -n '/[[:alnum:]]/'p | sed -n '/Man/!'p
Figure 8: Processing several instances of sed using a pipe.

Substituting and Removing

You use the s instruction to replace matched expressions. The length of search and replace strings is irrelevant. You can see the detailed syntax shown in Figure 9.

Figure 9: Syntax of the search and replace statement.

You can limit the search and replace statement to specific lines by preceding the command with the line number, as shown in the following example:

sed -n '5s/OLD/NEW/p' [TEXTFILE]

Or, for a range of lines:

sed -n '1,4s/OLD/NEW/p' [TEXTFILE]

You can also suppress changes to certain lines using the exclamation point:

sed -n '20,80!s/OLD/NEW/p' [TEXTFILE]

Furthermore, you can limit changes to lines which contain certain strings or patterns that are not the same as the search and replace statement:

sed -n '/[STRING|PATTERN]/s/OLD/NEW/gp' [TEXTFILE]
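
The four kinds of address can be compared on a small made-up input. This is a sketch with four hypothetical lines; -n and p are left out so that the unchanged lines stay visible.

```shell
#!/bin/sh
# Sketch only: the four input lines are made up for this illustration.
input=$(printf 'a OLD\nb OLD\nc OLD\nd OLD\n')

# Single line number
one_line=$(printf '%s\n' "$input" | sed '2s/OLD/NEW/')
# Range of lines
line_range=$(printf '%s\n' "$input" | sed '2,3s/OLD/NEW/')
# Everything except the range
negated=$(printf '%s\n' "$input" | sed '2,3!s/OLD/NEW/')
# Only lines matching a pattern
by_pattern=$(printf '%s\n' "$input" | sed '/^c/s/OLD/NEW/')

printf '%s\n---\n' "$one_line" "$line_range" "$negated" "$by_pattern"
```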

You can delete the matched string by replacing it with an empty string.

By default, only the first occurrence of the search string on each line is processed. To replace all instances, add the g (global) option at the end of the statement. With the -n option set, sed prints nothing on its own. So, if you want to see what's going on, add the p (print) option. You can also write results to an output file with the w (write) option. Table 7 shows some short examples.

Table 7

Sample Search and Replace Statements

Action Example Figure
Replace pattern at the first occurrence only cat textdata.txt | sed -n 's/e/E/p' Figure 10
Replace pattern at every occurrence cat textdata.txt | sed -n 's/e/E/gp' Figure 10
Delete the word "Man" sed -n 's/Man//gp' textdata.txt Figure 11
Replace "Iron" with "Tin" on line 4 cat textdata.txt | sed -n '4s/Iron/Tin/gp' Figure 12
Replace "0" with "089" on all lines containing "Man" or "man" sed -n '/[Mm]an/s/0/089/gp' textdata.txt Figure 13
Replace "0" with "089" on all lines except those containing "Man" or "man" sed -n '/[Mm]an/!s/0/089/gp' textdata.txt Figure 14
Delete all numbers, slashes (/), and hyphens (-) cat textdata.txt | sed -n s'/[0-9\/-]//'gp Figure 15
Figure 10: Using the "global" (g) option.
Figure 11: Deleting the word "Man".
Figure 12: Limiting the search and replace to one line.
Figure 13: Limiting the search and replace to selected lines.
Figure 14: Excluding lines for the search and replace statement.
Figure 15: Deleting numbers and symbols from lines.
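
The options of the s command can also be tried in isolation. The following sketch uses two made-up input lines and a hypothetical output file named changed.txt.

```shell
#!/bin/sh
# Sketch only: the input lines and changed.txt are hypothetical.
tmp=$(mktemp -d)
input=$(printf 'banana\ncherry\n')

# No option: only the first match on each line is replaced
first_only=$(printf '%s\n' "$input" | sed 's/a/A/')
# g: every match on each line is replaced
every_match=$(printf '%s\n' "$input" | sed 's/a/A/g')
# -n plus p: only the lines that were actually changed are printed
changed_only=$(printf '%s\n' "$input" | sed -n 's/a/A/gp')
# w: changed lines are written to a file instead of the screen
printf '%s\n' "$input" | sed -n "s/a/A/gw $tmp/changed.txt"
written=$(cat "$tmp/changed.txt")

printf '%s\n---\n' "$first_only" "$every_match" "$changed_only" "$written"
rm -r "$tmp"
```

The sed statement with w is in double quotes here so that the shell can insert the temporary directory into the file name.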

You can see a more complex example in Listing 1. It converts the inconsistently formatted date syntax in testlist.txt to a common, unified, albeit European (DD/MM/YYYY) format. Be sure to press the Enter key immediately after the \ at the line's end. Alternatively, you can omit the backslash and end the line with the pipe character to connect it with the line that follows; however, this results in a less clear screen display.

Line 1 reads the list, and the pipes start in line 2. Line 2 replaces a leading space character, where present, with the digit 0. Line 3 replaces any minus signs in dates with spaces. Lines 4 and 5 substitute any month written as a word with its numeric value followed by a dot. Line 6 matches a two-digit number at the beginning of a line (^), the first digit being 1 through 3 and the second any digit, together with the space character that follows it, and replaces the match with "itself" (&) followed by a dot.

To make the search pattern repeatable during the replacement, enclose it in parentheses, which you have to escape with \. The sed statement in line 7 deletes all existing space characters (through the g option for s).

The uniq command on the last line ensures that adjacent duplicate lines are output only once. Figure 16 shows the results. You can also "carry over" all or part of the original string into the replacement patterns in the replacement statement. Check out the following example:

echo "happy" | sed -n s'/happy/un&/'p

This example replaces happy with unhappy . You can also convert from lowercase to uppercase:

cat textdata.txt | sed -n s'/\([[:lower:]]\)/\U&/'pg

The \U before the & (a GNU sed extension) converts the matched text to uppercase. In the same way, you can use:

cat textdata.txt | sed -n s'/\([[:upper:]]\)/\L&/'pg

to convert from uppercase to lowercase.
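
The following sketch puts the three ways of carrying text over side by side: & for the whole match, \1 and \2 for the escaped parentheses, and \U for case conversion. It assumes GNU sed, because \U is not part of POSIX sed; the input strings are made up.

```shell
#!/bin/sh
# Sketch only, assuming GNU sed (\U is a GNU extension); inputs are made up.
# & stands for the entire matched text
prefixed=$(echo 'happy' | sed 's/happy/un&/')
# \1 and \2 refer to the first and second escaped parenthesis group
swapped=$(echo 'Thor - hemsworth' | sed 's/\(.*\) - \(.*\)/\2 - \1/')
# \U converts what follows in the replacement to uppercase
capitalized=$(echo 'loki' | sed 's/[[:lower:]]/\U&/')
shouted=$(echo 'loki' | sed 's/[[:lower:]]/\U&/g')

printf '%s\n' "$prefixed" "$swapped" "$capitalized" "$shouted"
```

Without the g option, only the first lowercase letter is converted, which is a quick way to capitalize a line.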

Figure 16: Formatting dates.
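
The date conversion described above can be sketched as the following pipeline. This is not the original listing but a reconstruction under assumptions: the stages follow the description, the day pattern accepts a leading 0 so that single-digit days are covered, and the result uses dots as separators (DD.MM.YYYY). The input is the content of testlist.txt from Listing 2.

```shell
#!/bin/sh
# Sketch only: a reconstruction of the described pipeline, not the original listing.
tmp=$(mktemp -d)
printf '%s\n' \
  '22 April 1984' ' 7.04.1985' '30 March 1986' '19 April 1987' '03.04.1988' \
  '26 March 1989' '15 April 1990' '31-March-1991' '19 April 1992' '11 April 1993' \
  ' 3 April 1994' '16. April 1995' ' 7 April 1996' '30 March 1997' '12 April 1998' \
  > "$tmp/testlist.txt"

# Stages, in order:
#   1. replace a leading space with 0
#   2. replace hyphens with spaces
#   3. and 4. replace month names with their number and a dot
#   5. append a dot to a leading two-digit day that is followed by a space
#   6. delete all remaining spaces
#   7. drop adjacent duplicate lines
dates=$(cat "$tmp/testlist.txt" |
  sed 's/^ /0/' |
  sed 's/-/ /g' |
  sed 's/March/03./' |
  sed 's/April/04./' |
  sed 's/^[0-3][0-9] /&./' |
  sed 's/ //g' |
  uniq)

printf '%s\n' "$dates"
rm -r "$tmp"
```

Ending each line with the pipe character lets the shell continue on the next line without a backslash. The same work could be done in a single sed call with several -e options; separate stages are easier to test one at a time.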

Character Replacement

Use the y command for character filtering and other one-for-one replacements. The pattern should contain all the characters that need to be replaced, and the replacement statement must have the same number of characters. The command consists of y alone (no s), and -n should be omitted:

sed y'/[Search CHAR]/[Replacement CHAR]/'

For example, you can replace the lowercase c with an uppercase C only on those lines in textdata.txt that begin with a lowercase c (Figure 17).

Figure 17: One-for-one character replacement.
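
A command for this task might look like the following sketch, which uses two lines from textdata.txt (shortened) plus a made-up word. Keep in mind that y replaces every occurrence of the listed characters on the addressed lines, not just the first one.

```shell
#!/bin/sh
# Sketch only: two shortened lines from textdata.txt and one made-up word.
# y applies only to lines that begin with a lowercase c
fixed=$(printf 'chris hemsworth - Thor\nScarlett Johansson - Black Widow\n' |
  sed '/^c/y/c/C/')
# Several characters at once: a->A, b->B, c->C
mapped=$(echo 'cabbage' | sed 'y/abc/ABC/')

printf '%s\n' "$fixed" "$mapped"
```

The second line keeps its lowercase c characters because the address /^c/ does not select it.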

You use c to replace entire lines:

sed '/PATTERN/'c'REPLACEMENT'

You can also do it like this:

sed [LINE(n)]c'REPLACEMENT'

The example in Figure 18 deletes an empty line and replaces it with a series of dashes.

Figure 18: Replacing an entire line matched to a search pattern.

Line numbers can be used in place of a search pattern. Be aware that if you specify a range of lines, the whole range is replaced by one single instance of the replacement string. So, if you pick three lines, it will seem like the first line gets replaced, and the second and third lines get deleted.

The example in Figure 19 deletes line 2 and substitutes a series of hash marks. The second example deletes through line 4 and replaces them with the given line.

Figure 19: Replacing whole lines per line numbers.
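
The behavior of c can be reproduced with a few made-up lines. The sketch assumes GNU sed, which accepts the replacement text on the same line as the c command.

```shell
#!/bin/sh
# Sketch only, assuming GNU sed (one-line form of c); input lines are made up.
input=$(printf 'one\n\nthree\nfour\n')

# Replace every empty line with a series of dashes
dashed=$(printf '%s\n' "$input" | sed '/^$/c -----')
# Replace the range 2 through 3 with one single line
merged=$(printf '%s\n' "$input" | sed '2,3c replaced')

printf '%s\n---\n' "$dashed" "$merged"
```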

You also can delete lines using the d command with a search pattern or line numbers:

sed '/PATTERN/'d
sed [LINE(n)]d

Using the commands in Figure 20, you can search and delete an empty line and then delete line 4.

Figure 20: Deleting lines.
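
As a sketch with made-up input, the two ways of addressing d look like this:

```shell
#!/bin/sh
# Sketch only: the input lines are made up for this illustration.
input=$(printf 'one\n\nthree\nfour\n')

# Delete every empty line
no_empty=$(printf '%s\n' "$input" | sed '/^$/d')
# Delete line 4
no_fourth=$(printf '%s\n' "$input" | sed '4d')

printf '%s\n---\n' "$no_empty" "$no_fourth"
```

Line numbers always refer to the input, so running both deletions in one call (with two -e options) would still remove the original fourth line.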

Adding and Inserting

With a, you add lines beneath and, using i, you insert lines above the matched line. To state where the line goes, you specify a search pattern or a line number. If you enter multiple line numbers or the pattern matches multiple times, the insertion occurs for each instance (Figure 21).

Figure 21: Adding lines.

In the second line, a new line is added above the first line in the file; whereas in the next line, it's added at the end ($ ). The command at the next prompt adds a new line above the matched search pattern, and the next line adds it below.
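
The four cases can be sketched with made-up input as follows. The one-line form of a and i used here is a GNU sed feature; the inserted words are arbitrary.

```shell
#!/bin/sh
# Sketch only, assuming GNU sed (one-line form of a and i); input is made up.
input=$(printf 'one\ntwo\nthree\n')

# New line above the first line
at_top=$(printf '%s\n' "$input" | sed '1i header')
# New line after the last line ($)
at_end=$(printf '%s\n' "$input" | sed '$a footer')
# New line above the line matching the pattern
above=$(printf '%s\n' "$input" | sed '/two/i above')
# New line below the line matching the pattern
below=$(printf '%s\n' "$input" | sed '/two/a below')

printf '%s\n---\n' "$at_top" "$at_end" "$above" "$below"
```

The single quotes matter for '$a footer': in double quotes, the shell would try to expand $a as a variable.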

Shell Variables

If a shell variable needs to be resolved, you need to enclose the statements in double quotes (" ) instead of single quotes (' ). The little shell script in Listing 3 shows how to handle variables. It searches through the sample file and outputs the matching lines. Figure 22 shows the result.

Listing 3

searchString.sh

#!/bin/sh
# Ask for a search string and print the matching lines of textdata.txt
printf 'Enter search string: '
read -r sstring
sed -n "/$sstring/p" textdata.txt
Figure 22: Handling variables in Bash scripts.
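
The effect of the quotes can be checked without typing anything at a prompt. This sketch sets the variable directly and uses three shortened lines from textdata.txt; the last call assumes a search string that contains a slash, in which case a different address delimiter (here the pipe character, introduced by a backslash) avoids a clash with the / separator.

```shell
#!/bin/sh
# Sketch only: the variable is set directly instead of being read from a prompt.
data=$(printf 'Robert Downey - Iron Man\nMark Ruffalo - Hulk\nChris Evans - Captain America 0548/9988776655\n')

sstring='Man'
# Double quotes: the shell replaces $sstring before sed sees the command
expanded=$(printf '%s\n' "$data" | sed -n "/$sstring/p")
# Single quotes: sed receives the text $sstring unchanged and finds nothing
literal=$(printf '%s\n' "$data" | sed -n '/$sstring/p')

sstring='0548/99'
# A search string with a slash needs another delimiter for the address
with_slash=$(printf '%s\n' "$data" | sed -n "\|$sstring|p")

printf '%s\n' "$expanded" "$with_slash"
```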

Conclusion

With sed, you can execute complex text manipulation without user intervention. Its cryptic syntax might seem cumbersome at first, which is why building scripts bit by bit is a great option.

Infos

  1. Project page for sed: http://sed.sourceforge.net/