Shell Practice: Introduction to the sed stream editor


Searching

You can use the search function to, among other things, replace sections of text. The search query serves as the address, and you can use regular expressions as search patterns. Table 5 shows some of the possibilities, and Table 6 provides some examples. In Table 6, sed is used both on a data stream (through a pipe) and directly on the text file. With composite addressing (two addresses separated by a comma), the statement is applied to every line from the first line matching the first address up to and including the next line matching the second address.

Table 5

Patterns and Addressing

Action                                                   Pattern
All lines                                                (null)
Line 25                                                  25
Not line 25                                              25!
Lines 10 through 20                                      10,20
Last line                                                $
Not pattern                                              '/PATTERN/!'
Character at beginning of line                           ^CHAR
String                                                   /STRING/
Character set                                            [CHARS]
Alphabetic character                                     [[:alpha:]]
Lowercase                                                [[:lower:]]
Uppercase                                                [[:upper:]]
Alphanumeric                                             [[:alnum:]]
Digit                                                    [[:digit:]]
Hexadecimal digit                                        [[:xdigit:]]
Tab and space                                            [[:blank:]]
Whitespace (space, tab, newline, etc.)                   [[:space:]]
Control character                                        [[:cntrl:]]
Printable characters (no control characters)             [[:print:]]
Visible characters (without spaces)                      [[:graph:]]
Punctuation                                              [[:punct:]]

Class names such as [:alpha:] are valid only inside a bracket expression, which is why they appear with doubled brackets.
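The character classes in Table 5 can be tried out with a short sketch like the following. The sample data is hypothetical (the article's textdata.txt is not reproduced here), and the variable names are chosen only for illustration.

```shell
# Hypothetical sample data; the article's textdata.txt is not reproduced here.
sample=$(printf 'Meier 089\n\n   \nmanfred\n')

# A class name such as [:digit:] is only valid inside a bracket expression,
# hence the doubled brackets.
with_digits=$(printf '%s\n' "$sample" | sed -n '/[[:digit:]]/p')

# Keep only lines with at least one letter or digit; this drops the empty
# line and the line made up of spaces.
non_blank=$(printf '%s\n' "$sample" | sed -n '/[[:alnum:]]/p')

printf '%s\n' "$with_digits" "$non_blank"
```

Writing the class with single brackets, as in /[:digit:]/, would instead describe a set of the literal characters between the brackets; GNU sed rejects that form with an error message.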

Table 6

Sample Searches and Patterns

Search for    Pattern    Example    Figure
Term, Name    '/TERM/'    cat textdata.txt | sed -n '/Meier/p'    -
All lines containing "man" or "Man"    '/[Mm]an/'    sed -n '/[Mm]an/p' textdata.txt    Figure 2
All lines except 3 through 5    '3,5!'    sed -n '3,5!p' textdata.txt    Figure 3
All lines except those containing "Man"    '/Man/!'    sed -n '/Man/!p' textdata.txt    Figure 4
Lines containing "H" or "G"    '/[HG]/'    sed -n '/[HG]/p' textdata.txt    -
Lines not containing "H" or "G"    '/[H]\|[G]/!'    sed -n '/[H]\|[G]/!p' textdata.txt    Figure 5
Line 3    3    cat textdata.txt | sed -n '3p'    -
Last line    $    cat textdata.txt | sed -n '$p'    -
Range between two patterns: Do not output the lines from the first line containing "R" through the next line containing "M" (each followed by another character)    '/[R]./,/[M]./!'    sed -n '/[R]./,/[M]./!p' textdata.txt    Figure 6
All lines containing at least one alphanumeric character (not only space characters)    '/[[:alnum:]]/'    cat textdata.txt | sed -n '/[[:alnum:]]/p'    Figure 7
Figure 2: Searching for Man, jackman, etc.
Figure 3: Output of lines except 3 through 5.
Figure 4: Output of lines except those containing "Man".
Figure 5: Output of lines except those containing "H" or "G."
Figure 6: Output of lines except those in the range from a line containing "R" through the next line containing "M".
Figure 7: Output of all lines that contain alphanumeric characters (no empty lines or lines containing only spaces, tabs, etc.).

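Note that in Figure 6, the j comes before J in the text file. A range address opens only when a line matches the first pattern; a line that matches the second pattern before that point has no effect. In the first example, none of the lines from the one containing H through the one containing J are output, which works because the patterns occur in the same order in the command and in the text file. The second example, with the negated H and j, shows that a line containing H must be found first: The line with j comes earlier and therefore lies outside the range. That's why johann still appears in the output!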
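The behavior of a two-pattern range can be checked with a small sketch. The names in the data are hypothetical stand-ins, not the contents of the article's text file.

```shell
# Hypothetical data: the line with a lowercase "j" precedes the first "H".
data=$(printf 'johann\nHans\nKarl\nJule\nMaria\n')

# The range opens at the first line containing "H" and closes at the next
# line containing "J"; only lines inside the range are printed.
in_range=$(printf '%s\n' "$data" | sed -n '/H/,/J/p')

# Negating the range prints everything outside it instead.
outside=$(printf '%s\n' "$data" | sed -n '/H/,/J/!p')

printf 'inside:\n%s\noutside:\n%s\n' "$in_range" "$outside"
```

Here johann is printed by the negated command because the range has not yet opened when sed reads that line.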

If you want to be certain that sed is doing what you need it to do, you can combine several calls in a pipe, each with a single, easily checked task. The following command suppresses empty lines and lines containing "Man" (see Figure 8):

cat textdata.txt | sed -n '/[[:alnum:]]/p' | sed -n '/Man/!p'
Figure 8: Processing several instances of sed using a pipe.
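Once the individual steps of such a pipe have been verified, they can usually be merged into a single sed call with several -e expressions. The following sketch compares both forms on hypothetical data; the d (delete) command used in the merged form is a standard sed command that this article has not introduced.

```shell
# Hypothetical data containing an empty line and a line of spaces.
data=$(printf 'Manfred\n\njackman\n   \nIron Man\n')

# Two sed processes connected by a pipe, as in the article.
piped=$(printf '%s\n' "$data" | sed -n '/[[:alnum:]]/p' | sed -n '/Man/!p')

# One sed process: delete lines without alphanumeric characters, then
# delete lines containing "Man"; whatever is left is printed by default.
single=$(printf '%s\n' "$data" | sed -e '/[[:alnum:]]/!d' -e '/Man/d')

printf '%s\n' "$piped" "$single"
```

Both variants should leave only the jackman line; the piped form is easier to debug step by step, whereas the single call starts fewer processes.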

Substituting and Removing

You use the s instruction to replace matched expressions. The search and replace strings do not need to be the same length. Figure 9 shows the detailed syntax.

Figure 9: Syntax of the search and replace statement.

You can limit the search and replace statement to specific lines by preceding the command with the line number, as shown in the following example:

sed -n '5s/OLD/NEW/p' [TEXTFILE]

Or, for a range of lines:

sed -n '1,4s/OLD/NEW/p' [TEXTFILE]

You can also suppress changes to certain lines using the exclamation point:

sed -n '20,80!s/OLD/NEW/p' [TEXTFILE]

Furthermore, you can limit changes to lines that contain certain strings or patterns, which need not be the same as those in the search and replace statement:

sed -n '/PATTERN/s/OLD/NEW/gp' [TEXTFILE]
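The addressing variants above can be compared side by side with a sketch like this one. The input lines are hypothetical, and -n and p are left out so that unchanged lines stay visible in the output.

```shell
# Hypothetical four-line input.
data=$(printf 'OLD one\nOLD two\nkeep OLD\nOLD four\n')

# Substitute on line 2 only.
line2=$(printf '%s\n' "$data" | sed '2s/OLD/NEW/')

# Substitute on lines 1 through 2.
range=$(printf '%s\n' "$data" | sed '1,2s/OLD/NEW/')

# Substitute everywhere except lines 1 through 2.
negated=$(printf '%s\n' "$data" | sed '1,2!s/OLD/NEW/')

# Substitute only on lines that contain the string "keep".
by_pattern=$(printf '%s\n' "$data" | sed '/keep/s/OLD/NEW/')

printf '%s\n---\n' "$line2" "$range" "$negated" "$by_pattern"
```

Note that a range is written with a comma (1,2); a hyphen is not valid address syntax.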

You can delete the matched string by replacing it with an empty string.

By default, only the first occurrence of the search string on each line is processed. To replace all instances, add the g (global) option at the end of the statement. The stream editor can be a silent partner if the -n option is set. So, if you want to see what's going on, add the p (print) option. You can also write results to an output file with the w (write) option. Table 7 shows some short examples.

Table 7

Sample Search and Replace Statements

Action    Example    Figure
Replace pattern at the first occurrence only    cat textdata.txt | sed -n 's/e/E/p'    Figure 10
Replace pattern at every occurrence    cat textdata.txt | sed -n 's/e/E/gp'    Figure 10
Delete the word "Man"    sed -n 's/Man//gp' textdata.txt    Figure 11
Replace "Iron" with "Tin" on line 4    cat textdata.txt | sed -n '4s/Iron/Tin/gp'    Figure 12
Replace "0" with "089" on all lines containing "Man" or "man"    sed -n '/[Mm]an/s/0/089/gp' textdata.txt    Figure 13
Replace "0" with "089" on all lines except those containing "Man" or "man"    sed -n '/[Mm]an/!s/0/089/gp' textdata.txt    Figure 14
Delete all numbers, slashes (/), and hyphens (-)    cat textdata.txt | sed -n 's/[0-9\/-]//gp'    Figure 15
Figure 10: Using the "global" (g) option.
Figure 11: Deleting the word "Man".
Figure 12: Limiting the search and replace to one line.
Figure 13: Limiting the search and replace to selected lines.
Figure 14: Excluding lines for the search and replace statement.
Figure 15: Deleting numbers and symbols from lines.
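The effect of the g and w options, and of an empty replacement, can be seen in a brief sketch. The input strings and the temporary file are hypothetical and only serve as an illustration.

```shell
line='see the bees'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/e/E/')

# With g, every match on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/e/E/g')

# An empty replacement deletes the matched text.
deleted=$(printf '%s\n' "$line" | sed 's/e//g')

# The w option writes each changed line to a file; the file name must be
# the last item in the statement. A temporary file is used here.
out=$(mktemp)
printf 'a1\nb\n' | sed -n "s/1/2/w $out"
written=$(cat "$out")
rm -f "$out"

printf '%s\n' "$first_only" "$every" "$deleted" "$written"
```

Only the line that was actually changed (a1, now a2) ends up in the output file; the unchanged line b does not.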

You can see a more complex example in Listing 1. It converts the inconsistently formatted dates in testlist.txt to a common, unified, albeit European (DD/MM/YYYY) format. Be sure to press the Enter key immediately after the \ at the end of each line. Alternatively, you can omit the backslash and use the pipe character to connect each line with the one that follows it; however, this results in a less clear screen display.

Line 1 reads the list, and the pipeline starts in line 2. Line 2 replaces any leading space character with the number 0. Line 3 replaces any minus signs in dates with spaces. Lines 4 and 5 substitute any month written as a word with its numeric value followed by a dot. Line 6 matches any two-digit number at the beginning of a line (^), with the first digit being 1 through 3 and the second any digit, together with a space character, and replaces the match with "itself" (&) followed by a dot.

To reuse part of the search pattern in the replacement, enclose it in parentheses, which you have to be sure to escape with \. The sed statement in line 7 deletes all existing space characters (through the g option for s).

The uniq command on the last line removes adjacent duplicate lines, so that each one is output only once. Figure 16 shows the results. You can also "carry over" all or part of the original string into the replacement: The & character stands for the entire matched string. Check out the following example:
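As a minimal sketch of escaped parentheses and the & character, the following commands work on a hypothetical date string; they are not taken from Listing 1. Each escaped group can be referenced in the replacement as \1, \2, and so on, in the order in which the groups open.

```shell
# Hypothetical date string, not taken from testlist.txt.
date_line='12 03 2014'

# Two escaped groups capture the first two numbers; \2 and \1 put them
# back in swapped order.
swapped=$(printf '%s\n' "$date_line" | sed 's/^\([0-9][0-9]\) \([0-9][0-9]\)/\2 \1/')

# & stands for the whole match, so this appends a dot to the leading
# two-digit number.
dotted=$(printf '%s\n' "$date_line" | sed 's/^[0-3][0-9]/&./')

printf '%s\n' "$swapped" "$dotted"
```

Groups are needed only when part of the match has to be reused or rearranged; if the whole match is carried over unchanged, & is sufficient.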

echo "happy" | sed -n 's/happy/un&/p'

This example replaces happy with unhappy. You can also convert from lowercase to uppercase:

cat textdata.txt | sed -n 's/[[:lower:]]/\U&/gp'

The \U before the & indicates that the output must be converted to uppercase; \U and \L are extensions provided by GNU sed. You can do the following:

cat textdata.txt | sed -n 's/[[:upper:]]/\L&/gp'

to convert from uppercase to lowercase.
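The two case conversions can be tried on a single hypothetical string as follows. The sketch assumes GNU sed, because \U and \L are not part of POSIX sed; the tr call is included only as a portable cross-check and is not part of the original article.

```shell
# Hypothetical input string.
text='Iron Man'

# GNU sed: \U converts the matched text to uppercase, \L to lowercase.
upper=$(printf '%s\n' "$text" | sed 's/[[:lower:]]/\U&/g')
lower=$(printf '%s\n' "$text" | sed 's/[[:upper:]]/\L&/g')

# Portable cross-check with tr, which works without GNU extensions.
upper_tr=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$upper" "$lower" "$upper_tr"
```

On systems where sed is not the GNU version, the tr variant is the safer choice for whole-line case conversion.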

Figure 16: Formatting dates.
