@illy
Created July 31, 2012 22:15
sed and awk notes

##AWK notes##

  1. selective printing

     awk '$2 ~ /regex/ { $1 = ""; print $0 }'
    

If $2 matches regex, print the whole line with $1 blanked out.
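
A minimal sketch of the command above on made-up input (the three records and the pattern `foo` are assumptions for illustration). Note that assigning to `$1` makes awk rebuild `$0`, so the output keeps the separator that followed the first field:

```shell
# Hypothetical sample: three records whose second field is a word.
selected=$(printf 'a foo x\nb bar y\nc foo z\n' |
    awk '$2 ~ /foo/ { $1 = ""; print $0 }')
printf '%s\n' "$selected"
```

Each printed line starts with a space, because the emptied first field is still followed by the output field separator.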

  2. convert a single line to multiple lines

     awk -F, '{
         print $4 ", " $0 # prefix each line with $4 as the sort key
     }' "$@" |
    
    
         sort |
         awk -F, '
         $1 == LastState { # same state as the previous line
             print "\t" $2 # print $2 with a leading tab
         }
         $1 != LastState { # a new state begins
             LastState = $1 # remember the new state
             print $1
             print "\t" $2
         }'
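
The pipeline can be tried on a few made-up records. This sketch assumes a hypothetical layout of `name,street,city,state`, wraps the pipeline in a function named `group_by_state` (a name chosen here), and forces the C locale so the sort order is predictable:

```shell
# group_by_state: read comma-separated records on stdin whose 4th field is a
# state (hypothetical layout: name,street,city,state) and print the names
# grouped under each state.
group_by_state() {
    # First awk: prefix every record with its state so sort groups by state.
    awk -F, '{ print $4 ", " $0 }' |
        LC_ALL=C sort |
        awk -F, '
        $1 == LastState { print "\t" $2 }
        $1 != LastState { LastState = $1; print $1; print "\t" $2 }'
}

grouped=$(printf 'John,1 Main St,Boston,MA\nBob,3 Elm Rd,Austin,TX\nAmy,2 Oak Ave,Salem,MA\n' | group_by_state)
printf '%s\n' "$grouped"
```

Each name is printed after a tab and a space, because the prefix added by the first awk is `", "` while the field separator is a bare comma.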
    

##Regex notes##

    can[ no']*t # matches cant, can't, cannot, can not
  1. ^$ # blank line

     ^ *$ # blank line, or a line containing only spaces
     ^.*$ # an entire line, whatever it contains
    
  2. \{m,n\} # interval (at least m, at most n repeats) in sed/grep; 0 ≤ m ≤ n ≤ 255
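
A quick way to check these patterns is to count matching lines with `grep -c`. The sample words below are assumptions for illustration:

```shell
# Count the lines matched by the bracket-expression pattern.
can_count=$(printf '%s\n' "cant" "can't" "cannot" "can not" "can" | grep -c "can[ no']*t")
# Count the lines made of exactly two or three a's (basic-regex interval syntax).
interval_count=$(printf '%s\n' a aa aaa aaaa | grep -c '^a\{2,3\}$')
echo "$can_count $interval_count"
```

The bare word `can` is the only line the first pattern rejects, since it has no `t`.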

##SED notes##

Basic sed substitution structure:

    sed '[address]s/pattern/replacement/flags' # the delimiter can be any character except newline or backslash
  1. specifying the address

     sed '/regex1/s/regex2/regex3/' # substitute regex2 with regex3 on lines containing regex1
    
  2. delete

     sed 1d # delete the first line
    
     sed '$d' # delete the last line (quoted so the shell does not expand $d)
    
     sed '50,$d' # delete from the 50th line to the end
    
     sed '1,/^$/d' # delete from the first line through the first blank line
    
     sed '1,/^$/!d' # delete everything except the first line through the first blank line
    
     sed '[address]{/regex/d;}' # within the addressed lines, delete those containing regex
    
     sed 's/^[ TAB]*//' # delete all leading spaces and tabs (TAB stands for a literal tab character)
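
As an illustration of the range addresses, this sketch splits a made-up mail message (the header and body text are assumptions) at its first blank line:

```shell
# Hypothetical mail message: header lines, a blank line, then the body.
message=$(printf 'From: alice\nTo: bob\n\nline one\nline two\n')
# Delete line 1 through the first blank line: only the body remains.
body=$(printf '%s\n' "$message" | sed '1,/^$/d')
# Negate the same range: only the header (and the blank line) remains.
header=$(printf '%s\n' "$message" | sed '1,/^$/!d')
printf '%s\n' "$body"
```

The command substitution drops the trailing blank line from `header`, so the variable ends at the last header line.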
    
  3. spacing

     sed 's/^  *//' # delete the leading spaces
    
     sed 's/  */ /g' # squeeze each run of spaces between words to one space
     
     sed 's/\.  */. /g' # reduce the spaces after a period to one space
    
  4. suppressing

     sed -n '/regex1/,/regex2/p' # print only the lines from regex1 through regex2
    
     sed -n '
         s/regex1/regex2/p
         s/regex2/regex3/p' # commands run in order on each line, so the second s sees the output of the first
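
The two-step script can be watched on a one-character input. In this sketch (the letters are arbitrary), each `p` flag prints the pattern space as it stands after that substitution:

```shell
# The first s turns a into b and prints it; the second s then sees b,
# turns it into c, and prints again.
staged=$(echo 'a' | sed -n '
s/a/b/p
s/b/c/p')
printf '%s\n' "$staged"
```

The output has two lines, `b` then `c`, which shows that later commands work on the already modified line rather than on the original input.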
    
  5. hyphen

Cf:

    sed 's/--/\\(em/g' # replace every double hyphen with the troff em dash \(em

    sed '/---/!s/--/\\(em/g' # replace double hyphens, except on lines containing a triple hyphen
  1. Extra spaces at the end of a command line are not permitted.

  2. Without the g flag, s replaces only the first occurrence on each line.

  3. order of occurrence

     sed 's/ />/2' # replace the second space with >
    
     sed 's/ /\
     /2'	# replace the second space with a newline
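
The effect of the flags is easy to compare side by side. A small sketch on an arbitrary four-word line:

```shell
first=$(echo 'a b c d' | sed 's/ /_/')     # no flag: first occurrence only
every=$(echo 'a b c d' | sed 's/ /_/g')    # g: every occurrence
second=$(echo 'a b c d' | sed 's/ /_/2')   # 2: second occurrence only
printf '%s\n' "$first" "$every" "$second"
```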
    
  4. &, \n (n a digit) and \ are metacharacters in the replacement.

  5. \( \) mark the part of the pattern that \n recalls.

  6. sort -u # remove all duplicate lines

  7. append, insert, change

    sed '[address]a\
    text'    # a\ appends after the addressed line; i\ inserts before it; c\ replaces it
    
    sed '/^From /,/^$/{
            s/^From //p
            c\
        <Mail Header Removed>
        }'
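
A minimal sketch of the three commands on the same made-up input (the words and the inserted text are assumptions), using the backslash-newline form of `a`, `i` and `c`:

```shell
lines='one
two
three'
# a\ adds the text after the matching line.
appended=$(printf '%s\n' "$lines" | sed '/two/a\
after two')
# i\ adds the text before the matching line.
inserted=$(printf '%s\n' "$lines" | sed '/two/i\
before two')
# c\ replaces the matching line with the text.
changed=$(printf '%s\n' "$lines" | sed '/two/c\
replaced')
printf '%s\n' "$appended" "$inserted" "$changed"
```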
    
  8. list

     sed -n -e "l" FILE # -n suppresses automatic printing; -e marks the next argument as a command; l lists each line with nonprinting characters made visible
    
  9. debugging

     sed '[address]{
         p	# print the original line
         s/regex1/regex2/p # print the modified line
         }'
    
     sed '[address]{
         =	# print the line number
         p	# print the line
         }'
    

##Sed Notes II##

  1. Multiple-line pattern space

     N, D, P: multiple line
     n, d, p: single line
    
             sed '
             /Operator$/{
             N
             s/Owner and Operator\nGuide /Installation Guide / # \n matches the embedded newline
             }' # in the replacement, a newline is written as backslash-newline, not \n
    
             sed '
             /Operator$/{
             N
             s/Owner and Operator\nGuide/Installation Guide\
             /
             }'
    
             sed  '
             /Owner/{
             N
             s/Owner *\n*and *\n*Operator *\n*Guide/Installation Guide/ # *\n* makes the newline optional at each word break
             }'
    
             sed '
             s/Owner and Operator Guide/Installation Guide/
             /Owner/{
             N
             s/ *\n/ /
             s/Owner and Operator Guide */Installation Guide\
             /
             }'
    
  2. $!N # Run N on every line except the last, so the final line is not lost when there is no next line to read.
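
A short sketch of `$!N` joining lines in pairs; the three-line input is arbitrary and has an odd length on purpose, so that the last line has no partner:

```shell
# Lines 1 and 2 are joined; line 3 is the last line, so N is skipped
# and the line is printed unchanged.
paired=$(printf '1\n2\n3\n' | sed -e '$!N' -e 's/\n/ /')
printf '%s\n' "$paired"
```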

  3. A script for extracting figures

             sed '
             # when <para> occurs, join the next line and change both to .LP
             /<para>/{
             N
             c\
             .LP
             }
             # between <Figure Begin> and <Figure End>
             /<Figure Begin>/,/<Figure End>/{
             # write each line to the file fig.interleaf
             w fig.interleaf
             # before the closing line, insert the figure macros
             /<Figure End>/i\
             .FG\
             <insert figure here>\
             .FE
             # delete the original lines
             d
             }
             # delete blank lines
             /^$/d'
    
  4. multi-line deletion

    Cf.

             sed '
             /^$/{
             N
             /^\n$/d
             }' # d: a run with an even number of blank lines vanishes; an odd run leaves one blank line
    

    and

             sed '
             /^$/{
             N
             /^\n$/D
             }' # D: every run of blank lines is reduced to a single blank line
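
The difference between `d` and `D` shows up on input that has both an odd and an even run of blank lines. In this sketch the sample text is an assumption (three blank lines after `a`, two after `b`), and `$!N` is used in place of `N` as a precaution for input that ends in a blank line:

```shell
# Sample text: a, three blank lines, b, two blank lines, c.
text=$(printf 'a\n\n\n\nb\n\n\nc\n')
# d discards both lines of a blank pair and starts a fresh cycle.
with_d=$(printf '%s\n' "$text" | sed '/^$/{
$!N
/^\n$/d
}')
# D discards only the first blank line and reruns the script on the rest.
with_D=$(printf '%s\n' "$text" | sed '/^$/{
$!N
/^\n$/D
}')
printf '%s\n---\n%s\n' "$with_d" "$with_D"
```

With `d`, the run of three collapses to one blank line and the run of two disappears; with `D`, both runs collapse to exactly one blank line.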
    
  5. Multi-line print

    Cf:

             sed '
             /UNIX$/{
             N # append the next input line
             s/\nSystem/ Operating &/ # & stands for the whole matched text
             P # print up to the first newline
             D # delete up to the first newline and rerun the script on the rest
             }'
    

    and

             sed '
             /UNIX$/{
                 N
                 /\nSystem/{
                     s// Operating &/
                     P
                     D
                 }
             }'
    
  6. Converting font markup

             sed 's/@f1(\(.*\))/\\fB\1\\fR/g'
     	
             # The original file uses @f1(...) to mark bold text, so the script rewrites it with \fB and \fR.
     	
             # \1 recalls the text saved by \(.*\)
     
             sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g'
                 # [^)]* replaces .* so that the match stops at the first closing parenthesis
    
             sed '
             s/@f1(\([^)]*\))/\\fB\1\\fR/g
             /@f1(.*/{
             N
             s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
             }'
     
             sed '
             s/@f1(\([^)]*\))/\\fB\1\\fR/g
             /@f1(.*/{
             N
             s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
             P
             D
             }' 
             # After a substitution across two lines,
             # print the first line and then delete it.
             # With the second line remaining in the pattern space,
             # control passes to the top of the script, where we check
             # whether an @f1 remains on that line.
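
Why `[^)]*` is preferred over `.*` becomes visible on a line with two markers. A small sketch, with a sample line invented for illustration:

```shell
line='@f1(bold) and @f1(more)'
# .* is greedy: it runs to the LAST closing parenthesis on the line.
greedy=$(printf '%s\n' "$line" | sed 's/@f1(\(.*\))/\\fB\1\\fR/g')
# [^)]* cannot cross a closing parenthesis, so each marker is handled separately.
bounded=$(printf '%s\n' "$line" | sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g')
printf '%s\n' "$greedy" "$bounded"
```

The greedy version produces a single bold span that swallows the text between the two markers; the bounded version converts both markers.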
    
  7. Hold that line

    h, H # Hold: copy or append contents of pattern space to hold space.

    g, G # Get: copy or append contents of hold space to pattern space.

    x # Exchange: Swap contents of hold space and pattern space.

    the lowercase commands overwrite the contents of the target buffer,

    the uppercase commands append to the existing contents.

     Trial:
    
     1
     2
     11
     22
     111
     222
     
             sed '
             /1/{
             h
             d
             }
             /2/{
         	G
             }'
    
     Result:
     2
     1
     22
     11
     222
     111
    
     # hold command with delete command is a very common pairing.
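
Another common use of the hold space, shown here as a sketch rather than as part of the trial above, is reversing the order of lines: every line except the first has the lines held so far appended to it, the result is saved back, and the last line prints the accumulated text.

```shell
# 1!G  append the hold space to every line except the first
# h    copy the result back to the hold space
# $p   on the last line, print what has accumulated
reversed=$(printf '1\n2\n3\n' | sed -n '1!G
h
$p')
printf '%s\n' "$reversed"
```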
    
  8. capital transformation

         	sed '
         	/the .* statement/{
         	h # hold the pattern
             s/.*the \(.*\) statement.*/\1/ # extract the statement name
             y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/ # transform
             G # append the original line from the hold space
             s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/ # reorder
             }'
    
  9. Correcting index entries

             grep "^\.XX" $* | sort -u |
             sed '
             h
             s/[][\\*.]/\\&/g
             x
             s/[\\&]/\\&/g
             s/^\.XX //
             s/$/\//
             x
             s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
             G
             s/\n//'
    
             sed '/^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/'
    
  10. Building block of text

            sed '
            /^$/!{
                H
                d
                }
            /^$/{
                x
                s/^\n/<p>/
                s/$/<\/p>/
                G
            }'
    
            sed '
            ${ 
                /^$/!{ # A non-blank last line is held, then emptied so it is treated like a blank line.
                H
                s/^.*$//
                }
            }
            /^$/!{ # Non-blank line: d ends the cycle, so the bottom of the script is not reached.
                H # hold the contents, appending to the existing contents
                d
            }
            /^$/{ # Blank line: this part runs to the bottom of the script.
                x # It swaps the contents of the hold space and the pattern space
                s/^\n/<p>/ # It matches the leading newline added by the first H, not the newlines inside the paragraph
                s/$/<\/p>/
                G
            }'
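
A run of the paragraph script on made-up input helps to see the last-line handling. This sketch wraps the script in a function called `to_paragraphs` (a name chosen here) and feeds it two paragraphs, the second of which is not followed by a blank line:

```shell
# to_paragraphs: wrap each blank-line-separated block of stdin in <p>...</p>.
to_paragraphs() {
    sed '
    ${
        /^$/!{
            H
            s/.*//
        }
    }
    /^$/!{
        H
        d
    }
    /^$/{
        x
        s/^\n/<p>/
        s/$/<\/p>/
        G
    }'
}

html=$(printf 'line1\nline2\n\nline3\n' | to_paragraphs)
printf '%s\n' "$html"
```

Without the `${ ... }` block, `line3` would stay in the hold space and never be printed, because no blank line follows it.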
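  11. Branch and test

b: branch (unconditional); t: test (branch only if a substitution has succeeded on the current line)

    [address]b[label] # The label is optional; without it, b jumps to the end of the script.

A label is a colon at the start of a line followed by up to seven characters. Put nothing after the label, not even a space, because some seds treat trailing characters as part of the label.

    :mylabel

    b mylabel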

Three branch loops:

1. 

    :top
    command 1
    command 2
    # return to the top while the pattern matches
    /pattern/b top
    # command 3 runs only when the pattern does not match
    command 3

2. 

    command 1
    # if the line matches /pattern/, command 2 is skipped
    /pattern/b end
    command 2
    :end
    command 3

3. 

    command 1
    # either command 2 or command 3 is executed, never both
    /pattern/b dothree
    command 2
    b
    :dothree
    command 3
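
The first loop shape (branch back to the top while a pattern matches) can be sketched with a concrete task: joining lines that end in a backslash with the line that follows. The task and the sample input are assumptions chosen for illustration, and the sketch assumes the input does not end with a continued line:

```shell
# While the pattern space ends in a backslash, append the next line,
# remove the backslash-newline pair, and test again from the top.
joined=$(printf 'a \\\nb \\\nc\nd\n' | sed '
:top
/\\$/{
N
s/\\\n//
b top
}')
printf '%s\n' "$joined"
```

The first three input lines come out as the single line `a b c`; the line `d` never matches the pattern and passes through unchanged.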

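eg (shown as a sed script file to run with sed -f, because the '' in the replacements would end a single-quoted shell string):

    # lines between .ES and .EE skip the rest of the script
    /^\.ES/,/^\.EE/b
    s/^"/``/
    s/"$/''/
    s/"? /''? /g
    	.
    	.
    	.
    s/\\(em\\^"/\\(em``/g
    s/"\\(em/''\\(em/g
    s/\\(em"/\\(em``/g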
  12. [address]t[label]

    eg, an extractor of index title

    sed '
        /Rh 0/{
        s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/ #1
        t
        s/"\(.*\)" "\(.*\)"/"\1" "\2" ""/ #2
        t
        s/"\(.*\)"/"\1" "" ""/ #3 
        }'
    
    # t may take a label; without one it branches to the end of the script.
    # If substitution 1 succeeds, t skips the rest; otherwise sed tries 2,
    # and if 2 also fails, it falls through to 3.
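
The role of `t` is easier to see on sample lines. This sketch assumes hypothetical `.Rh 0` lines with one, two or three quoted arguments, wraps a script of the same shape in a function named `pad_rh` (a name chosen here), and pads every line to three arguments:

```shell
# pad_rh: make every .Rh 0 line carry exactly three quoted arguments.
pad_rh() {
    sed '/Rh 0/{
s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/
t
s/"\(.*\)" "\(.*\)"/"\1" "\2" ""/
t
s/"\(.*\)"/"\1" "" ""/
}'
}

padded=$(printf '%s\n' '.Rh 0 "a" "b" "c"' '.Rh 0 "a" "b"' '.Rh 0 "a"' | pad_rh)
printf '%s\n' "$padded"
```

Without the `t` commands, a line that already has three arguments would also match the two-argument pattern (the greedy `.*` spans the inner quotes) and would gain an unwanted fourth argument.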
    
        sed '
        :begin
        /@f1(\([^)]*\))/{
        s//\\fB\1\\fR/g # the empty pattern // reuses the regex just matched by the address
        b begin
        }
        /@f1(.*/{
        N
        s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
        t again
        b begin
        } # the @f1( ... ) spans two lines
        :again
        P
        D'
  13. Join a phrase

        #!/bin/sh
        # phrase -- search for words across lines
        # $1 = search string; remaining args = filenames
        search=$1 # save the search string
        shift # drop $1 so that only the filenames remain
        for file
        do
        sed '
        # match on the current line: branch to the end of the script and print
        /'"$search"'/b
        N # append the next input line to the pattern space
        h # keep a copy of both lines in the hold space
        s/.*\n// # remove the first line, leaving only the line just read
        # match on the second line: branch to the end and print it
        /'"$search"'/b
        g # restore both lines from the hold space
        s/ *\n/ / # join the two lines into one
        /'"$search"'/{
        g # the phrase spans the line break: restore the original two lines
        b
        }
        g # no match: restore the two lines, newline included
        D' "$file"
        done
    

    /'"$search"'/b # The single quotes are closed around "$search" so that the shell expands the variable; the double quotes keep the expanded value as one word.
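
The quoting pattern can be tried in isolation. A minimal sketch, in which the variable value and the sample lines are assumptions:

```shell
search='sed'
# The sed script is built from three pieces: '/' then "$search" then '/p'.
matches=$(printf 'awk notes\nsed notes\n' | sed -n '/'"$search"'/p')
printf '%s\n' "$matches"
```

Because the variable is expanded by the shell before sed runs, any regex metacharacters or slashes in its value are interpreted by sed as part of the script.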
