sed and awk notes

##AWK notes##

selective printing
```
 awk '$2 ~ /regex/ { $1 = ""; print $0 }'
```

If $2 matches the regex, blank out $1 and print the rest of the line.
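
A minimal sketch of the rule above on made-up input. The sample lines and the pattern `^b` are assumptions chosen only for illustration.

```shell
# Hypothetical data: three whitespace-separated fields per line.
input='x1 apple red
x2 banana yellow
x3 blueberry blue'

# For lines whose second field starts with "b", blank out $1 and print the rest.
# Assigning to $1 makes awk rebuild $0, so the leading separator remains.
result=$(printf '%s\n' "$input" | awk '$2 ~ /^b/ { $1 = ""; print $0 }')
printf '%s\n' "$result"
```

Each printed line keeps a leading space, because the emptied first field is still joined to the others with the output field separator.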

convert a single line to multiple lines

 awk -F, '{
     print $4 ", " $0    # put $4 in front of the line as the sort key
 }' "$@" |
     sort |
     awk -F, '
     $1 == LastState {       # same key as the previous line
         print "\t" $2       # print $2 with a leading tab
     }
     $1 != LastState {       # a new key
         LastState = $1      # remember it
         print $1
         print "\t" $2
     }'
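
A sketch of the pipeline above run on three invented address lines (name,street,city,state). The data and the use of `LC_ALL=C` to pin the sort order are assumptions, not part of the original notes.

```shell
# Hypothetical comma-separated records; field 4 is the state.
input='John Daggett,341 King Road,Plymouth,MA
Alice Ford,22 East Broadway,Richmond,VA
Sal Carpenter,73 6th Street,Boston,MA'

grouped=$(printf '%s\n' "$input" |
    awk -F, '{ print $4 ", " $0 }' |
    LC_ALL=C sort |
    awk -F, '
    $1 == LastState { print "\t" $2 }
    $1 != LastState {
        LastState = $1
        print $1
        print "\t" $2
    }')
printf '%s\n' "$grouped"
```

Each state is printed once, followed by the tab-indented names that belong to it. The names keep the space that followed the inserted comma.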

##Regex notes##

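    can[ no']*t # matches cant, can't, cannot, can not

^$ # blank line

 ^ *$ # blank line, or a line containing only spaces
 ^.*$ # an entire line, whatever it contains (including an empty one)

\{m,n\} # interval in sed/grep: m to n occurrences of the preceding item, with m and n between 0 and 255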
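
The patterns above can be checked with `grep -c`, which counts matching lines. This is only a sketch; the word lists are invented for the test.

```shell
# Hypothetical test words: the first four should match, "cat" should not.
cant_count=$(printf '%s\n' "cant" "can't" "cannot" "can not" "cat" |
    grep -c "can[ no']*t")

# Input with one empty line and one line holding only spaces.
empty_count=$(printf 'a\n\n  \nb\n' | grep -c '^$')
blank_count=$(printf 'a\n\n  \nb\n' | grep -c '^ *$')

# Basic-regex interval: lines made of exactly two or three digits.
interval_count=$(printf '%s\n' 1 12 123 1234 | grep -c '^[0-9]\{2,3\}$')

echo "$cant_count $empty_count $blank_count $interval_count"
```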

##SED notes##

Basic sed substitution structure:

sed '[address]s/pattern/replacement/flags' # the delimiter can be any character other than newline or backslash

specifying the address

 sed '/regex1/s/regex2/regex3/' # replace regex2 with regex3, but only on lines containing regex1
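
A small sketch of an addressed substitution and of an alternative delimiter. The input strings are made up for illustration.

```shell
# The address /^y/ limits the substitution to lines starting with "y".
addressed=$(printf '%s\n' 'x: old' 'y: old' | sed '/^y/s/old/new/')

# With "|" as the delimiter, the slashes in a path need no escaping.
repathed=$(printf '%s\n' '/usr/local/bin' | sed 's|/usr/local|/opt|')

printf '%s\n' "$addressed" "$repathed"
```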

delete

 sed 1d # delete the first line

 sed '$d' # delete the last line (quoted so that the shell does not expand $d)

 sed '50,$d' # delete from the 50th line to the end

 sed '1,/^$/d' # delete from the first line through the first blank line

 sed '1,/^$/!d' # delete everything except the first line through the first blank line

 sed '/regex/d' # delete every line containing regex; to limit it to a range, use [address]{/regex/d;}

 sed 's/^[ TAB]*//' # delete all leading spaces and tabs (TAB stands for a literal tab character)
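
A sketch of the delete commands on a five-line sample whose third line is blank. The sample text is invented, and `[[:space:]]` is used here as a stand-in for typing a literal tab inside the brackets.

```shell
# Hypothetical input: one, two, an empty line, three, four.
lines=$(printf '%s\n' one two '' three four)

no_first=$(printf '%s\n' "$lines" | sed '1d')
no_last=$(printf '%s\n' "$lines" | sed '$d')
after_blank=$(printf '%s\n' "$lines" | sed '1,/^$/d')
no_two=$(printf '%s\n' "$lines" | sed '/two/d')

# Strip leading spaces and tabs from an indented line.
trimmed=$(printf '  \tindented\n' | sed 's/^[[:space:]]*//')

printf '%s\n' "$after_blank" "$trimmed"
```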

spacing

 sed 's/^  *//' # delete leading spaces

 sed 's/  */ /g' # squeeze each run of spaces between words to a single space
 
 sed 's/\.  */./g' # delete the spaces after a period

suppressing

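 sed -n '/regex1/,/regex2/p' # print the lines from regex1 through regex2

 sed -n '
     s/regex1/regex2/p
     s/regex2/regex3/p' # each command works on the pattern space as the previous one left it, so a line changed by the first command can be changed and printed again by the second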
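
To see how commands are applied in sequence to the same pattern space, here is a sketch with literal words in place of the regexes (the words are assumptions for the demo).

```shell
# "cat" becomes "dog" (printed), then that "dog" becomes "bird" (printed again).
# The input line "dog" is untouched by the first command and printed once as "bird".
out=$(printf '%s\n' 'cat' 'dog' | sed -n '
s/cat/dog/p
s/dog/bird/p')
printf '%s\n' "$out"
```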

hyphen

Cf:

sed 's/--/\\(em/g' # replace every double hyphen with the troff em dash \(em

sed '/---/!s/--/\\(em/g' # the same, but skip lines that contain three hyphens in a row

Extra spaces at the end of a command line are not permitted.
Without the g flag, s replaces only the first occurrence on each line.

order of occurrence

 sed 's/ />/2' # replace the second space with >

 sed 's/ /\
 /2' # replace the second space with a newline
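
A sketch comparing the flags on one invented line: no flag, `g`, a numeric flag, and a newline in the replacement.

```shell
text='a b c d'

first=$(printf '%s\n' "$text" | sed 's/ />/')    # first space only
all=$(printf '%s\n' "$text" | sed 's/ />/g')     # every space
second=$(printf '%s\n' "$text" | sed 's/ />/2')  # second space only

# A backslash followed by a real newline puts a line break in the replacement.
split=$(printf '%s\n' "$text" | sed 's/ /\
/2')

printf '%s\n' "$first" "$all" "$second" "$split"
```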

&, \n (n is a digit, recalling a group), and \ are metacharacters in the replacement.
\( \) group part of the pattern so that \n can recall it.
sort -u # sort and remove duplicate lines
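
A sketch of the replacement metacharacters noted above, on made-up strings: `&` for the whole match, `\1` and `\2` for saved groups, and `\&` for a literal ampersand.

```shell
# & is replaced by the matched text.
amp=$(printf '%s\n' 'see UNIX manual' | sed 's/UNIX/[&]/')

# \1 and \2 recall the two groups, here in swapped order.
swap=$(printf '%s\n' 'first:second' | sed 's/\(.*\):\(.*\)/\2:\1/')

# A backslash makes & literal.
lit=$(printf '%s\n' 'R and D' | sed 's/ and / \& /')

printf '%s\n' "$amp" "$swap" "$lit"
```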

append, insert, change

sed '[address]a\
text' # a\ appends text after the line; i\ inserts it before the line; c\ replaces the line

sed '/^From /,/^$/{
        s/^From //p
        c\
    <Mail Header Removed>
    }'
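
A sketch of the three commands on a three-line sample. The words and the inserted text are invented for illustration.

```shell
appended=$(printf '%s\n' alpha beta gamma | sed '/beta/a\
after beta')

inserted=$(printf '%s\n' alpha beta gamma | sed '/beta/i\
before beta')

changed=$(printf '%s\n' alpha beta gamma | sed '/beta/c\
replaced')

printf '%s\n' "$appended"
```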

list

 sed -n -e "l" FILE # l lists the pattern space with non-printing characters made visible; -n suppresses the default output; -e marks the next argument as a command

debugging

 sed '[address]{
     p	# print the original line
     s/regex1/regex2/p # print the modified line
     }'

 sed '[address]{
     =	# print the line number
     p	# print the line
     }'

##Sed Notes II##

Multiple-line pattern space

 N, D, P: multiple line
 n, d, p: single line

         sed '
         /Operator$/{
         N
         s/Owner and Operator\nGuide /Installation Guide /
         }' # \n in the pattern matches the newline that N embedded;
            # a newline cannot portably be written as \n in the replacement

         sed '
         /Operator$/{
         N
         s/Owner and Operator\nGuide/Installation Guide\
         /
         }' # a backslash followed by a real newline puts a line break in the replacement

         sed  '
         /Owner/{
         N
         s/Owner *\n*and *\n*Operator *\n*Guide/Installation Guide/
         }' # " *\n*" lets the line break fall between any two words

         sed '
         s/Owner and Operator Guide/Installation Guide/
         /Owner/{
         N
         s/ *\n/ /
         s/Owner and Operator Guide */Installation Guide\
         /
         }'

$!N # Run N on every line except the last, so the script does not ask for a next line that does not exist.
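
A sketch of `$!N` joining lines in pairs. The three-line input is an assumption chosen so that the last line has no partner.

```shell
# Lines 1 and 2 are joined; line 3 is the last line, so N is skipped
# and the line is printed unchanged.
paired=$(printf '%s\n' 1 2 3 | sed '$!N;s/\n/ /')
printf '%s\n' "$paired"
```

With a plain `N`, what happens to an unpaired last line differs between sed implementations, which is why the `$!` guard is the safer habit.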

A script for extracting figures

         sed '
         # when <para> occurs, join the next line and change both to .LP
         /<para>/{
             N
             c\
         .LP
         }
         # between <Figure Begin> and <Figure End>
         /<Figure Begin>/,/<Figure End>/{
             # write the lines to the file fig.interleaf
             w fig.interleaf
             # before the end marker, insert a placeholder
             /<Figure End>/i\
             .FG\
             <insert figure here>\
             .FE
             # delete the original lines
             d
         }
         # delete blank lines
         /^$/d'

multi-line deletion

Cf.

         sed '
         /^$/{
         N
         /^\n$/d
         }' # d deletes both lines: an even run of blank lines disappears, an odd run leaves one

and

         sed '
         /^$/{
         N
         /^\n$/D
         }' # D deletes only the first of the two lines and reruns the script, so every run of blank lines collapses to one
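
A sketch contrasting `d` and `D` on an invented input with two consecutive blank lines.

```shell
# Hypothetical input: "c" and "d" separated by two blank lines.
input=$(printf 'c\n\n\nd')

# d removes both blank lines of the pair.
with_d=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/d
}')

# D removes only the first one and reruns the script on what is left.
with_D=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/D
}')

printf '%s\n---\n%s\n' "$with_d" "$with_D"
```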

Multi-line print

Cf:

         sed '
         /UNIX$/{
         N # append the next input line
         s/\nSystem/ Operating &/ # & stands for the text the pattern matched
         P # print up to the first newline
         D # delete up to the first newline and rerun the script on what is left
         }'

and

         sed '
         /UNIX$/{
             N
             /\nSystem/{
                 s// Operating &/
                 P
                 D
             }
         }'
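
A sketch of the nested version above run on two invented lines, to show what P and D leave behind.

```shell
# "UNIX" ends line 1 and "System" starts line 2, so the phrase spans the line break.
fixed=$(printf '%s\n' 'Here is the UNIX' 'System manual' | sed '
/UNIX$/{
N
/\nSystem/{
s// Operating &/
P
D
}
}')
printf '%s\n' "$fixed"
```

P prints the first line with " Operating " appended (note the trailing space), D removes it, and the remaining "System manual" goes through the script again before being printed.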

         sed 's/@f1(\(.*\))/\\fB\1\\fR/g'
 	
         # The source file marks bold text with @f1(...); the script rewrites it as troff \fB...\fR.
 	
         # \1 recalls the text captured by \(.*\).
 
         sed 's/@f1(\([^)]*\))/\\fB\1\\fR/g'
             # [^)]* instead of .* stops the match at the first closing parenthesis

         sed '
         s/@f1(\([^)]*\))/\\fB\1\\fR/g
         /@f1(.*/{
         N
         s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
         }'
 
         sed '
         s/@f1(\([^)]*\))/\\fB\1\\fR/g
         /@f1(.*/{
         N
         s/@f1(\(.*\n[^)]*\))/\\fB\1\\fR/g
         P
         D
         }' 
         # After making a substitution across two lines,
         # print the first line and then delete it.
         # With the second portion remaining in the pattern space,
         # control passes to the top of the script, where we check
         # whether an @f1 remains on the line.

Hold that line

h, H # Hold: copy or append contents of pattern space to hold space.

g, G # Get: copy or append contents of hold space to pattern space.

x # Exchange: Swap contents of hold space and pattern space.

The lowercase commands overwrite the contents of the target buffer; the uppercase commands append to the existing contents, after a newline.
```
 Trial:

 1
 2
 11
 22
 111
 222
 
         sed '
         /1/{
         h
         d
         }
         /2/{
     	G
         }'

 Result:
 2
 1
 22
 11
 222
 111

 # hold command with delete command is a very common pairing.
```
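
Another common use of the hold space is reversing the order of lines. This is a sketch of that well-known one-liner, not something taken from the trial above; the input is invented.

```shell
# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the result back to the hold space
# $p  : on the last line, print the accumulated text
reversed=$(printf '%s\n' 1 2 3 | sed -n '1!G;h;$p')
printf '%s\n' "$reversed"
```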

capital transformation

     	sed '
     	/the .* statement/{
     	h # hold a copy of the line
     	s/.*the \(.*\) statement.*/\1/ # keep only the text between "the" and "statement"
     	y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/ # convert it to upper case
     	G # append the original line from the hold space
     	s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/ # put the upper-case text back in place
     	}'
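
A sketch that runs the capitalizing script on one invented sentence, with the groups written as `\(.*\)` so that a whole word is captured.

```shell
upper=$(printf '%s\n' 'use the print statement here' | sed '
/the .* statement/{
h
s/.*the \(.*\) statement.*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}')
printf '%s\n' "$upper"
```

After G the pattern space holds the upper-case word, a newline, and the original line; the last substitution splices the word back between "the " and " statement".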

Correcting index entries

         grep "^\.XX" "$@" | sort -u |
         sed '
         h
         s/[][\\*.]/\\&/g
         x
         s/[\\&]/\\&/g
         s/^\.XX //
         s/$/\//
         x
         s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
         G
         s/\n//'

         sed '/^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/'

Building block of text

        sed '
        /^$/!{
            H
            d
            }
        /^$/{
            x
            s/^\n/<p>/
            s/$/<\/p>/
            G
        }'

            sed '
            ${ 
                /^$/!{ # a non-blank last line: hold it, then empty the pattern space
                H
                s/^.*$//
                }
            }
            /^$/!{ # non-blank lines: d ends the cycle here, so they never reach the block below
                H # append the line to the hold space
                d
            }
            /^$/{ # blank lines (and the emptied last line) reach this block
                x # swap the contents of the hold space and the pattern space
                s/^\n/<p>/ # replace the leading newline that the first H added, not the newlines between lines
                s/$/<\/p>/
                G
            }'
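
A sketch that runs the paragraph-building script on four invented lines, where the last paragraph is not followed by a blank line.

```shell
paragraphs=$(printf '%s\n' 'line one' 'line two' '' 'line three' | sed '
${
/^$/!{
H
s/.*//
}
}
/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}')
printf '%s\n' "$paragraphs"
```

The `${ ... }` block is what lets the final paragraph be flushed: it moves the last line to the hold space and empties the pattern space so that the blank-line rule fires.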

Branch and test

b: branch, t: test

[address]b[label] # The label is optional.

label: up to seven characters; it goes on a line by itself, starting with a colon

:mylabel # no space is allowed between the colon and the label, and trailing spaces become part of the label

b mylabel

Three branch loops:

1. 
:top
command 1
command 2
/pattern/b top # It returns to the top
command 3 # It only executes command 3 if the pattern doesn't match.

2. 
command 1
/pattern/b end # If it matches /pattern/, command2 will be skipped.
command 2
:end
command 3

3. 
command 1
/pattern/b dothree # either command 2 or command 3 will be executed
command 2
b
:dothree
command 3
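
A sketch of a loop in the style of the first pattern: the script keeps branching back to `:top` while the pattern space still ends in a backslash. The task (joining continuation lines) and the input are assumptions chosen for the demo.

```shell
# Lines ending in a backslash are continued on the next line.
joined=$(printf '%s\n' 'a\' 'b\' 'c' 'd' | sed ':top
/\\$/{
N
s/\\\n//
b top
}')
printf '%s\n' "$joined"
```

Each pass appends the next line and removes the backslash-newline pair; once the pattern space no longer ends in a backslash, the loop falls through and the joined line is printed.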

eg:

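	# A script file, run with sed -f; the quote characters in it would clash with a quoted shell argument.
	# Lines between .ES and .EE branch to the end of the script and are left unchanged.
	/^\.ES/,/^\.EE/b
	s/^"/``/
	s/"$/''/
	s/"? /''? /g
		.
		.
		.
	s/\\(em\\^"/\\(em``/g
	s/"\\(em/''\\(em/g
	s/\\(em"/\\(em``/g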

[address]t[label]

eg, an extractor of index title

sed '
    /Rh 0/{
    s/"\(.*\)" "\(.*\)" "\(.*\)"/"\1" "\2" "\3"/ #1
    t
    s/"\(.*\)" "\(.*\)"/"\1" "\2"/ #2
    t
    s/"\(.*\)"/"\1"/ #3 
    }'

# t may be followed by a label; without one it branches to the end of the script.
# If 1 succeeds, t skips 2 and 3; otherwise 2 is tried,
# and if 2 succeeds, the second t skips 3.
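
A sketch of `t` used the same way, as a chain of alternatives where the first successful substitution ends the script for that line. The log-style prefixes are invented for illustration.

```shell
tagged=$(printf '%s\n' 'ERROR: disk' 'WARN: temp' 'note' | sed '
s/^ERROR: /E /
t
s/^WARN: /W /
t
s/^/- /')
printf '%s\n' "$tagged"
```

Without the `t` commands, the final substitution would also run on the lines already rewritten by the first two.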

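        sed '
        :begin
        /@f1(\([^)]*\))/{
        s//\\fB\1\\fR/g
        b begin
        }
        /@f1(.*/{
        N
        s/@f1(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
        t again
        b begin
        }
        :again
        P
        D'

        # First block: the @f1(...) is complete on one line. The empty pattern in s//.../
        # reuses the last regex that was matched (here, the address).
        # Second block: the @f1( spans two lines. If the substitution succeeds, t jumps to
        # :again; otherwise b begin retries with the extra line in the pattern space.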

Join a phrase

    :
    # phrase -- search for words across lines
    # $1 = search string; remaining args = filenames
    search=$1 # assign the first argument to search (no spaces around =)
    shift # shell built-in: drop $1 so that only the filenames remain
    for file
    do
    sed '
    # if the line matches, branch to the end of the script and print it (see below)
    /'"$search"'/b
    # append the next input line to the pattern space
    N
    h
    # remove the first line, leaving only the second
    s/.*\n//
    # if the second line matches by itself, branch to the end as well
    /'"$search"'/b
    # restore the two-line pattern space from the hold space
    g
    # join the two lines into one
    s/ *\n/ /
    # if the joined line matches, restore the original pair and print it
    /'"$search"'/{
    g
    b
    }
    # otherwise restore the pair, which still has its newline, from the hold space
    g
    D' "$file"
    done

/'"$search"'/b # The first single quote closes the quoted sed script, "$search" lets the shell expand the variable (the double quotes protect any spaces in it), and the second single quote reopens the script.
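
A sketch of that quoting pattern in isolation. The variable value and the input lines are made up, and the search string is assumed to contain no slashes or regex metacharacters, which would need escaping.

```shell
search='two words'

# The script is '/' + "$search" + '/p': the shell fills in the middle part
# before sed sees the script.
found=$(printf '%s\n' 'one' 'has two words here' | sed -n '/'"$search"'/p')
printf '%s\n' "$found"
```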
