Learning Bash and Unix as I go. This file documents some of the things I learn along the way in a scratchpad fashion.
- http://www.pixelbeat.org/programming/shell_script_mistakes.html
- http://mywiki.wooledge.org/BashPitfalls
- https://dev.to/thiht/shell-scripts-matter
- https://github.com/j1elo/shell-snippets/blob/master/template.sh
- https://natelandau.com/boilerplate-shell-script-template/
- https://www.linuxjournal.com/content/bash-trap-command
- http://mywiki.wooledge.org/BashFAQ/031 The difference between '[' and '[[' in conditionals.
There are three "standard streams" through which bits flow between the computer and its environment (physical I/O):
0 = STDIN, 1 = STDOUT, 2 = STDERR
https://en.wikipedia.org/wiki/Standard_streams
These are abstracted by the shell running in a text terminal on a modern computer.
You can redirect and pipe streams connecting the output of one process as the input to the next.
# cat = print a file to STDOUT; pipe STDOUT to STDIN for wc -l = count the number of lines
# from STDIN and print to STDOUT
$ cat file.txt | wc -l
# Equivalent to the above command; however, wc reads from the FILE argument instead of STDIN and prints to STDOUT
$ wc -l file.txt
- cmd < FILE means use the content of FILE as STDIN, instead of whatever you type on the keyboard.
- cmd > FILE means redirect the data from STDOUT and write it to FILE, instead of the screen (terminal).
- cmd < FILE1 > FILE2 combines both: read from FILE1 and write to FILE2.
- cmd >> FILE means redirect the data from STDOUT and append it to the end of FILE, instead of the screen (terminal).
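A couple of concrete sketches (unsorted.txt and notes.log are just hypothetical file names here):
$ sort < unsorted.txt > sorted.txt
$ echo "done" >> notes.log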
You could also use HERE documents. These take a literal block of text and treat it as if it were a separate file. You could redirect a HERE document to STDIN, or use it as a FILE argument.
https://en.wikipedia.org/wiki/Here_document
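A minimal sketch: the lines between the EOF markers are fed to wc -l as STDIN, just as if they came from a file:
$ wc -l <<EOF
first line
second line
EOF
2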
Print the exit status of the last command that was executed.
echo $?
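For example, false always exits with status 1 and true with status 0:
$ false
$ echo $?
1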
Execute a command and keep a log file of the STDOUT output (use 2> instead if you want to capture STDERR):
ping google.com > ping.log
Which is helpful if we want to execute ping in the background and keep it going:
nohup ping google.com > ping.log &
- nohup makes a running process immune to HUP (hangup) signals, that is, the SIGHUP signal sent on logout.
- the & makes the command run as a background process.
More info about redirection: https://en.wikipedia.org/wiki/Redirection_(computing)
Maybe I forgot to put a program in the background. No worries. Press CTRL-Z in the terminal to get a prompt. This will send a SIGTSTP signal to the foreground program, effectively suspending it. If we press CTRL-C, we send a SIGINT signal, which interrupts the foreground program, effectively aborting/halting execution.
bg
disown -a
This will resume the suspended program in the background, as if it had been started with &. The disown command will remove the job from the shell's job list, making sure it won't receive a SIGHUP when the terminal closes. However, the job stays connected to the terminal, so when the terminal is closed, the job might fail if it tries to read from STDIN or write to STDOUT.
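A typical sequence might look like this, with sleep 600 standing in for any long-running command:
$ sleep 600
^Z
[1]+  Stopped                 sleep 600
$ bg
[1]+ sleep 600 &
$ disown -a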
See: https://unix.stackexchange.com/questions/3886/difference-between-nohup-disown-and
zcat is like cat, but for gzip-compressed (.gz) files.
zcat collection.tar.gz
Inverting a search for a string: that is, find all lines that do NOT match this pattern:
grep -v ".*string"
Recursively search files in the cwd and its children for lines matching this pattern:
grep -r ".*string" .
Grep using a regex:
grep -G '^apple' fruits.txt
Grep using a regex incoming from a file. Note: our regex.txt file can hold multiple regexes to match each line against:
grep -Gf regex.txt fruits.txt
Suppose our regex file is a CSV with multiple columns, delimited by a comma. And we want to use the first column as a regex:
grep -Gf <(cut -d ',' -f 1 regex.txt) fruits.txt
- Note that between < and ( there is NO space. Bash will complain if you leave a space.
- The block between () is executed separately as a subcommand. Its STDOUT is presented to grep as the file for the -Gf switches.
- The cut command: -d defines the delimiter, and -f 1 means extract the first column.
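Process substitution on its own, as a minimal sketch: Bash replaces the <( ... ) block with a file-like path (e.g. /dev/fd/63; the exact number varies) containing the block's STDOUT:
$ wc -l <(printf 'a\nb\n')
2 /dev/fd/63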
We can manipulate the regex before sending it to grep with sed:
grep -Gf <(cut -d ',' -f 1 regex.txt | sed -e 's/^/^/') fruits.txt
This will add a ^ in front of each regex, anchoring it to the start of the line.
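A quick check of what that sed expression does to a single line:
$ echo "apple" | sed -e 's/^/^/'
^apple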
Let's do something with ISO 639 codes. Given this CSV data, count the number of ISO 639-1 codes:
ISO_639_2,ISO_639_1,eng_label,fre_label
(none)/,(none),Serbo-Croatian,serbo-croate
aar/,aa,Afar,afar
abk/,ab,Abkhazian,abkhaze
ace/,,Achinese,aceh
ach/,,Acoli,acoli
ada/,,Adangme,adangme
ady/,,Adyghe; Adygei,adyghé
afa/,,Afro-Asiatic languages,"afro-asiatiques, langues"
afh/,,Afrihili,afrihili
afr/,af,Afrikaans,afrikaans
ain/,,Ainu,aïnou
aka/,ak,Akan,akan
akk/,,Akkadian,akkadien
alb/sqi,sq,Albanian,albanais
Bash command:
cut -d ',' -f 2 iso-639.csv | grep -v -e "^\s*$" | wc -l
- cut the file and take the 2nd column, containing the ISO 639-1 codes
- grep -v => fetch everything that does NOT match the pattern
- grep -e => match this regular expression pattern
- "^\s*$" => match all strings consisting of 0 or more whitespace characters (the empty string, or whitespace-only strings)
- So, basically, an inverted search on "empty strings" filters out the blank values.
- wc -l => count the lines
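On the sample data above this prints 7, because the header value ISO_639_1 and the literal (none) also survive the empty-line filter. A sketch that skips those as well (tail -n +2 drops the header row), printing 5 for this sample:
tail -n +2 iso-639.csv | cut -d ',' -f 2 | grep -v -e "^\s*$" -e "^(none)$" | wc -l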
sed -e "s/.*\$\$a([^)][^)]*)\([^\$][^\$]*\).*/\1/"
sed will do a find/replace based on pattern matching, from STDIN to STDOUT. Note the [^\$][^\$]* and [^)][^)]* for matching one or more characters that are not a literal $ or ), respectively. We need to repeat the pattern because sed uses basic regular expressions (BRE), which lack +, unless you add the -E flag (an alias for -r) to switch to extended regular expressions (ERE).
See: https://www.gnu.org/software/sed/manual/sed.html#BRE-vs-ERE
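A sketch of what that expression extracts, assuming a hypothetical MARC-like input line containing a $$a(...)... subfield:
$ echo 'xx$$a(DE-588)4074335-4$$2gnd' | sed -e "s/.*\$\$a([^)][^)]*)\([^\$][^\$]*\).*/\1/"
4074335-4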
sed -e "/^\$/d"
This will delete all empty lines from a given input (file or stdin)
see: https://www.cyberciti.biz/faq/using-sed-to-delete-empty-lines/
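A quick sketch:
$ printf 'one\n\ntwo\n' | sed -e "/^\$/d"
one
two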
With a list of items in a file fruits.txt:
apple
apple
orange
banana
Let's do an aggregated count:
cat fruits.txt | sort | uniq -c | sort -n
1 banana
1 orange
2 apple
xargs -n 1 -a fruits.txt echo
xargs will take a bunch of input and map it onto the arguments of a command, one item at a time. We can even do explicit substitution: in this example, FOO is replaced by each input line through the -I switch:
xargs -a fruits.txt -I FOO echo FOO
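Both invocations run echo once per line of fruits.txt, so the output is simply:
apple
apple
orange
banana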
Let's compare two files and filter file A with the values from file B. $1 is the first column in both files:
awk 'NR==FNR{a[$1];next} ($1) in a' file.b.txt file.a.txt
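How it works: NR==FNR is only true while awk reads the first file (file.b.txt), so a[$1] collects its first-column values and next skips the rest of the rule; for the second file, lines whose $1 exists in a trigger the default print action. A sketch with hypothetical contents for both files:
$ cat file.b.txt
apple 1
banana 2
$ cat file.a.txt
apple red
orange orange
banana yellow
$ awk 'NR==FNR{a[$1];next} ($1) in a' file.b.txt file.a.txt
apple red
banana yellow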