Unix commands checklist for OS X

I have been meaning to note down my *nix checklist of commands (for macOS) that are handy for basic operations on data. I will update this post as I remember or come across something that fits here. These *nix commands have been tested specifically on macOS.

Uniques
uniq - The Unix unique function, used primarily to remove duplicate lines from a file, amongst other things. It only compares adjacent lines, so the file has to be sorted first for uniq to work. Consider a file test which contains the following

$ cat test
aa
bb
bb
cc
cc
cc

Remove duplicates

$ uniq test
aa
bb
cc

Count occurrences of each item

$ uniq -c test
1 aa
2 bb
3 cc

Print only duplicate items in file

$ uniq -d test
bb
cc

Print only unique lines

$ uniq -u test
aa
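
The sorting requirement is easy to miss, so here is a minimal sketch of it. The sample data and the temp file are made up for illustration; the point is only that uniq compares adjacent lines.

```shell
# Hypothetical sample data written to a throwaway temp file.
tmp=$(mktemp)
printf 'bb\naa\nbb\ncc\naa\n' > "$tmp"

# uniq only collapses adjacent duplicates, so unsorted input keeps all 5 lines.
unsorted=$(uniq "$tmp")

# Sorting first makes equal lines adjacent, so the duplicates collapse.
deduped=$(sort "$tmp" | uniq)

# Common follow-up: count each value and list the most frequent first.
counts=$(sort "$tmp" | uniq -c | sort -rn)

printf 'uniq alone:\n%s\n\nsort | uniq:\n%s\n\ncounts:\n%s\n' "$unsorted" "$deduped" "$counts"
rm -f "$tmp"
```

If you only need the deduplicated list, sort -u gives the same result as sort | uniq in one step.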

Consider test now contains

$ cat test
aa
bb
cc
AA
cC

Remove duplicates, ignoring case. This file is not sorted, so it has to be sorted before running uniq. The -i flag makes the comparison case insensitive

$ sort test | uniq -i
AA
bb
cC
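
One caveat, sketched below with made-up data: a plain byte-order sort can leave case variants far apart, in which case uniq -i never sees them side by side. Adding -f to sort (fold case) is one way around that. LC_ALL=C is set here only to make the ordering predictable; which spelling of each duplicate survives depends on the sort order.

```shell
tmp=$(mktemp)
printf 'aa\nBB\ncc\nAA\nbb\n' > "$tmp"

# Byte-order sort puts all uppercase lines before lowercase ones,
# so aa/AA and bb/BB are not adjacent and uniq -i removes nothing.
plain=$(LC_ALL=C sort "$tmp" | uniq -i)

# sort -f folds case while comparing, which places case variants next to each other.
folded=$(LC_ALL=C sort -f "$tmp" | uniq -i)

printf 'sort | uniq -i:\n%s\n\nsort -f | uniq -i:\n%s\n' "$plain" "$folded"
rm -f "$tmp"
```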

Sort a fixed width file by a field which begins from 10th byte and ends at 20th

sort -k1.10,1.20 file | head -10
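
The same key syntax on a smaller, made-up file may make the positions easier to see. This sketch assumes records with no blanks, so the whole line is field 1 and -k1.4,1.5 means characters 4 through 5.

```shell
tmp=$(mktemp)
# Hypothetical fixed-width records: a 3-character name, a 2-digit code, a 1-character flag.
printf 'AAA30x\nBBB10y\nCCC20z\n' > "$tmp"

# -k1.4,1.5 sorts on characters 4 through 5 of field 1 (the code column).
by_code=$(LC_ALL=C sort -k1.4,1.5 "$tmp")

printf '%s\n' "$by_code"
rm -f "$tmp"
```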

Case conversion
Convert all upper case in fileA to lower case and output as fileB

$ tr '[:upper:]' '[:lower:]' < fileA.txt > fileB.txt

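Using tr to replace a character in a file
Convert all carriage returns to newline chars. A carriage return is displayed as ^M in many editors; in the tr command, write it as the escape \r

$ tr '\r' '\n' < input.csv > output.csv

Delete all CR and LF chars from a file

$ tr -d '\r\n' < inpfile.txt > outfile.txt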

Remove extra spaces in a file

tr -s " " < file.txt > fileout.txt
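
A small sketch tying the tr operations together, using the \r escape for carriage returns. The input strings and variable names are invented for the example.

```shell
# Hypothetical Windows-style file with CR+LF line endings.
crlf=$(mktemp)
printf 'a,b\r\nc,d\r\n' > "$crlf"

# Delete only the carriage returns, leaving Unix LF line endings.
unix_text=$(tr -d '\r' < "$crlf")

# Count any carriage returns that survived (should be none).
cr_left=$(printf '%s' "$unix_text" | tr -cd '\r' | wc -c | tr -d ' ')

# Squeeze runs of spaces down to a single space.
squeezed=$(printf 'too    many   spaces\n' | tr -s ' ')

# Lowercase using character classes.
lower=$(printf 'MiXeD Case\n' | tr '[:upper:]' '[:lower:]')

printf '%s\n%s\n%s\n' "$unix_text" "$squeezed" "$lower"
rm -f "$crlf"
```

Note that tr works on single characters, not strings; for string replacement sed is the better fit.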

File comparison
comm expects both files to be sorted. Compare two files and keep strings present in fileA but not in fileB

$ comm -23 fileA fileB

Compare two files and keep strings present in fileB but not in fileA

$ comm -13 fileA fileB

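Compare two files and keep only strings which are present in both files. comm numbers its output columns 1 (only in fileA), 2 (only in fileB) and 3 (in both), and each flag suppresses a column, so -12 leaves only the common lines

$ comm -12 fileA fileB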
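
To see the three column selections side by side, here is a minimal sketch with two made-up, already sorted files. It uses comm -12 for the lines common to both files, since each digit suppresses the corresponding column.

```shell
a=$(mktemp)
b=$(mktemp)
# Hypothetical inputs; comm requires both to be sorted.
printf 'apple\nbanana\ncherry\n' > "$a"
printf 'banana\ncherry\ndate\n' > "$b"

only_a=$(comm -23 "$a" "$b")   # suppress columns 2 and 3: lines only in the first file
only_b=$(comm -13 "$a" "$b")   # suppress columns 1 and 3: lines only in the second file
both=$(comm -12 "$a" "$b")     # suppress columns 1 and 2: lines in both files

printf 'only in a: %s\nonly in b: %s\nin both:\n%s\n' "$only_a" "$only_b" "$both"
rm -f "$a" "$b"
```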

Sed
Primary purpose of sed is string replacement or pattern replacement. Consider the following file as input

$ cat file.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.

Replacing or substituting string

$ sed 's/unix/linux/' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

By default, the sed command replaces the first occurrence of the pattern in each line and it won't replace the second, third...occurrence in the line. Here the "s" specifies the substitution operation. The "/" are delimiters. The "unix" is the search pattern and the "linux" is the replacement string. If you miss a delimiter then the expression errors out as below

$ sed 's/unix/linux' file.txt 
sed: 1: "s/unix/linux": unterminated substitute in regular expression
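
The delimiter does not have to be "/". When the pattern itself contains slashes, such as a path, another character avoids the backslash escaping. A small sketch, with a made-up input line:

```shell
# Hypothetical line containing slashes.
path_line='export PATH=/usr/local/bin:/usr/bin'

# With "/" as the delimiter, every slash in the pattern must be escaped.
escaped=$(printf '%s\n' "$path_line" | sed 's/\/usr\/local/\/opt/')

# With "|" as the delimiter, the slashes can be written as they are.
piped=$(printf '%s\n' "$path_line" | sed 's|/usr/local|/opt|')

printf '%s\n%s\n' "$escaped" "$piped"
```

Both commands produce the same line; the second is just easier to read.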

Replacing the nth occurrence of a pattern in a line. Use a numeric flag (1, 2, etc.) after the final delimiter to replace the first, second, ... occurrence of a pattern in a line. The command below replaces the second occurrence of the word "unix" with "linux" in a line.

$ sed 's/unix/linux/2' file.txt
unix is great os. linux is opensource. unix is free os.
learn operating system.
unixlinux which one you choose.

Here is the first occurrence, which is the default option

$ sed 's/unix/linux/1' file.txt
linux is great os. unix is opensource. unix is free os.
learn operating system.
linuxlinux which one you choose.

And the third occurrence

$ sed 's/unix/linux/3' file.txt
unix is great os. unix is opensource. linux is free os.
learn operating system.
unixlinux which one you choose.

To replace all occurrences, use 'g' (global replacement)

$ sed 's/unix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.

To make the search case insensitive: sed on Mac does not have a flag for this, but you can use a plain regex to achieve it. For example, modify file.txt as below

$ vi file.txt
unix is great os. Unix is opensource. unix is free os.
learn operating system.
Unixlinux which one you choose.

$ sed 's/[Uu]nix/linux/g' file.txt
linux is great os. linux is opensource. linux is free os.
learn operating system.
linuxlinux which one you choose.

To find a string in all the files contained in a directory, you could use grep or find.

grep -lr searchStr mydir
grep --recursive --ignore-case --files-with-matches "searchStr" mydir
find mydir -type f | xargs grep -l searchStr
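
Here is a small sketch of those searches against a throwaway directory. The directory layout and file contents are invented. The find variant uses -print0 with xargs -0, which is an addition of mine so that file names containing spaces are handled safely.

```shell
# Hypothetical directory tree for the demonstration.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf 'has searchStr here\n' > "$dir/a.txt"
printf 'nothing to see\n' > "$dir/b.txt"
printf 'SEARCHSTR in upper case\n' > "$dir/sub/c.txt"

# Case-sensitive recursive search, listing only file names.
exact=$(grep -lr searchStr "$dir" | sort)

# Same search ignoring case also finds sub/c.txt.
nocase=$(grep -lri searchStr "$dir" | sort)

# find + xargs; -print0 / -0 keep file names with spaces intact.
via_find=$(find "$dir" -type f -print0 | xargs -0 grep -l searchStr | sort)

printf 'exact:\n%s\n\nignore case:\n%s\n' "$exact" "$nocase"
rm -rf "$dir"
```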

To find/replace multiple strings use the -e flag.

sed -e 's/unix/linux/g' -e 's/Unix/Linux/g' file.txt
linux is great os. Linux is opensource. linux is free os.
learn operating system.
Linuxlinux which one you choose.

To replace a string at the beginning of a line, anchor the regex with ^

sed 's/^learn/learn to use/g' file.txt
unix is great os. Unix is opensource. unix is free os.
learn to use operating system.
Unixlinux which one you choose.

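To remove whitespace characters (spaces and tabs) before each | delimiter, i.e. at the end of every field in a pipe-delimited file. [[:blank:]] matches a space or a tab

sed 's/[[:blank:]]*|/|/g' file.txt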

Unix command to know if your file has whitespace or tab characters

vi file.txt
:set list

Unix command to remove BOM (Byte Order Mark) characters from your file. Open the file in binary mode using the -b flag to verify whether it has a BOM, and then remove it

vi -b file.txt 
:set nobomb
:wq

Use the -i flag to overwrite the existing file and create a backup of the original file. For example, to remove all white spaces in a file

sed -i .bak 's/ //g' file.txt
cat file.txt
unixisgreatos.Unixisopensource.unixisfreeos.
learnoperatingsystem.
Unixlinuxwhichoneyouchoose.

This will create a backup file called file.txt.bak with the original file contents and overwrite file.txt with no spaces.

To remove only the trailing spaces in a line use ' *$'. The * character means "any number of the previous character" and $ refers to end of line.

sed -i .bak 's/ *$//g' file.txt

Verify the trailing whitespaces are removed by :set list

vi file.txt
:set list
unix is great os. Unix is opensource. unix is free os.$
learn operating system.$
Unixlinux which one you choose.$

To remove whitespaces between xml tags only.

sed -i .bak -e 's/> *</></g' file.xml

To replace a blank line with something else. You can match a blank line by specifying an end-of-line immediately after a beginning-of-line, i.e. with ^$. For example, with a blank line after the first line of file.txt

vi file.txt
unix is great os. Unix is opensource. unix is free os.

learn operating system.
Unixlinux which one you choose.

sed 's/^$/this used to be a blank line/' file.txt
unix is great os. Unix is opensource. unix is free os.
this used to be a blank line
learn operating system.
Unixlinux which one you choose.

To remove tabs at the end of a line. For example, add a tab to the end of the first line, so that :set list shows ^I

vi file.txt 
unix is great os. Unix is opensource. unix is free os.^I$
learn operating system.$
Unixlinux which one you choose.$

To type a literal tab in your sed command, press ctrl + v and then ctrl + i (shown as <tab> below)

sed -i.bak 's/<tab>*$//' file.txt
vi file.txt
:set list
unix is great os. Unix is opensource. unix is free os.$
learn operating system.$
Unixlinux which one you choose.$
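
If typing a literal tab is awkward, the POSIX character class [[:blank:]] (space or tab) is one alternative. The sketch below assumes a made-up file and writes the suffix attached to the flag (-i.bak), a form the sed on macOS accepts and that is not specific to it.

```shell
f=$(mktemp)
# Hypothetical lines: one ends in spaces, one in a tab, one is already clean.
printf 'trailing spaces   \ntrailing tab\t\nclean\n' > "$f"

# Strip trailing spaces and tabs in place, keeping the original as $f.bak.
sed -i.bak 's/[[:blank:]]*$//' "$f"

# Count lines that still end in a blank; grep -c exits non-zero on a zero count, hence || true.
before=$(grep -c '[[:blank:]]$' "$f.bak" || true)
after=$(grep -c '[[:blank:]]$' "$f" || true)

printf 'lines with trailing blanks: before=%s after=%s\n' "$before" "$after"
rm -f "$f" "$f.bak"
```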

Consider file test which contains the following

$ cat test
(firstname).aa
(firstname).bb
(firstname).bb
(firstname).cc
(firstname).CC
(lastname).hh
(lastname).jj
(lastname).ll

To extract the content after firstname

sed -En 's/.*firstname\)\.([A-Za-z]+).*/\1/p' test
aa
bb
bb
cc
CC

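To extract everything before some content. The first form keeps the text before the last occurrence of somecontent and prints only the lines that match; the second keeps the text before the first occurrence and prints every line

sed -En 's/(.*)somecontent.*/\1/p' input.file > output.file

sed 's/somecontent.*//' input.file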
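
The difference between cutting at the first and at the last occurrence comes from greedy matching. A minimal sketch, using a made-up marker END and a made-up input line:

```shell
# Hypothetical line with the marker END appearing twice.
line='alpha-END-beta-END-gamma'

# The leftmost END starts the match, so everything from the first END onwards is removed.
before_first=$(printf '%s\n' "$line" | sed 's/END.*//')

# (.*) is greedy and grabs as much as it can, so it stops at the last END.
before_last=$(printf '%s\n' "$line" | sed -E 's/(.*)END.*/\1/')

printf 'before first: %s\nbefore last:  %s\n' "$before_first" "$before_last"
```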

To split by separator '_' and take the first part

awk -F '_' '{print $1}' file.txt

To add a comma after every word (space separated) in a file

sed -i.bak 's/ /, /g' file.txt

To add a comma at the end of every line in a text file

sed -i'.bak' 's/$/,/g' file.txt

To remove last comma from each line on file

sed -i.bak 's/,$//' File

To remove all double quotes in a file

sed -i'.bak' 's/\"//g' file.txt

To remove all single quotes in a file

sed -i'.bak' "s/'//g" file.txt

To remove everything after first comma in lines of file

awk -F ',' '{print $1}' file.txt  > file_temp.txt && mv file_temp.txt file.txt

or with sed

sed -i.bak 's/,.*$//' file.txt && rm file.txt.bak

To extract everything between first and second comma in a file

awk -F ',' '{print $2}' file.txt

To add a character at beginning of every line in a file

sed -i.bak 's/^/prefix/' file.txt

To add quotes around the first word of every line. Here , is the delimiter between words, $1 selects the first word, & stands for the text that was matched, and sub is the substitute function. See https://superuser.com/questions/664125/unix-surround-first-column-of-csv-with-double-quotes for more details

awk -F, '{sub($1, "\"&\""); print}' file.txt
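
One thing to keep in mind is that sub treats its first argument as a regular expression, so a first field containing regex metacharacters may not match itself. The sketch below, with made-up input, shows that case next to an alternative that assigns to $1 directly; the alternative is my suggestion, not part of the linked answer.

```shell
# Plain first field: sub() finds it and wraps it in quotes.
simple=$(printf 'first,second,third\n' | awk -F, '{sub($1, "\"&\""); print}')

# First field "a+b" read as a regex means "one or more a, then b", which does not
# occur in the line, so sub() changes nothing.
meta_sub=$(printf 'a+b,c\n' | awk -F, '{sub($1, "\"&\""); print}')

# Assigning to $1 avoids regex matching; OFS is set so the line is rebuilt with commas.
meta_assign=$(printf 'a+b,c\n' | awk 'BEGIN { FS = OFS = "," } { $1 = "\"" $1 "\""; print }')

printf '%s\n%s\n%s\n' "$simple" "$meta_sub" "$meta_assign"
```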

To copy the records containing the string 'FOO' in a large file and add them back with 'FOO' replaced by 'BAR'. Example:

cat fileA.txt
aaaa
bbb
ccccFOO
ddddFOO

First create a version of the file with BAR records, and then merge the two files keeping unique records.

sed -i.bak 's/FOO/BAR/g' fileA.txt

This rewrites fileA.txt with the BAR records and keeps the original contents in fileA.txt.bak

cat fileA.txt
aaaa
bbb
ccccBAR
ddddBAR

To verify that the correct number of records exist and have been copied, you can use the following commands

grep -c 'FOO' fileA.txt.bak
grep -c 'BAR' fileA.txt

Also to get the num lines of each file

wc -l fileA.txt
wc -l fileA.txt.bak

Now merge the two files keeping only unique records.

sort -u fileA.txt fileA.txt.bak > fileA.txt_o && mv fileA.txt_o fileA.txt

Now fileA.txt should have everything. You can use the grep -c and wc -l to verify this file.

cat fileA.txt
aaaa
bbb
ccccBAR
ccccFOO
ddddBAR
ddddFOO
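
The same result can be reached without editing the file in place. This is an alternative sketch, not the steps above: sed -n with the p flag writes only the rewritten FOO lines to a second file, and sort -u merges the two. File names are throwaway temp files.

```shell
src=$(mktemp)
bar=$(mktemp)
printf 'aaaa\nbbb\nccccFOO\nddddFOO\n' > "$src"

# -n plus the p flag prints only lines where a substitution happened.
sed -n 's/FOO/BAR/gp' "$src" > "$bar"

# Sanity check: every FOO record should have exactly one BAR copy.
foo_count=$(grep -c 'FOO' "$src")
bar_count=$(grep -c 'BAR' "$bar")

# Merge originals and copies, dropping exact duplicates.
merged=$(LC_ALL=C sort -u "$src" "$bar")

printf 'FOO=%s BAR=%s\n%s\n' "$foo_count" "$bar_count" "$merged"
rm -f "$src" "$bar"
```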

Search Strings
Total occurrences of searchStr in current directory

grep -ro searchStr . | wc -l | xargs echo "Total matches :"

Total number of files where searchStr occurs in current directory

grep -lor searchStr . | wc -l | xargs echo "Total matches :"

To get an exact word match use the -w flag.

grep -lwr searchStr mydir
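
The counting flags are easy to mix up, so here is a short sketch on a made-up file: -c counts matching lines, -o piped to wc -l counts individual occurrences, and -w restricts matches to whole words.

```shell
f=$(mktemp)
# Hypothetical content: "foo" twice on line 1, inside "foobar" on line 2, alone on line 3.
printf 'foo foo\nfoobar\nfoo\n' > "$f"

# Lines containing foo anywhere.
matching_lines=$(grep -c foo "$f")

# Every individual occurrence, one per output line.
occurrences=$(grep -o foo "$f" | wc -l | tr -d ' ')

# Lines containing foo as a whole word (foobar does not count).
whole_word_lines=$(grep -cw foo "$f")

printf 'lines=%s occurrences=%s whole-word lines=%s\n' "$matching_lines" "$occurrences" "$whole_word_lines"
rm -f "$f"
```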

Recursively replace the string original with replacement in all files under the directory mydir on OS X (excludes hidden files and folders)

find mydir \( ! -regex '.*/\..*' \) -type f -exec sed -i '' 's/original/replacement/g' {} \;

find mydir \( ! -regex '.*/\..*' \) -type f -exec sed -i '' 's/original/replacement/g' {} +

The regex excludes all hidden files and folders, which is particularly important if you want to avoid messing up your .DS_Store or .git files unknowingly. If you use zsh, then the following would also work

sed -i '' 's/original/replacement/g' **/*(D.)

This isn't excluding hidden files though. The **/*(D.) is basically the zsh way of saying recursively go through all subdirectories and all regular files: D includes dotfiles and . restricts the match to regular files.
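
To check what the find regex actually skips before running a replacement for real, a dry run in a scratch directory helps. The sketch below uses an invented layout and makes two assumptions: it runs from inside the directory, because -regex is matched against the whole path, and it uses -i.bak followed by a cleanup of the backups so the same commands are not tied to the macOS sed.

```shell
# Hypothetical tree with one visible file, one hidden file and one hidden folder.
dir=$(mktemp -d)
mkdir -p "$dir/src" "$dir/.git"
printf 'original\n' > "$dir/src/a.txt"
printf 'original\n' > "$dir/.git/config"
printf 'original\n' > "$dir/.hidden"

cd "$dir" || exit 1

# List what would be touched: any path containing "/." is excluded.
visible=$(find . \( ! -regex '.*/\..*' \) -type f | sort)

# Replace in the visible files only, then remove the .bak backups.
find . \( ! -regex '.*/\..*' \) -type f -exec sed -i.bak 's/original/replacement/g' {} +
find . \( ! -regex '.*/\..*' \) -type f -name '*.bak' -exec rm -f {} +

changed=$(cat src/a.txt)
untouched=$(cat .git/config)

cd - > /dev/null || exit 1
printf 'visible: %s\nsrc/a.txt: %s\n.git/config: %s\n' "$visible" "$changed" "$untouched"
rm -rf "$dir"
```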

Delete all files of a certain type under current directory

find . -name "*.pyc" -exec rm -f {} \;

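Replace a string with another string in all .sh files under current directory. On a Mac, sed -i needs a backup suffix argument; pass an empty string '' to skip the backup

find . -name '*.sh' -exec sed -i '' 's/foo/bar/g' {} \;

Or in every file under a directory

find <path-to-directory> -type f -print0 | xargs -0 sed -i '' 's/foo/bar/g'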

Remove everything after the first space in a line. (Or extract the first word from a line)

awk '{ print $1 }' < input > output

Show line numbers in vi

:set number

Clip a log file between line numbers

sed -n '105830,106694p;106695q' logfile > output

starting line number: 105830,
ending line number: 106694
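
The same pattern on a tiny generated input shows what each part does: the range with p prints the wanted lines, and the q on the following line number stops sed from reading the rest of a large file. The awk line is an equivalent I added for comparison.

```shell
# seq stands in for a log file with 10 lines.
clipped=$(seq 1 10 | sed -n '3,5p;6q')

# awk equivalent: print records whose line number falls in the range.
awk_clipped=$(seq 1 10 | awk 'NR >= 3 && NR <= 5')

printf 'sed:\n%s\n\nawk:\n%s\n' "$clipped" "$awk_clipped"
```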
