Created
September 5, 2012 23:49
Unix Commands I Abuse Every Day (everythingsysadmin.com)
1. grep dot (view the contents of files prefixed by their filename) | |
I want to view the contents of a few files but I want each line prepended with the file's name. My solution?
$ grep . *.txt | |
jack.txt:Once upon a time | |
jack.txt:there was a fellow named Jack. | |
lyingryan.txt:Now that "trickle down economics" has been | |
lyingryan.txt:tested for 30 years and the data shows it | |
lyingryan.txt:has been a total failure, candidates | |
lyingryan.txt:still claim that cutting taxes for | |
lyingryan.txt:billionaires will help the economy. | |
market.txt:Jack went to market to sell the family | |
market.txt:cow. | |
market.txt:He came back with a handful of magic beans. | |
$ | |
grep is a search tool. Why am I using it like a weird version of cat? Because cat doesn't have an option to prepend the filename to each line of text. And it shouldn't. | |
Note that "." matches lines with at least 1 character. That is, blank lines are not included. If we change "." (matches any 1 character) to "^" (matches the beginning of a line) then every line will be matched because every line, no matter how short or long, has a beginning! However the period key is easier to type than the caret, at least on my keyboard. Therefore if I don't need the blanks, I don't request them. | |
Example use: The other day I grabbed the /etc/network/interfaces file from 6 different Linux boxes. I needed to review them all. Each was copied to a filename that was the same as the hostname. "grep . *" let me view them all easily and each line was annotated where it came from. | |
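Here is a small sketch of the difference between "." and "^", using two made-up files in a scratch directory. It also shows a wrinkle the example above doesn't hit: when grep is given only one file it drops the filename prefix, and listing /dev/null as a second file is one portable way to get it back.

```shell
# Scratch directory with two made-up files; a.txt contains a blank line.
dir=$(mktemp -d)
printf 'first\n\nthird\n' > "$dir/a.txt"
printf 'only line\n' > "$dir/b.txt"

# "." needs at least one character, so the blank line is dropped.
(cd "$dir" && grep . *.txt)

# "^" matches every line, so the blank line appears as "a.txt:".
(cd "$dir" && grep ^ *.txt)

# With a single file grep omits the prefix. Listing /dev/null as a second
# file brings the prefix back without relying on non-standard options.
(cd "$dir" && grep . b.txt /dev/null)
```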
2. "more star pipe cat" (cat files with a header between each one) | |
Let's look at another way to accomplish my example of comparing 6 files. In this case I want to print the contents of each file but separate the contents with the file name. Yes, I could do it in a loop: | |
$ for i in *.txt ; do echo "=== $i ===" ; cat "$i" ; done
However that takes a lot of typing. | |
This is where I abuse "more". Are you familiar with more? More prints the contents of files but pauses every screenful to ask "More?" Pressing SPACE shows one screenful more. Pressing RETURN shows one line more. | |
When more was new it was very dumb. It had no search functions, and you could skip forward a file but not skip back. It assumed your screen was 24 lines long; heaven forbid you had a short or very long screen. Oh, and if you resized your screen while using it things got confusing. If you piped the output of more to another program things got totally confused because those prompts were sent down the pipe. Certainly the next program in the pipe doesn't expect to see a "More?" every 24 lines.
Luckily someone came along and created a replacement for more that fixed all of those problems. Logically these features would all have been added to more and that would have been the end of the story. No, that's not what happened. They wrote a new program from scratch and called it "more 2.0" so we could keep typing "more" but have all those new features... no, that's not what happened. In the grand tradition of Unix having a sense of humor this new program was called "less". Thus begat the famous question, "Do you use more?" "No, I couldn't live without less." | |
Some versions of Unix have the old traditional more and less commands. However in most Linux systems both are the same program but the code detects that it was run as more and goes into "more emulation mode". | |
If you have been using Linux for fewer than 5 years there is a good chance that you didn't know that more existed and quite possibly you were confused why less is called less. Now you know. | |
Which brings us back to our story. Sometimes people get so used to typing "more" that they type it when they mean "cat". For example they type: | |
more * | command | command2 | |
when they mean: | |
cat * | command | command2 | |
Old more would send the prompts to the first command, which would pass them to the second, which would get very confused. You'd have to press SPACE a number of times and, since you didn't see any output, you would usually bang on the keyboard in frustration. It's all a big mess.
less is smart enough to detect that its output is going to a pipe, and then it emulates "cat". This is very smart.
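If you are curious how a program can tell, here is a minimal sketch of the same kind of check in shell. The function name is made up for illustration, and this is not necessarily how less itself does it; the sketch only relies on the standard "[ -t 1 ]" test, which is true when file descriptor 1 (standard output) is a terminal.

```shell
# Report where standard output is connected.
where_is_stdout() {
  if [ -t 1 ]; then
    echo "terminal"
  else
    echo "pipe or file"
  fi
}

where_is_stdout          # says "terminal" when run at an interactive prompt
where_is_stdout | cat    # stdout is a pipe here, so the other branch runs
```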
Even smarter is that when less is emulating more instead of producing "the big mess" it acts like cat but outputs little headers for each file. | |
$ more * | cat | |
:::::::::::::: | |
jack.txt | |
:::::::::::::: | |
Once upon a time | |
there was a fellow named Jack. | |
:::::::::::::: | |
lyingryan.txt | |
:::::::::::::: | |
Now that "trickle down economics" has been | |
tested for 30 years and the data shows it | |
has been a total failure, candidates | |
still claim that cutting taxes for | |
billionaires will help the economy. | |
:::::::::::::: | |
market.txt | |
::::::::::::::
Jack went to market to sell the family | |
cow. | |
He came back with a handful of magic beans. | |
$ | |
Isn't that pretty? | |
That works on Linux but not on *BSD. However there's a solution that works on both. We simply take advantage of the fact that if "head" is given more than one file name it prints a little header in front of each file. However we want to see the entire file, not just the first 10 lines that head normally shows. No worries. We assume the files are shorter than 99999 lines and do this:
$ head -n 99999 * | |
==> jack.txt <== | |
Once upon a time | |
there was a fellow named Jack. | |
==> lyingryan.txt <== | |
Now that "trickle down economics" has been | |
tested for 30 years and the data shows it | |
has been a total failure, candidates | |
still claim that cutting taxes for | |
billionaires will help the economy. | |
==> market.txt <== | |
Jack went to market to sell the family | |
cow. | |
He came back with a handful of magic beans. | |
$ | |
Note: With GNU head on Linux you can do "head -n -0" (print all but the last 0 lines) to mean "all lines". However that doesn't work on FreeBSD and other Unixes. (Hey, BSD folks: can you fix that?)
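If guessing a maximum line count bothers you, here is a sketch of another route that should behave the same on Linux and BSD: let awk print a head-style header whenever it starts a new file. The function name is my own invention; FNR and FILENAME are standard awk variables.

```shell
# Print each file in full, preceded by a head-style header.
# Caveat: a completely empty file gets no header, because awk never
# reads a first line from it.
cat_with_headers() {
  awk 'FNR == 1 { print "==> " FILENAME " <==" } { print }' "$@"
}

# Demo with two made-up files in a scratch directory.
dir=$(mktemp -d)
printf 'Once upon a time\n' > "$dir/jack.txt"
printf 'cow.\n' > "$dir/market.txt"
(cd "$dir" && cat_with_headers *.txt)
```

It is more typing than head, so it belongs in a shell startup file rather than at the prompt.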
3. "egrep --color=always '^|foo|bar'"
As you get older your eyesight gets worse. It becomes more difficult to find something in a field of text. Here's an eye test. Below is a list of recently run jobs on a Ganeti cluster. | |
$ gnt-job list | |
157486 success CLUSTER_VERIFY_CONFIG | |
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
157712 success OS_DIAGNOSE | |
157779 success CLUSTER_VERIFY | |
157780 success CLUSTER_VERIFY_CONFIG | |
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
157994 success OS_DIAGNOSE | |
158073 running CLUSTER_VERIFY | |
158074 success CLUSTER_VERIFY_CONFIG | |
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
158156 success OS_DIAGNOSE | |
158367 success CLUSTER_VERIFY | |
158368 waiting CLUSTER_VERIFY_CONFIG | |
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
158432 waiting OS_DIAGNOSE | |
$ | |
How quickly can you find the job that is running? It's kind of buried in there. (The answer is job #158073)
The most interesting jobs are the ones that are running and the ones that are waiting to run. It would be nice to have those highlighted. My first instinct was to simply use grep to remove the successful jobs: | |
$ gnt-job list | grep -v success | |
158073 running CLUSTER_VERIFY | |
158368 waiting CLUSTER_VERIFY_CONFIG | |
158432 waiting OS_DIAGNOSE | |
$ | |
However it is useful to see those jobs in context with all the other jobs. What I really want is to have the running and waiting jobs highlighted. Ah! "egrep --color=always" would color the things it finds, right? Ah, but egrep only shows what is found. We get: | |
$ gnt-job list | egrep --color=always 'running|waiting' | |
158073 running CLUSTER_VERIFY | |
158368 waiting CLUSTER_VERIFY_CONFIG | |
158432 waiting OS_DIAGNOSE | |
$ | |
So how can we output every line but also highlight certain words? Well, "." matches everything so we could use that, right? No, it matches every single character. We'd just get 100% red text. What else does every line have? It has a beginning! We can add a caret "^" to the regular expression and since the beginning of each line has no length, nothing additional will be highlighted in red.
This regular expression matches any line that has a beginning or has the word "running" or has the word "waiting". The matched text will be colored red. | |
$ gnt-job list | egrep --color=always '^|running|waiting' | |
157486 success CLUSTER_VERIFY_CONFIG | |
157487 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
157488 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
157489 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
157712 success OS_DIAGNOSE | |
157779 success CLUSTER_VERIFY | |
157780 success CLUSTER_VERIFY_CONFIG | |
157781 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
157782 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
157783 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
157994 success OS_DIAGNOSE | |
158073 running CLUSTER_VERIFY | |
158074 success CLUSTER_VERIFY_CONFIG | |
158075 success CLUSTER_VERIFY_GROUP(7ee44802-85d3-40fb-bd36-a7e701ecea29) | |
158076 success CLUSTER_VERIFY_GROUP(72a2138c-dc07-494d-bd02-ebff7916c9bc) | |
158077 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
158156 success OS_DIAGNOSE | |
158367 success CLUSTER_VERIFY | |
158368 waiting CLUSTER_VERIFY_CONFIG | |
158371 success CLUSTER_VERIFY_GROUP(457c7377-c83b-4fed-9ebe-a2974e2c521f) | |
158432 waiting OS_DIAGNOSE | |
$ | |
Now you can easily see which jobs are running and waiting and still get the full context. | |
I set up an alias so I can use this command all the time: | |
alias j="gnt-job list | egrep --color=always '^|running|waiting'" | |
Note the careful use of ' within ". | |
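If the quoting makes you nervous, a shell function avoids it and lets you pick the words each time. This is only a sketch: the name "hl" is made up, and it assumes your grep supports --color (GNU and BSD grep do; it is not in POSIX).

```shell
# Highlight the given words while still printing every input line.
# Builds the pattern '^|word1|word2|...' from the arguments.
hl() {
  pattern='^'
  for word in "$@"; do
    pattern="$pattern|$word"
  done
  grep -E --color=always "$pattern"
}

# Demo on two lines shaped like the job list above.
printf '157486 success CLUSTER_VERIFY_CONFIG\n158073 running CLUSTER_VERIFY\n' |
  hl running waiting
```

With that in place the alias becomes "gnt-job list | hl running waiting". The words are pasted into the regular expression as-is, so anything special to egrep keeps its special meaning.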
If you would like more than just the words "running" and "waiting" highlighted slightly more complex regular expressions are required: | |
Highlight starting at the word, continuing to the end of the line:
egrep --color=always '^|running.*$|waiting.*$' | |
Highlight the entire darn line if it has either word in it: | |
egrep --color=always '^|^.* (running|waiting) .*$' | |
Of course, if you are typing these commands instead of using them in a script or alias, the least typing to highlight "foo" and "bar" is: | |
egrep '^|foo|bar' | |
Chances are "--color=auto" is the default and the right thing will happen. If not, add the "--color=always". | |
Note: A co-worker just pointed out that "" (the empty pattern) matches every line and doesn't result in all text being highlighted. He wins for making the regexes even smaller. Just remove the "^" at the front:
alias j="gnt-job list | egrep --color=always '|running|waiting'" | |
or | |
egrep --color=always '|^.* (running|waiting) .*$' | |
Note2: Someone pointed out that ack will do this with --passthru but ack isn't always on machines I use. | |
4. "fmt -1" (split lines into individual words) | |
If you are not familiar with the fmt command, that's probably because you use a modern text editor like vim or emacs which can do the formatting for you. In the old days we had to call an external command to do our formatting. Back then all Unix commands were small, single-function, tools that could be combined to do great things. Now every new Unix command seems to be trying to have more features than MS-Word. But I digress. | |
"fmt -n" takes text as input and reformats it into nicely shaped paragraphs with no line longer than n. That is, "fmt -65" formats text in nice paragraphs with no line longer than 65 characters. | |
But what if you have a word that is longer than 65 characters? Does it truncate it? No, then you get a line with just that word on it. (Ok, I lied about "no lines longer than n".) | |
So how can we abuse this program? Simple! Suppose we have a bunch of text and want to list out the individual words one per line. Well, words that are "too long" are printed on their own line and we want every word to be printed on its own line. Therefore why don't we tell "fmt" that all words are "too long" by saying we want the paragraphs to be formatted to be 1 character long! | |
$ fmt -1 <fraudulent.txt | |
Fraud | |
is | |
telling | |
a | |
lie | |
that | |
benefits | |
you | |
and | |
not | |
the | |
person | |
or | |
people | |
you | |
tell | |
it | |
to. | |
$ | |
Why would you want to do that? There are plenty of situations where this is useful! | |
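If your fmt refuses a width of 1, tr can produce the same one-word-per-line stream. Here is a sketch under that assumption; the function name "words" is made up, and the word-count pipeline after it is just one illustration of what the stream is good for.

```shell
# One word per line: turn every whitespace character into a newline
# and squeeze runs of newlines down to one.
words() {
  tr -s '[:space:]' '\n'
}

# Example: count how often each word appears, most frequent first.
printf 'the cat saw the dog and the bird\n' | words | sort | uniq -c | sort -rn
```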
Recently I found myself with long lines of text that mixed usernames and numbers. I wanted to extract out the names. Sure, I could have figured something out with awk or put it into a text editor and copied out the names. Instead I did this:
$ fmt -1 <the_file.txt | egrep -v '^[0-9]' | |
fred | |
mary | |
jane | |
bob | |
$ | |
Recently I was curious which IP addresses are mentioned on my wiki: | |
$ cat *.wiki | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | sort -u | |
192.168.1.4 | |
192.168.1.7 | |
255.255.255.0# | |
255.255.255.192 | |
255.255.255.240 | |
8.3.8.1 | |
<code>172.16.240.1 | |
<code>172.16.240.2 | |
Ok, that's not a perfect list but I was able to do that in a few seconds rather than an hour of writing code. | |
A simple improvement: Transform < and > and a lot of other punctuation into spaces, then delete spaces at the end. | |
$ cat *.wiki | tr "#:@;()<>=,'\"-" ' ' | fmt -1 | egrep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' | tr -d ' ' | sort -u
172.16.240.1 | |
172.16.240.2 | |
192.168.1.4 | |
192.168.1.7 | |
255.255.255.0 | |
255.255.255.192 | |
255.255.255.240 | |
8.3.8.1 | |
That's a lot cleaner. 8.3.8.1 is a version number, not an IP address, but this is good enough for a first pass through the list. |
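For a second pass, here is a sketch that skips the punctuation cleanup by letting grep print only the matching text, then drops anything with an octet above 255. It assumes a grep with the -o option (GNU and BSD grep have it; POSIX does not require it), and the function name is made up.

```shell
# Pull dotted quads out of arbitrary text and keep the plausible ones.
extract_ips() {
  grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' |
    awk -F. '$1 <= 255 && $2 <= 255 && $3 <= 255 && $4 <= 255' |
    sort -u
}

# Demo on made-up text with the same kinds of clutter as the wiki output.
printf 'see <code>172.16.240.1 and 255.255.255.0# plus 999.1.1.1\n' | extract_ips
```

A version number like 8.3.8.1 still gets through, and a longer run of digits can match in part, so this is still a first pass and not a validator.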