In this challenge you'll become familiar with a number of standard Unix command-line utilities and then implement a subset of their functionality in Ruby. You'll gain a better understanding of:
- files and file descriptors in Unix
- pipes
- sockets
- STDIN, STDERR, STDOUT
If you use Mac OS X or Linux, you're using a variant or flavor of Unix. You might have noticed that doing things on the command line usually involves a collection of small, strangely named programs with impossible to remember options. This can be intimidating, but these small utilities that do one thing well are incredibly powerful, especially when combined with each other.
Most, if not all, of the *nix utilities are focused on reading, writing or manipulating text. If files are the backbone of *nix, text is the lingua franca. What do all of these have in common?
- ~/.gitconfig
- .rspec
- sublime preferences
- /private/etc/passwd (/etc/passwd on Linux)
They're all files and they all contain human-readable text. Text is pervasive in *nix, and one of the reasons is how easy it is to manipulate text with the 'standard' utilities found on most *nix systems.
I mentioned that the real power of *nix utilities becomes apparent when they work together. If you're coming from a web development background, you're most familiar with programs or systems working together via APIs. For example, your Rails application can work with Twitter by making requests to Twitter's HTTP API. While connecting to Twitter might seem pretty easy, it still requires learning the ins and outs of Twitter's API. LinkedIn's API is a totally different beast, and what you learned about Twitter won't apply.
*nix programs communicate with each other over a simple but consistent interface: one or more file descriptors.
A file descriptor in *nix is used to wrap any number of 'things' that can be read from or written to. For example, file descriptors are the interface for reading and writing to:
- files
- sockets (connections to anything from another computer or your PostgreSQL server to a Leap Motion)
- your terminal
- your keyboard
File descriptors support a number of operations, exposed as system calls. Some of the most commonly used include:
- open()
- creat()
- read()
- write()
- send()
- lseek()
- poll()
By standardizing on file descriptors as the abstract interface for such a wide range of inputs and outputs, we can write programs that are incredibly flexible and don't have to know the implementation details of their inputs and outputs.
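Here is a small sketch of that idea in Ruby. The helper name count_lines is made up for illustration. It only assumes its argument responds to each_line, so it works the same way on an in-memory StringIO and on the read end of a pipe.

```ruby
require "stringio"

# Hypothetical helper: counts the lines available from any readable IO-like object.
# It relies only on #each_line, so it does not care what is behind the object.
def count_lines(io)
  io.each_line.count
end

# An in-memory stand-in for a file
memory = StringIO.new("one\ntwo\nthree\n")

# A pipe: whatever is written to writer can be read from reader
reader, writer = IO.pipe
writer.puts "alpha"
writer.puts "beta"
writer.close # closing the write end lets the reader see end of file

puts count_lines(memory) # 3
puts count_lines(reader) # 2
reader.close
```

The same method would also accept an object returned by File.open, since it exposes the same reading interface.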
In these challenges you'll first become familiar with a standard *nix program, then recreate a subset of its functionality in Ruby.
As you discover new *nix programs you'll need a reference to understand what they do and how to use them. There's a utility called man which displays the manual page for most utilities, although some programs don't have man pages available. You can read the man page for a command by running: man [program-name]. For example, man ls will display the instructions for using the ls command:
LS(1) BSD General Commands Manual LS(1)
NAME
ls -- list directory contents
SYNOPSIS
ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]
DESCRIPTION
For each operand that names a file of a type other than directory, ls
displays its name as well as any requested, associated information.
For each operand that names a file of type directory, ls displays the
names of files contained within that directory, as well as any
requested, associated information.
You can quit man by typing q.
We're going to be developing software, so create a new Git repo. Commit your work as you move from one phase of the challenge to the other.
cat is a program that con_cat_enates streams. Read the man page for cat and try reading a file or two. How about reading (and printing) your Git config?
Note that cat is usually unnecessary if you aren't concatenating multiple files. For some banter about unnecessary uses of cat, see: http://partmaps.org/era/unix/award.html#cat
Create a Ruby program (cat.rb) that reads and prints out a file specified as a command-line argument.
$ ruby cat.rb ~/.gitconfig
[user]
name = Linus Torvalds
email = [email protected]
[color]
branch = auto
diff = auto
status = auto
...
When you have a working implementation, be sure to commit your work.
cat supports reading multiple files. Modify cat.rb to support reading multiple files:
$ ruby cat.rb ~/.gitconfig ~/.profile
When you can read multiple files, commit your work.
What if a file doesn't exist? Modify cat.rb to print an error message and quit when it attempts to read a file that doesn't exist. Note that it should still read and print every file that exists until it reaches one that doesn't.
$ ruby cat.rb ~/.gitconfig ~/.profile ~/.foobar
In the above example, your .gitconfig and .profile should print before the error message is printed.
Hidden from you until now, the output you've seen in your terminal is actually the combination of two standard file descriptors: Standard Out (stdout) and Standard Error (stderr). Well-behaved programs write to the correct one depending on what they have to print: regular output goes to stdout, while errors and diagnostics go to stderr. Programs also have access to Standard In (stdin), used for reading in data. When you read input from the keyboard with gets, you're reading from stdin.
Each program has its own stdin, stdout and stderr.
In Ruby, puts is actually shorthand for $stdout.puts. $stdout is a global variable that, as you might expect, references the stdout of the current Ruby process.
Like $stdout, Ruby also has globals for stderr and stdin: $stderr and $stdin.
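As a quick illustration (not part of the challenge itself), the sketch below writes to both output streams. It also prints the descriptor numbers behind the standard streams using Ruby's STDIN, STDOUT and STDERR constants, which refer to the process's original streams.

```ruby
# puts with no receiver writes to $stdout
puts "normal output"
$stdout.puts "also normal output"

# Diagnostics go to $stderr so they can be separated from the real output
$stderr.puts "something went wrong"

# Each standard stream is backed by a numbered file descriptor
p STDIN.fileno  # 0
p STDOUT.fileno # 1
p STDERR.fileno # 2
```

If you save this in a file (say streams.rb, a made-up name) and run ruby streams.rb > out.txt, only the line written to $stderr should still appear in your terminal.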
- Modify your program to write the error message to $stderr.
- Modify your program to quit with a non-zero exit code. Exit codes are used by other programs to determine whether or not another program completed successfully.
Try confirming your exit code is working as expected by running:
ruby cat.rb foobar && echo "Previous command succeeded"
You shouldn't see the output "Previous command succeeded".
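You can also observe exit codes from inside Ruby. The following is a minimal sketch, assuming a hypothetical helper named run_child that starts a child Ruby process which does nothing but exit with the requested code. It uses Kernel#system and the $? status object from the standard library.

```ruby
require "rbconfig"

# Hypothetical helper: runs a child Ruby process that exits with the given code
# and reports what the parent process observes.
def run_child(code)
  # system returns true for exit status 0 and false for a non-zero status
  succeeded = system(RbConfig.ruby, "-e", "exit #{code}")
  [succeeded, $?.exitstatus]
end

p run_child(0) # [true, 0]
p run_child(1) # [false, 1]
```

This is the same signal the shell's && operator relies on to decide whether to run the next command.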
Great *nix programs can work with large amounts of data just as efficiently as small ones. grep can search a multi-gigabyte file for the word "cats" without breaking a sweat. Our cat.rb program should be just as powerful.
How does your implementation of cat read the input files? Are you reading the entire contents of each file all at once? If so, think of where the contents of that file are temporarily stored. Is that the best you can do?
Modify your program to read and write efficiently. Think of how you can ensure you don't use too much memory while your program is running. If you're unsure, refer back to the man page for cat.
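One possible way to think about this, sketched here on a byte counter rather than on cat itself, is to read a fixed-size chunk at a time. The helper name each_chunk and the 16 KB chunk size are assumptions made for illustration.

```ruby
require "stringio"

CHUNK_SIZE = 16 * 1024 # bytes per read; an arbitrary choice for this sketch

# Hypothetical helper: yields the stream's contents one chunk at a time,
# so only one chunk of at most `size` bytes is read per call.
def each_chunk(io, size = CHUNK_SIZE)
  # read(size) returns nil at end of file, which ends the loop
  while (chunk = io.read(size))
    yield chunk
  end
end

# Count the bytes of an input without loading all of it in one read
total = 0
each_chunk(StringIO.new("x" * 50_000)) { |chunk| total += chunk.bytesize }
puts total # 50000
```

Reading line by line is another option, though a file with extremely long lines would still be read in large pieces.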
Time to take this program to the next level! Reading from stdin is a standard feature of most *nix programs and cat is no different. According to the man page, there are two ways to read from stdin with cat. The first is to not provide a filename: cat will read from stdin by default.
$ cat
Hello World!
Hello World!
In the above example, my keyboard input was captured and sent to cat's stdin. cat did what cat does: it printed the input to the screen.
The other way to request that cat read from stdin is to specify - as the filename. This is a common convention in the *nix world:
$ cat -
Hello World!
Hello World!
Note: When you're providing input via stdin you can 'close' the input with CTRL+D. This is different from CTRL+C, which quits the program.
Modify your program to support reading from stdin, just like cat does.
cat -n will prefix each line with a line number:
$ cat -n ~/.gitconfig
1 [user]
2 name = Nate Delage
3 email = [email protected]
4
5 [color]
6 branch = auto
7 diff = auto
8 status = auto
Modify your program to support this flag.
Can you make your cat.rb support this use case?
$ ruby cat.rb foo.txt - bar.txt -
The above should read from each input until End of File (EOF), which means reading from stdin twice. Remember, you can close stdin with CTRL+D.
Up until now we haven't written any tests. Writing tests for input and output might not come naturally. In the past you might have gone as far as stubbing out calls to gets or puts. I won't give you any solutions, but here are a few clues that might help:
- $stdin, $stdout and $stderr might be global variables, but they aren't constants. You can reassign them as you please.
- StringIO
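To show how those two clues fit together without touching cat.rb, here is a small sketch built around a made-up greet method. The capture_stdout helper is also hypothetical. It temporarily points $stdout at a StringIO and restores it afterwards.

```ruby
require "stringio"

# Hypothetical method under test; it stands in for any code that calls puts
def greet(name)
  puts "Hello, #{name}!"
end

# Hypothetical helper: swaps $stdout for a StringIO, runs the block,
# and returns everything that was written in the meantime.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original # always restore the real stdout
end

output = capture_stdout { greet("World") }
p output # "Hello, World!\n"
```

How to apply the same idea to $stdin and $stderr is left for you to work out.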
Write some tests for cat.rb, especially the functionality for reading from stdin.
Use the *nix tool time (read the man page) to benchmark the runtime of cat and your program cat.rb. Which one is faster? By how much? Try with several large files.
There's a benchmarking library, benchmark, built into Ruby. Try using that to benchmark cat.rb. Are the numbers any different than what you saw with time? Why? Can you improve upon this time? Hint: check the man page for ruby.
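If you haven't used the benchmark library before, this is roughly what a measurement looks like. The workload inside the block is only a placeholder. Note that Benchmark.realtime measures just the code inside the block.

```ruby
require "benchmark"

# Benchmark.realtime returns the elapsed wall-clock time, in seconds, for the block
elapsed = Benchmark.realtime do
  100_000.times.map { |i| i * 2 } # placeholder workload
end

puts format("took %.4f seconds", elapsed)
```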
At the moment you always need to include ruby on the command line to run your program. Make your program executable so that you can run it like so:
./cat.rb foo.txt
What commands do you need to run to make cat.rb executable?
tail is another incredibly useful utility in *nix. Read the man page and practice using a number of tail's options.
Write your own version of tail in Ruby. The following command should print the last 10 lines of foo.txt:
$ ./tail.rb foo.txt
Add support for the -n flag to your tail.rb program.
tail -f is a great way to keep a close eye on log files. Create a small utility that prints a log of simulated HTTP requests:
$ ./server_log.rb
GET / 200 104ms
GET /contact/new 200 80ms
POST /contact 201 301ms
GET /about 200 91ms
Your utility server_log.rb should continuously produce output. Print a random request on each line, pausing for a random number of milliseconds between lines. Be sure to separate the fields in each line with a tab character.
Try piping the output of server_log.rb to tail.rb:
$ ./server_log.rb | ./tail.rb
Does it work? Why not? What if you open another terminal and kill the server_log.rb process? Do you see any output then?
Try modifying your server_log.rb program so that it writes to a file instead of stdout. Then run tail.rb on that file. Does that work as expected? What's the difference?
Hardcoding server_log.rb to write to a file might be a bit limiting. Read about [redirection](http://www.comp.leeds.ac.uk/jj/linux/cli2.html#Redirecting STDOUT) and, instead of writing server_log.rb's output to a file in your Ruby code, write to a file via redirection. Then try using tail.rb again.
Are you familiar with the | (pipe) character in the previous example? If not, do some research and see what you can learn. It's a great example of chaining *nix commands together. Though, as you've seen, it doesn't work that well with tail.
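Ruby can create the same kind of pipe in-process with IO.pipe, which makes it easier to poke at how pipes behave. In this sketch (the helper name through_pipe is made up), the reader only sees end of file once the write end has been closed, which may be relevant to what you observed with tail.rb.

```ruby
# Hypothetical helper: pushes lines through a pipe and returns what the reader saw
def through_pipe(lines)
  reader, writer = IO.pipe
  lines.each { |line| writer.puts line }
  # read blocks until EOF, and EOF only arrives once the write end is closed
  writer.close
  reader.read
ensure
  reader&.close
end

puts through_pipe(["GET / 200 104ms", "GET /about 200 91ms"])
```

Try removing the writer.close line and see what happens to the call to read.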
Compare the output and behavior between your version of tail and the standard implementation. Can you spot any differences?
While the default unit for tail is lines, tail also supports reading the last n blocks of 512 bytes each with the -b flag.
Before you add the -b flag to tail.rb, find out how much data is in a byte. How is a byte related to a bit? If we assume we're reading UTF-8 encoded files, how many characters should print if we specify -b 1?
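Ruby lets you explore these questions directly. The sketch below compares String#length (characters) with String#bytesize (bytes) for a few sample strings chosen for illustration, and prints the bits that make up a single byte.

```ruby
ascii  = "cat"
accent = "caf\u00e9" # "café": the é takes 2 bytes in UTF-8
emoji  = "\u{1F431}" # a cat face emoji: 4 bytes in UTF-8

[ascii, accent, emoji].each do |s|
  puts "#{s.length} characters, #{s.bytesize} bytes"
end

# One byte shown as its individual bits
puts "a".unpack1("B8") # 01100001
```

Because a character can span more than one byte, a fixed number of bytes doesn't always map to the same number of characters.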
Add support for the -b flag to your tail.rb program.