Unix Challenge

Background

In this challenge you'll become familiar with a number of standard Unix command-line utilities and then implement a subset of their functionality in Ruby. You'll gain a better understanding of:

files and file descriptors in Unix
pipes
sockets
STDIN, STDERR, STDOUT

Unix tools and their philosophy

If you use Mac OS X or Linux, you're using a variant or flavor of Unix. You might have noticed that doing things on the command line usually involves a collection of small, strangely named programs with impossible-to-remember options. This can be intimidating, but these small utilities that each do one thing well are incredibly powerful, especially when combined with each other.

Most, if not all, of the *nix utilities are focused on reading, writing or manipulating text. If files are the backbone of *nix, text is the lingua franca. What do all of these have in common?

~/.gitconfig
.rspec
Sublime Text preferences
/private/etc/passwd (/etc/passwd on Linux)

They're all files and they all contain human-readable text. Text is pervasive in *nix, and one of the reasons is how easy it is to manipulate text with the 'standard' utilities found on most *nix systems.

Working together

I mentioned how the real power of *nix utilities becomes apparent when they work together. If you're coming from a web development background, you're most familiar with programs or systems working together via APIs. For example, your Rails application can work with Twitter by making requests to Twitter's HTTP API. While connecting to Twitter might seem pretty easy, it still requires learning the ins and outs of Twitter's API. LinkedIn's API is a totally different beast, and what you learned about Twitter won't apply.

*nix programs communicate with each other over a simple but consistent interface: one or more file descriptors.

File descriptors

A file descriptor in *nix is used to wrap any number of 'things' that can be read from or written to. For example, file descriptors are the interface for reading and writing to:

files
sockets (everything from another computer or your PostgreSQL server to a Leap Motion controller)
your terminal
your keyboard

A number of system calls create or operate on file descriptors. Some of the most commonly used include:

open()
creat()
read()
write()
send()
lseek()
poll()

By standardizing on file descriptors as the abstract interface for such a wide range of inputs and outputs, we can write programs that are incredibly flexible and don't have to know the implementation details of their inputs and outputs.
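To make the idea concrete, here is a minimal Ruby sketch (not part of the challenge itself). Ruby's IO objects wrap file descriptors, so one method can write to the terminal, a pipe, or a file without knowing which it has been given. The helper names greet, standard_filenos, greet_through_pipe and greet_through_file are hypothetical and exist only for this illustration.

```ruby
require "tempfile"

# Hypothetical helper: writes a greeting to anything that behaves like an IO.
# It never needs to know what sits behind the descriptor.
def greet(io)
  io.write("hello\n")
end

# The standard streams wrap descriptors 0, 1 and 2 in a normal process.
def standard_filenos
  [STDIN.fileno, STDOUT.fileno, STDERR.fileno]
end

# Sends the greeting through a pipe: a pair of connected descriptors.
def greet_through_pipe
  reader, writer = IO.pipe
  greet(writer)
  writer.close # signals end of file to the reading side
  reader.read
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end

# Sends the greeting through a regular (temporary) file.
def greet_through_file
  Tempfile.create("fd_demo") do |file|
    greet(file)
    file.rewind
    file.read
  end
end

p standard_filenos        # typically [0, 1, 2]
greet($stdout)            # the terminal, unless stdout was redirected
print greet_through_pipe
print greet_through_file
```

The same greet method is reused three times; only the object handed to it changes.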

Challenges

In these challenges you'll first become familiar with a standard *nix program, then recreate a subset of its functionality in Ruby.

Prerequisite: `man`

As you discover new *nix programs you'll need a reference to understand what they do and how to use them. There's a utility called man which displays the manual page for most utilities, although some programs don't have man pages available. You can read the man page for a command by running: man [program-name]. For example, man ls will display the instructions for using the ls command:

LS(1)                     BSD General Commands Manual                    LS(1)

NAME
     ls -- list directory contents

SYNOPSIS
     ls [-ABCFGHLOPRSTUW@abcdefghiklmnopqrstuwx1] [file ...]

DESCRIPTION
     For each operand that names a file of a type other than directory, ls
     displays its name as well as any requested, associated information.
     For each operand that names a file of type directory, ls displays the
     names of files contained within that directory, as well as any
     requested, associated information.

You can quit man by typing q.

Prerequisite: `git`

We're going to be developing software, so create a new Git repo. Commit your work as you move from one phase of the challenge to the other.

Part One: cat

cat is a program that con_cat_enates streams. Read the man page for cat and try reading a file or two. How about reading (and printing) your Git config?

Note: cat is usually unnecessary if you aren't concatenating multiple files. See this link for some banter about unnecessary use of cat: http://partmaps.org/era/unix/award.html#cat

Challenge 1: Reading a single file

Create a Ruby program (cat.rb) that reads and prints out a file specified as a command line argument.

$ ruby cat.rb ~/.gitconfig
[user]
	name = Linus Torvalds
	email = [email protected]

[color]
  branch = auto
  diff   = auto
  status = auto
...

When you have a working implementation, be sure to commit your work.

Challenge 2: Reading n files

cat supports reading multiple files. Modify cat.rb to support reading multiple files:

$ ruby cat.rb ~/.gitconfig ~/.profile

When you can read multiple files, commit your work.

Challenge 3 - Handling errors

What if a file doesn't exist? Modify cat.rb to print an error message and quit when it attempts to read a file that doesn't exist. Note that it should read all files that exist until it attempts to read one that doesn't.

$ ruby cat.rb ~/.gitconfig ~/.profile ~/.foobar

In the above example, your .gitconfig and .profile should print before the error message is printed.

Challenge 4 - Following convention

Hidden from you until now, the output you've seen in your terminal is actually the combination of two standard file descriptors: Standard Out (stdout) and Standard Error (stderr). Well-behaved programs write to the correct stream depending on what they have to print: normal output goes to stdout, while errors and diagnostics go to stderr. Programs also have access to Standard In (stdin), used for reading in data. When you read input from the keyboard with gets, you're reading from stdin (provided ARGV is empty; otherwise Kernel#gets reads from the files named there).

Each program has its own stdin, stdout and stderr.

In Ruby, puts is effectively shorthand for $stdout.puts. $stdout is a global variable that, as you might expect, references the stdout of the current Ruby process.

Like $stdout, Ruby also has globals for stderr and stdin: $stderr and $stdin.

Modify your program to write the error message to $stderr.
Modify your program to quit with a non-zero exit code when an error occurs. Exit codes are used by other programs to determine whether or not another program completed successfully.

Try confirming your exit code is working as expected by running:

ruby cat.rb foobar && echo "Previous command succeeded"

You shouldn't see the output "Previous command succeeded".
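The stream and exit code conventions can be sketched on a toy problem. This is not cat.rb: check_file is a hypothetical helper that only reports whether a path exists, and its message wording is an assumption. It shows normal output going to stdout, diagnostics going to stderr, and a status value that a script could hand to exit.

```ruby
require "stringio"

# Hypothetical helper, not part of cat.rb: reports on a path and returns the
# exit status a script should finish with (0 = success, non-zero = failure).
# `out` and `err` default to the process's real streams.
def check_file(path, out: $stdout, err: $stderr)
  if File.exist?(path)
    out.puts "#{path}: ok"
    0
  else
    err.puts "#{path}: No such file or directory"
    1
  end
end

status = [".", "no/such/path"].map { |path| check_file(path) }.max
$stderr.puts "a real script would now call: exit #{status}"
# `exit status` is left out here so the sketch runs to the end when pasted.
```

Saved as a script (the file name is up to you), running it with 2>/dev/null appended hides everything written to stderr and leaves only the stdout line.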

Challenge 5 - Performance

Great *nix programs can work with large amounts of data just as efficiently as small ones. grep can search a multi-gigabyte file for the word "cats" without breaking a sweat. Our cat.rb program should be just as powerful.

How does your implementation of cat read the input files? Are you reading the entire contents of each file all at once? If so, think of where the contents of that file are temporarily stored. Is that the best you can do?

Modify your program to read and write efficiently. Think of how you can ensure you don't use too much memory while your program is running. If you're unsure, refer back to the man page for cat.
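One possible shape for bounded-memory copying, offered as a sketch rather than as the intended answer: read a fixed-size chunk, write it, and repeat until the input is exhausted. The method name copy_in_chunks and the 16 KiB default are assumptions made for this example. It relies on IO#read(length) returning nil at end of file.

```ruby
require "stringio"

# Hypothetical helper: copies `input` to `output` one chunk at a time, so at
# most `chunk_size` bytes are held in memory regardless of the input's size.
# Returns the number of bytes copied.
def copy_in_chunks(input, output, chunk_size = 16 * 1024)
  copied = 0
  # IO#read(n) returns nil once the end of the input has been reached.
  while (chunk = input.read(chunk_size))
    output.write(chunk)
    copied += chunk.bytesize
  end
  copied
end

# Demonstration with in-memory streams standing in for files.
source = StringIO.new("line\n" * 10_000)
sink = StringIO.new
puts "copied #{copy_in_chunks(source, sink, 1024)} bytes"
```

Because the method only calls read and write, the same code works unchanged with File objects or $stdin and $stdout.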

Challenge 6 - stdin

Time to take this program to the next level! Reading from stdin is a standard feature of most *nix programs and cat is no different. According to the man page, there are two ways to read from stdin with cat. The first is to not provide a filename: cat will read from stdin by default.

$ cat
Hello World!
Hello World!

In the above example, my keyboard input was captured and sent to cat's stdin. cat did what cat does: it printed the output to the screen.

The other way to request that cat read from stdin is to specify - as the filename. This is a common convention in the *nix world:

$ cat -
Hello World!
Hello World!

Note: When you're providing input via stdin you can 'close' the input with CTRL+D, which signals end of file. This is different from CTRL+C, which interrupts the program.

Modify your program to support reading from stdin, just like cat does.
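The "-" convention amounts to a small decision about where input comes from. The sketch below is one hedged way to express it; with_input is a hypothetical name, and the stdin: keyword exists only so the example can run without a keyboard.

```ruby
require "stringio"
require "tempfile"

# Hypothetical helper: yields an IO for `name`, treating "-" as standard
# input and anything else as a path to open for reading.
def with_input(name, stdin: $stdin)
  return yield(stdin) if name == "-"

  File.open(name, "r") { |file| yield(file) }
end

# Demonstration: a temporary file and an in-memory stand-in for stdin.
Tempfile.create("input_demo") do |file|
  file.write("from a file\n")
  file.flush
  with_input(file.path) { |io| print io.read }
end
with_input("-", stdin: StringIO.new("from stdin\n")) { |io| print io.read }
```

The caller's block receives an IO either way, so the rest of a program does not need to care which source was chosen.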

Challenge 7 - Line numbers

cat -n will prefix each line with a line number:

$ cat -n ~/.gitconfig
     1	[user]
     2		name = Nate Delage
     3		email = [email protected]
     4
     5	[color]
     6	  branch = auto
     7	  diff   = auto
     8	  status = auto

Modify your program to support this flag.

Challenge 8 - stdin & files

Can you make your cat.rb support this use case?

$ ruby cat.rb foo.txt - bar.txt -

The above should read from each input until End of File (EOF), which means reading from stdin twice. Remember, you can close stdin with CTRL+D.

Challenge 9 - Testing

Up until now we haven't written any tests. Writing tests for input and output might not come naturally. In the past you might have gone as far as to stub out calls to gets or puts. I won't give you any solutions, but here are a few clues that might help:

$stdin, $stdout and $stderr might be global variables, but they aren't constants. You can reassign them as you please.
StringIO

Write some tests for cat.rb, especially the functionality for reading from stdin.
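The two clues can be combined in a few lines. This is only an illustration on a toy method, not a test for cat.rb: capture_io and shout are hypothetical names invented for the sketch.

```ruby
require "stringio"

# Hypothetical helper: runs the block with $stdin and $stdout temporarily
# replaced by in-memory StringIO objects, then returns what was "printed".
def capture_io(input)
  original_stdin = $stdin
  original_stdout = $stdout
  $stdin = StringIO.new(input)
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  # Always restore the real streams, even if the block raised.
  $stdin = original_stdin
  $stdout = original_stdout
end

# Toy code under test: upcases every line it reads from stdin.
def shout
  while (line = $stdin.gets)
    puts line.upcase
  end
end

output = capture_io("hello\nworld\n") { shout }
puts output.inspect
```

The captured string can be compared against an expected value in whatever test framework you prefer.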

Challenge 10 - Benchmarking

Use the *nix tool time (read the man page) to benchmark the runtime of cat and your program cat.rb. Which one is faster? By how much? Try with several large files.

Ruby ships with a benchmarking library, benchmark. Try using that to benchmark cat.rb. Are the numbers any different from what you saw with time? Why? Can you improve upon this time? Hint: check the man page for ruby.
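For orientation, here is the basic shape of the benchmark library, shown on an in-memory copy rather than on cat.rb. Benchmark.realtime returns the elapsed wall-clock time of its block in seconds; time_copy and the chunk sizes are assumptions chosen for this sketch.

```ruby
require "benchmark"
require "stringio"

# Hypothetical helper: times how long it takes to copy `data` between two
# in-memory streams using the given chunk size. Returns seconds as a Float.
def time_copy(data, chunk_size)
  input = StringIO.new(data)
  output = StringIO.new
  Benchmark.realtime do
    while (chunk = input.read(chunk_size))
      output.write(chunk)
    end
  end
end

data = "x" * 1_000_000
[64, 4096, 65_536].each do |size|
  puts format("%6d-byte chunks: %.4fs", size, time_copy(data, size))
end
```

The timings vary from run to run and machine to machine, so treat single measurements with caution.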

Challenge 11 - Executable

At the moment you always need to include ruby on the command line to run your program. Make your program executable so that you can run it like so:

./cat.rb foo.txt

What commands do you need to make cat.rb executable?

Part Two: tail

tail is another incredibly useful utility in *nix. Read the man page and practice using a number of tail's options.

Challenge 1 - Tell me the end of the story

Write your own version of tail in Ruby. The following command should print the last 10 lines of foo.txt:

$ ./tail.rb foo.txt

Challenge 2 - How many lines?

Add support for the -n flag to your tail.rb program.

Challenge 3 - Simulate a log

tail -f is a great way to keep a close eye on log files. Create a small utility that prints a log of simulated HTTP requests:

$ ./server_log.rb
GET / 200 104ms
GET /contact/new 200 80ms
POST /contact 201 301ms
GET /about 200 91ms

Your utility server_log.rb should continuously produce output. Pause a random number of milliseconds between lines, and print a random request each time. Be sure to separate fields in the log file with a tab character.

Challenge 4 - tail stdin

Try piping the output of server_log.rb to tail.rb:

$ ./server_log.rb | ./tail.rb

Does it work? Why not? What if you open another terminal and kill the server_log.rb process? Do you see any output then?

Try modifying your server_log.rb program so that it writes to a file instead of stdout. Then tail.rb that file. Does that work as expected? What's the difference?

Hardcoding server_log.rb to write to a file might be a bit limiting. Read about [redirection](http://www.comp.leeds.ac.uk/jj/linux/cli2.html#Redirecting STDOUT) and instead of writing server_log.rb's output to a file in your Ruby code, write to a file via redirection. Then try using tail.rb again.

Are you familiar with the | (pipe) character in the previous example? If not, do some research and see what you can learn. It's a great example of chaining *nix commands together. Though, as you've seen, it doesn't work that well with tail.
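A shell pipe connects one process's stdout to the next process's stdin. Ruby's IO.pipe creates the same kind of connected descriptor pair inside a single process, which makes it easy to experiment with. In this sketch, last_lines_through_pipe is a hypothetical helper; notice what the reading side is waiting for.

```ruby
# Hypothetical helper: pushes `lines` into a pipe, closes the writing end,
# then reads everything back and keeps the last `count` lines.
# Only suitable for small inputs: a pipe's buffer is limited, and a writer
# blocks once it is full and nobody is reading.
def last_lines_through_pipe(lines, count)
  reader, writer = IO.pipe
  lines.each { |line| writer.puts(line) }
  writer.close # readlines only returns once the writer has closed its end
  reader.readlines(chomp: true).last(count)
ensure
  reader.close if reader && !reader.closed?
  writer.close if writer && !writer.closed?
end

requests = ["GET /", "GET /contact/new", "POST /contact", "GET /about"]
puts last_lines_through_pipe(requests, 2)
```

Try commenting out writer.close in a scratch copy to see how the reader behaves when the other end never finishes.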

Compare the output and behavior between your version of tail and the standard implementation. Can you spot any differences?

Challenge 5 - bytes

While the default unit for tail is lines, tail also supports reading n blocks of 512 bytes with the -b flag (on BSD and macOS; GNU tail uses -c with a b suffix instead).

Before you add the -b flag to tail.rb, find out how much data is in a byte. How is a byte related to a bit? If we assume we're reading UTF-8 encoded files, how many characters should print if we specify -b 1?

Add support for the -b flag to your tail.rb program.
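Ruby strings can report their size in both characters and bytes, which is useful when thinking about the questions above. The sketch below uses a hypothetical describe helper and a sample word chosen for illustration; it also shows what can happen when a byte-based cut lands in the middle of a multi-byte character.

```ruby
# Hypothetical helper: reports the size of `text` in characters, bytes, and bits.
def describe(text)
  { characters: text.length, bytes: text.bytesize, bits: text.bytesize * 8 }
end

word = "caf\u00e9" # "café": the final character takes two bytes in UTF-8
p describe("cafe")
p describe(word)

# Slicing by bytes can split a character in two, leaving invalid UTF-8.
partial = word.byteslice(0, 4)
puts "valid after cutting at 4 bytes? #{partial.valid_encoding?}"
```

A byte-oriented tail has to decide what to do when its starting offset falls inside a character like this.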

Part Three: wc

Part Four: cut
