@julik
Created April 8, 2013 08:07
Le fork and forget picked up on Pastebin
#! /usr/bin/env ruby
# "Fork and Forget"
# Don't wait if you don't have to: A mini-tutorial about concurrency
# mechanisms in Ruby and basic Unix systems programming, and how you can use
# them to avoid waiting.
#
# I have heard that people are occasionally unfamiliar with this strategy.
# It's a common idiom, regardless of language, and it is also essentially built
# into Erlang (and Termite Scheme, etc.).
#
# If you have a thing that takes forever and your program doesn't care so much
# about its output (or prefers to collect it later through some other means,
# like a pipe/file/DB/etc.), then this is probably the thing you want to do,
# rather than stopping execution to wait.
#
# The relevant methods (fork, wait, waitpid, etc.) are essentially just
# wrappers around the standard Unix system calls of the same name, and have
# almost the same semantics, although they're often nicer to use in Ruby than
# in C.
#
# Ruby:
#   Process.waitpid(fork {
#     do_a_thing!
#   })
#
# C:
#   if((pid = fork()) == 0) {
#     do_a_thing();            /* child */
#     _exit(0);
#   } else {
#     waitpid(pid, NULL, 0);   /* parent */
#   }
#
# And so, without further ado, let's get started.
require 'open-uri'
# A quick utility.
class File
  def self.append fn, str
    (open(fn, 'a') << str).close
  end
end
# Fibonacci. The naive, recursive version is great for pegging the CPU and
# simple to implement.
def fib n
  return 1 if n < 2
  fib(n - 1) + fib(n - 2)
end
# We stuff the results into a file. IO.popen is another good way to grab the
# results if the current process wants to handle the data itself.
# Contrived.
def io_bound url, output
  # URI.open comes from open-uri; Kernel#open stopped accepting URLs in
  # Ruby 3.0.
  File.append output, URI.open(url).read
end
# Contrived, way contrived.
def cpu_bound n, output
  File.append output, "#{fib(n)}\n"
end
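# A minimal sketch of the pipe approach mentioned above, using IO.pipe and
# fork rather than IO.popen. The name fork_with_pipe is made up for this
# example and is not used by the rest of the program. The child writes the
# block's result (as a string) to the pipe; the parent gets the pid and the
# read end back immediately and can collect the output whenever it likes.
# For large outputs the child will stall until the parent starts reading,
# since a pipe only buffers a limited amount.
def fork_with_pipe
  reader, writer = IO.pipe
  pid = fork {
    reader.close               # the child only writes
    writer.write yield.to_s
    writer.close
  }
  writer.close                 # the parent only reads
  [pid, reader]
end
# Usage might look like this:
#   pid, pipe = fork_with_pipe { fib(25) }
#   # ... do other work ...
#   answer = pipe.read         # blocks until the child closes its end
#   Process.wait pid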
# Now we get to the interesting bits about Unix processes, Ruby threads, and
# concurrency.
# Where do processes go when they die? They become zombies. (Unix terminology
# is best terminology.) The status will show up as a "Z" in the output of
# top(1) or ps(1). The main reason they hang out like this is to allow the
# parent access to their status on exit. If the parent exits before they do,
# though, the processes become children of the init(8) process. init
# automatically reaps all of its children that become zombies.
# This is a fairly simple loop that waits to reap all of the child processes
# that have exited. They'll all run in parallel, and we'll wait until they're
# done before we continue about our own business.
def wait_for_children
  # Unless you pass WNOHANG (see below), Process.wait blocks until some child
  # exits. Once there are no children left it raises Errno::ECHILD, which the
  # rescue turns into nil, and that ends the loop.
  true while((Process.wait rescue nil))
end
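# Zombies exist so the parent can read their exit status, so here is a
# sketch of a variant that keeps that status instead of throwing it away.
# Process.wait2 returns both the pid and a Process::Status. The name
# reap_all_with_status is made up for this example and is not used by the
# rest of the program.
def reap_all_with_status
  statuses = {}
  loop do
    pid, status = Process.wait2
    statuses[pid] = status
  end
rescue Errno::ECHILD
  # No children left: hand back whatever we collected.
  statuses
end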
# This, like the above, will reap zombies. Every time we get a SIGCLD (the
# System V name for what POSIX calls SIGCHLD), we try to reap all of the
# zombies we have. Signal handlers are, as far as your program is concerned,
# asynchronous (although this is a simplification), and when a child process
# dies, a SIGCLD is sent to the parent. By default, this signal is ignored
# (see the signal(7) man page), but if we set up a handler for it, we can
# reap processes as they exit.
def do_not_wait_for_children
  trap('CLD') {
    # With WNOHANG, Process.wait returns nil right away when no child has
    # exited yet, so the handler never blocks.
    true while((Process.wait(-1, Process::WNOHANG) rescue nil))
  }
end
# Since Ruby threads are cheap and (mostly) non-blocking, you can do something
# like this instead of a plain fork(). It spawns a thread that spawns a
# process and waits to collect that process when it exits.
def autoreap_fork(&b)
  Thread.new { Process.wait(fork(&b)) }
end
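# Ruby also ships this idiom as Process.detach, which starts a thread whose
# only job is to reap the given pid. A sketch of the same helper built on it
# (detached_fork is a made-up name for this example and is not used by the
# rest of the program):
def detached_fork(&b)
  Process.detach(fork(&b))
end
# The returned thread's value is the child's Process::Status, if you decide
# later that you care after all:
#   status = detached_fork { do_a_thing! }.value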
# Of course, there is another option, which is to do nothing. You don't want
# to fill the process table with zombies if you are, say, a long-running
# process, but as noted above, if the parent exits, its children are adopted by
# the init process, which will reap them when they become zombies. So if
# the output of these children is collected elsewhere later, you can just exit
# after spawning them and let your OS do the rest. That's the approach we take
# here: we spawn all of the children and forget about them. When this process
# dies, our children are handled by init.
# We do some expensive calculations here. Note that this will spawn several
# processes, and fib(n) can be expensive to calculate (with the algorithm we
# use) depending on your hardware. You may want to adjust the numbers here.
# To watch output as it arrives, you can run the following in another shell
# before you start this program:
# touch /tmp/fibonacci_numbers ; tail -f /tmp/fibonacci_numbers
(1..42).each { |n|
  # Note that these numbers don't necessarily arrive in order! (Although, in
  # our case, it is likely that they will, since fib(n+1) will take longer to
  # calculate than fib(n).)
  fork {
    # This will make the output of top/ps a little more friendly, if you
    # want to watch the processes while they run.
    $0 = "fib[#{n}]"
    cpu_bound n, '/tmp/fibonacci_numbers'
  }
}
puts "Running a few calculations in the 'background'."
%w(
http://ruby-doc.org/core/classes/Process.html
http://debu.gs/
http://reverso.be/
http://asdf.com/
http://bigempire.com/filthy
http://gist.github.com/
http://localhost/
http://code.google.com/p/termite/
).each { |url|
  # For Ruby, threads are cheaper than processes, but MRI only lets one
  # thread run Ruby code at a time: Ruby 1.8 used "green" threads, and 1.9
  # and later use OS-level threads behind a global VM lock (JRuby is an
  # exception here, but JRuby threads aren't as cheap), so CPU-bound work
  # doesn't really speed up when parallelized with threads. Threading can
  # also lead to odd bugs if the threads touch any sort of shared resource:
  # unlike processes created by forking, which each get their own copy of
  # memory, threads share memory, file descriptors, sockets, and other
  # process-level resources. Using a Thread on a problem that is not
  # CPU-bound (like fetching a website from the internet, which is
  # I/O-bound), though, will let all of the I/O run almost in parallel while
  # we wait. Of course, forking will work here, too.
  Thread.new {
    io_bound url, "/tmp/#{url.gsub(/[^a-z0-9\._]+/, '_')}.html"
  }
}
puts "Downloading some web pages!"
# You can use Thread#join to wait for a thread to finish. I'm going to do
# this the lazy way by, instead of keeping references to the Thread objects,
# just asking Ruby for all of the threads except the current one, and joining
# those. By the time we arrive here, some of them may already be done;
# Thread.list only returns threads that are still alive, so the finished ones
# won't show up in it.
(Thread.list - [Thread.current]).each { |thread|
  # We wrap it in an exception-eater because joining a thread that died from
  # an exception re-raises that exception here. Since we don't especially
  # care what the thread did or even if it finished its mission, but we *do*
  # want to wait until all of the threads are finished before we exit (an
  # arbitrary restriction; we only care for the purposes of illustrating what
  # to do when you care), we join all of them but ignore any mishaps they may
  # have encountered.
  begin
    thread.join
  rescue Exception
  end
}
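# If you do care about the results, a sketch of the less lazy way: keep the
# Thread objects and ask each one for its value. Thread#value joins the
# thread and returns what its block returned, or re-raises the exception that
# killed it. Holding explicit references also means we only ever wait on
# threads we started ourselves. The names collect_values and urls are made up
# for this example and are not used by the rest of the program.
def collect_values(threads)
  threads.map { |t|
    begin
      t.value
    rescue StandardError => e
      e   # hand the failure back instead of letting it escape
    end
  }
end
# Usage might look like this:
#   threads = urls.map { |u| Thread.new { URI.open(u).read } }
#   pages_or_errors = collect_values(threads)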
puts "Downloaded all of them! (Or the threads crashed, or any combination.)"
# At this point, the fibonacci processes may or may not have finished
# (depending on your CPU speed versus the speed of your net connection), but
# they're no longer our problem, since this program is over.