Created
April 8, 2013 08:07
Le fork and forget picked up on Pastebin
#! /usr/bin/env ruby
# "Fork and Forget"
# Don't wait if you don't have to: A mini-tutorial about concurrency
# mechanisms in Ruby and basic Unix systems programming, and how you can use
# them to avoid waiting.
#
# I have heard that people are occasionally unfamiliar with this strategy.
# It's a common idiom, regardless of language, and it is also essentially built
# into Erlang (and Termite Scheme, etc.).
#
# If you have a thing that takes forever and your program doesn't care so much
# about its output (or prefers to collect it later through some other means,
# like a pipe/file/DB/etc.), then this is probably the thing you want to do,
# rather than stopping execution to wait.
#
# The relevant methods (fork, wait, waitpid, etc.) are essentially just
# wrappers around the standard Unix system calls of the same name, and have
# almost the same semantics, although they're often nicer to use in Ruby than
# in C.
#
# Ruby:
#   Process.waitpid(fork {
#     do_a_thing!
#   })
#
# C (fork() returns 0 in the child and the child's pid in the parent):
#   if((pid = fork()) == 0) {
#     do_a_thing();            /* child */
#     _exit(0);
#   } else {
#     waitpid(pid, NULL, 0);   /* parent */
#   }
#
# And so, without further ado, let's get started.

require 'open-uri'

# A quick utility.
class File
  def self.append fn, str
    open(fn, 'a') { |f| f << str }
  end
end

# Fibonacci. The naive, recursive version is great for pegging the CPU and
# simple to implement.
def fib n
  return 1 if n < 2
  fib(n - 1) + fib(n - 2)
end

# We stuff the results into a file. IO.popen is another good way to grab the
# results if the current process wants to handle the data itself.

# Contrived. URI.open comes from open-uri (Ruby 2.5+); a bare open(url) no
# longer fetches URLs as of Ruby 3.0.
def io_bound url, output
  File.append output, URI.open(url).read
end

# Contrived, way contrived.
def cpu_bound n, output
  File.append output, "#{fib(n)}\n"
end
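
=begin
An aside on collecting results in the current process instead of a file. This
is a minimal sketch that uses IO.pipe together with fork (a close cousin of
IO.popen, not IO.popen itself). The helper name collect_from_child is
hypothetical and is not used elsewhere in this script.

```ruby
# Hypothetical helper: run the block in a child process and return whatever
# it produced, as a String.
def collect_from_child
  reader, writer = IO.pipe
  pid = fork {
    reader.close                 # the child only writes
    writer.write(yield.to_s)
    writer.close
    exit!(0)                     # skip at_exit hooks inherited from the parent
  }
  writer.close                   # the parent only reads
  output = reader.read           # returns once the child closes its end
  reader.close
  Process.wait(pid)              # reap the child so it does not linger
  output
end

sum = collect_from_child { (1..10).reduce(:+) }
puts sum
```

The parent closes its copy of the write end before reading; otherwise
reader.read would never see end-of-file.
=end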

# Now we get to the interesting bits about Unix processes, Ruby threads, and
# concurrency.

# Where do processes go when they die? They become zombies. (Unix terminology
# is best terminology.) The status will show up as a "Z" in the output of
# top(1) or ps(1). The main reason they hang out like this is to allow the
# parent access to their status on exit. If the parent exits before they do,
# though, the processes become children of the init(8) process. init
# automatically reaps all of its children that become zombies.
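
=begin
A small sketch of why zombies exist at all: the child below has long since
exited by the time the parent asks about it, yet its exit status is still
there to be collected. The exit code 3 and the sleep duration are arbitrary
choices for illustration.

```ruby
# The child exits right away; exit! skips at_exit hooks inherited from the
# parent.
pid = fork { exit!(3) }

sleep 0.2                        # by now the child is dead (a zombie)

# Process.wait2 returns [pid, Process::Status]; this call reaps the zombie.
_, status = Process.wait2(pid)
exit_code = status.exitstatus
puts exit_code
```
=end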

# This is a fairly simple loop that waits to reap all of the child processes
# that have exited. They'll all run in parallel, and we'll wait until they're
# done before we continue about our own business.
def wait_for_children
  # Note that, unless you pass WNOHANG (see below), Process.wait blocks until
  # a child exits. Once no children remain it raises Errno::ECHILD, which the
  # rescue turns into nil, ending the loop.
  true while((Process.wait rescue nil))
end
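
=begin
The same reaping loop, spelled out with an explicit rescue so it is easier to
see what ends it. This is only a sketch; the variable names and the sleep
durations are made up for the example.

```ruby
# Spawn three short-lived children.
spawned = 3.times.map { |i|
  fork {
    sleep(0.1 * i)
    exit!(0)                     # skip at_exit hooks inherited from the parent
  }
}

reaped = []
begin
  # Process.wait blocks until any child exits and returns its pid.
  loop { reaped << Process.wait }
rescue Errno::ECHILD
  # Raised when there are no children left to wait for: we are done.
end

puts "reaped #{reaped.size} children"
```
=end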

# This, like the above, will reap zombies. Every time we get a SIGCLD (the
# older System V name for what POSIX calls SIGCHLD), we try to reap all of the
# zombies we have. Signal handlers are, as far as your program is concerned,
# asynchronous (although this is a simplification), and when a child process
# dies, a SIGCLD is sent to the parent. By default, this signal is ignored
# (see the signal(7) man page), but if we set up a handler for it, we can reap
# processes when they come back. With WNOHANG, Process.wait returns nil
# instead of blocking when no child has exited yet, which ends the loop.
def do_not_wait_for_children
  trap('CLD') {
    true while((Process.wait(-1, Process::WNOHANG) rescue nil))
  }
end
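
=begin
A sketch of the signal-driven approach in action, using the POSIX signal name
'CHLD'. The reaped array, the five-second deadline, and the polling interval
are assumptions made for the example; a real program would simply carry on
with its own work.

```ruby
reaped = []

trap('CHLD') {
  begin
    # WNOHANG makes Process.wait return nil when no child has exited yet.
    while (pid = Process.wait(-1, Process::WNOHANG))
      reaped << pid
    end
  rescue Errno::ECHILD
    # No children at all: nothing to reap.
  end
}

child = fork { exit!(0) }        # exit! skips inherited at_exit hooks

# Stand-in for "our own business": idle until the handler has done its job.
deadline = Time.now + 5
sleep 0.05 until reaped.include?(child) || Time.now > deadline

trap('CHLD', 'DEFAULT')          # put the default behaviour back
puts "reaped #{reaped.inspect}"
```
=end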

# Since Ruby threads are cheap and (mostly) non-blocking, you can do something
# like this instead of a plain fork(). It spawns a thread that spawns a
# process and waits to collect that process when it exits.
def autoreap_fork(&b)
  Thread.new { Process.wait(fork(&b)) }
end
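
=begin
Ruby also ships a built-in along the same lines: Process.detach starts a
thread whose only job is to wait on a given pid. A brief sketch, with an
arbitrary exit code chosen for illustration:

```ruby
pid = fork { exit!(7) }          # exit! skips inherited at_exit hooks

watcher = Process.detach(pid)    # a Thread that waits on (and reaps) pid
status  = watcher.value          # the child's Process::Status, once it exits
puts status.exitstatus
```

If the status does not matter, the return value of Process.detach can simply
be ignored.
=end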

# Of course, there is another option, which is to do nothing. You don't want
# to fill the process table with zombies if you are, say, a long-running
# process, but as noted above, if the parent exits, its children are adopted by
# the init process, which will reap them when they become zombies. So if
# the output of these children is collected elsewhere later, you can just exit
# after spawning them and let your OS do the rest. That's the approach we take
# here: we spawn all of the children and forget about them. When this process
# dies, our children are handled by init.

# We do some expensive calculations here. Note that this will spawn several
# processes, and fib(n) can be expensive to calculate (with the algorithm we
# use) depending on your hardware. You may want to adjust the numbers here.
# To watch output as it arrives, you can run the following in another shell
# before you start this program:
#   touch /tmp/fibonacci_numbers ; tail -f /tmp/fibonacci_numbers
(1..42).each { |n|
  # Note that these numbers don't necessarily arrive in order! (Although, in
  # our case, it is likely that they will, since fib(n+1) will take longer to
  # calculate than fib(n).)
  fork {
    # This will make the output of top/ps a little more friendly, if you
    # want to watch the processes while they run.
    $0 = "fib[#{n}]"
    cpu_bound n, '/tmp/fibonacci_numbers'
  }
}
puts "Running a few calculations in the 'background'."
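
=begin
To see results arrive out of spawn order, here is a sketch in which sleep
stands in for work of different sizes. The labels, delays, and the temporary
file are all made up for the example, and unlike the script above it waits
for its children so that it can read the file afterwards.

```ruby
require 'tempfile'

log = Tempfile.new('arrival_order')   # scratch file for this example only
delays = { 'slow' => 0.6, 'medium' => 0.3, 'fast' => 0.0 }

# Spawned in the order slow, medium, fast.
delays.each { |label, delay|
  fork {
    sleep delay
    File.open(log.path, 'a') { |f| f << "#{label}\n" }
    exit!(0)                     # skip at_exit hooks inherited from the parent
  }
}

Process.waitall                  # block until every child has exited
arrival = File.read(log.path).split("\n")
log.close!                       # close and delete the scratch file
puts arrival.inspect
```
=end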

%w(
  http://ruby-doc.org/core/classes/Process.html
  http://debu.gs/
  http://reverso.be/
  http://asdf.com/
  http://bigempire.com/filthy
  http://gist.github.com/
  http://localhost/
  http://code.google.com/p/termite/
).each { |url|
  # For Ruby, threads are cheaper than processes. In MRI 1.8 they are "green"
  # threads (rather than OS-level threads); from 1.9 on they are OS threads,
  # but a global lock lets only one of them run Ruby code at a time. (JRuby
  # is an exception here, but JRuby threads aren't as cheap.) Either way,
  # CPU-bound work doesn't really speed up when parallelized with threads.
  # Threading can also lead to odd bugs if the threads touch any sort of
  # shared resource: unlike processes created by forking, threads share
  # memory, file descriptors, sockets, and other process-level resources.
  # Using a Thread on a problem that is not CPU-bound (like fetching a
  # website from the internet, which is I/O-bound), though, will let all of
  # the I/O run almost in parallel while we wait. Of course, forking will
  # work here, too.
  Thread.new {
    io_bound url, "/tmp/#{url.gsub(/[^a-z0-9\._]+/, '_')}.html"
  }
}
puts "Downloading some web pages!"
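
=begin
A sketch of why threads pay off for I/O-bound work. No network is involved:
fake_fetch is a hypothetical stand-in that just sleeps, which behaves like
waiting on a socket as far as the scheduler is concerned.

```ruby
# Hypothetical stand-in for a slow network request.
def fake_fetch(name)
  sleep 0.3
  "#{name}: done"
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = %w(a b c d).map { |name| Thread.new { fake_fetch(name) } }
results = threads.map(&:value)   # Thread#value joins and returns the result

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts "#{results.size} fetches in #{elapsed.round(2)}s"
```

Run one after another, the four calls would need about 1.2 seconds; with
threads the waits overlap.
=end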

# You can use Thread#join to wait for a thread to finish. I'm going to do
# this the lazy way by, instead of keeping references to the Thread objects,
# just asking Ruby for all of the threads except the current one, and joining
# those. By the time we arrive here, some of them may already be done; dead
# threads don't show up in Thread.list, so there is nothing left to join for
# those.
(Thread.list - [Thread.current]).each { |thread|
  # We wrap it in an exception-eater: joining a thread waits for it to finish
  # (Thread#value would also give you the block's return value), but if the
  # thread died from an exception, that exception is re-raised here. Since we
  # don't especially care what the thread did or even if it finished its
  # mission, but we *do* want to wait until all of the threads are finished
  # before we exit (an arbitrary restriction; we only care for the purposes
  # of illustrating what to do when you care), we join all of them but ignore
  # any mishaps they may have encountered.
  begin
    thread.join
  rescue Exception
  end
}
puts "Downloaded all of them! (Or the threads crashed, or any combination.)"
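
=begin
A short sketch of what joining actually hands back. The two threads and their
contents are invented for the example; report_on_exception (Ruby 2.4+) is
switched off only to keep stderr quiet.

```ruby
ok  = Thread.new { 6 * 7 }
bad = Thread.new {
  Thread.current.report_on_exception = false
  raise ArgumentError, 'boom'
}

joined = ok.join                 # join returns the Thread itself
answer = ok.value                # value returns the block's result

# Joining a thread that died re-raises its exception in the joining thread.
caught = begin
  bad.join
  nil
rescue ArgumentError => e
  e.message
end

puts answer
puts caught
```
=end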

# At this point, the fibonacci processes may or may not have finished
# (depending on your CPU speed versus the speed of your net connection), but
# they're no longer our problem, since this program is over.