Last active
October 11, 2018 20:40
-
-
Save justinstoller/4aa80598c10d391f8aae97573be15087 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# to reliably reproduce run with: ruby repro.rb 10000 | |
require 'fcntl' | |
class Pump | |
attr_reader :string | |
def initialize(io) | |
@io = io | |
@thread = nil | |
@string = '' | |
@run = true | |
@min_delay = 0.05 | |
@max_delay = 1.0 | |
end | |
def start | |
@thread = Thread.new { pump } | |
end | |
# Block until the pumping thread reaches EOF on the IO object. | |
def wait | |
@thread.join | |
end | |
private | |
def pump | |
backoff = @min_delay | |
while @run | |
begin | |
@string << @io.read_nonblock(4096) | |
backoff /= 2 if backoff > @min_delay | |
rescue Errno::EWOULDBLOCK, Errno::EAGAIN | |
backoff *= 2 if backoff < @max_delay | |
IO.select([@io], [], [], backoff) | |
rescue EOFError | |
@run = false | |
end | |
end | |
end | |
end | |
class Runner | |
def pipe | |
::IO.pipe.tap do |pair| | |
pair.each {|p| p.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC) } | |
end | |
end | |
def run | |
stdout_r, stdout_w = pipe | |
stderr_r, stderr_w = pipe | |
exec_r, exec_w = pipe | |
stdout_pump = Pump.new(stdout_r) | |
stderr_pump = Pump.new(stderr_r) | |
stdout_pump.start | |
stderr_pump.start | |
pid = fork do | |
exec_r.close | |
begin | |
Process.setsid | |
# Reopen file descriptors | |
STDOUT.reopen(stdout_w) | |
STDERR.reopen(stderr_w) | |
executable = 'ls' | |
exec([executable, executable], '.') | |
rescue SystemCallError => e | |
exec_w.write("#{e.class}: #{e.message}") | |
exit(254) | |
end | |
end | |
exec_w.close | |
stdout_w.close | |
stderr_w.close | |
stdout = '' | |
stderr = '' | |
status = nil | |
if !exec_r.eof? | |
stderr = exec_r.read || "exec() failed" | |
_, status = Process.waitpid2(pid) | |
else | |
_, status = Process.waitpid2(pid) | |
stdout_pump.wait | |
stderr_pump.wait | |
stdout = stdout_pump.string | |
stderr = stderr_pump.string | |
end | |
exec_r.close | |
stdout_r.close | |
stderr_r.close | |
return status, stdout, stderr | |
end | |
end | |
iterations = ARGV[0].to_i | |
iterations.times do | |
Runner.new.run | |
end |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Notes/draft commit message:
Previously on Bionic we would see issues where the Ruby interpreter
would die with an "ASYNC BUG EPIPE" message and no meaningful
stacktrace. It appears that this comes from rb_async_bug_errno in Ruby's
error.c:608.
When attempting to strace, when registering listeners for various
syscalls with sysdig, or when attaching to a debug build of Ruby with
gdb this issue no longer presents itself.
This seems like a race condition and when it happens seems relatively
normally distributed form the high level of running a Ruby script to
reproduce. ie. a ruby script that mimics r10k's subcommand runner and
is repeatedly called in a loop will sometime fail immediately, and
sometimes not until tens of thousands of iterations but most often fails
in the 1k-10k range of iterations.
I did not capture system metrics but failure frequencies did not seem to
change depending on the usage of the VM itself or what else was going on
in our VM infrastructure.
When using Ruby 2.3.7 the issue seems less likely to happen. It seemed
to happen a similar amount with 2.6.0preview2. It would happen with
2.5.1 on Ubuntu 16.04 (Xenial, with kernel 4.4 vs Bionics 4.15 kernel).
I could not reproduce it on Redhat 7 (Ruby 2.5.1, Linux kernel
3.something).
I was able to isolate this issue down to a race condition when forking
during the while loop of R10K::Util::Subprocess::Pump#pump through
process of elimination.
This commit moves starting the pipe output pumping until after the
subprocess fork.