FAQ about MRI internals
tmm1/61762, created February 11, 2009 01:55

> - In ruby 1.8.x, what is the functional difference between rb_thread_schedule and rb_thread_select?

rb_thread_schedule() is the guts of the thread scheduler: it traverses
the linked list of threads (several times) to find the next one to
switch into. The function is long (250 lines) and messy, and covers
all the combinations of thread status (RUNNABLE, TO_KILL, STOPPED,
KILLED) and wait state (FD, SELECT, TIME, JOIN, PID).

If there are no threads doing i/o or waiting on a timeout,
rb_thread_schedule() picks another thread from the list (considering
thread priorities and states) and switches into it. When there is
i/o, it collects all the file descriptors associated with STOPPED
threads that are in WAIT_FD or WAIT_SELECT and runs select() on them.
It also uses select() as a way to sleep, passing in the smallest
timeout associated with any WAIT_TIME threads.

In 1.8, rb_thread_schedule() is invoked roughly every 10 milliseconds.
When you compile with --disable-pthread, ruby calls setitimer() as
soon as an additional thread is created, and the kernel sends the
process a SIGVTALRM every 10 milliseconds; ruby's signal handler sets
rb_thread_pending = 1, which lets it know it needs to call
rb_thread_schedule(). When compiled with --enable-pthread, a kernel
thread is spawned instead, which sits in a loop, nanosleeping for 10
milliseconds and firing SIGVTALRM at the main thread.

rb_thread_select(), on the other hand, is simply the ruby version of
select(). When you call select() from ruby, it invokes
rb_thread_select(), which attaches the file descriptors you passed in
to the currently running thread and puts the thread in WAIT_SELECT.
Then it simply invokes rb_thread_schedule(), which takes those fds
along with any other fds other threads care about and calls select()
on them all.

Calling rb_thread_select() with no fds (as EM used to do in the
epoll/kqueue case) is simply a roundabout way of calling
rb_thread_schedule(). The function is more useful when you actually
have a thread waiting on i/o, as is the case with mysqlplus, which
calls rb_thread_select() on the mysql connection's file descriptor,
effectively putting that thread in WAIT_SELECT and letting other
threads run until the query's results are available.

> - There is plenty of lore about rb_thread_select being very slow; any particular reason?

The problem is that rb_thread_select() uses rb_thread_schedule(),
which in turn uses select(). rb_thread_schedule() doesn't scale well
when you have a lot of threads or a lot of file descriptors: the
function is invoked very often, has to traverse the list of threads
constantly, and repeatedly builds up big lists of file descriptors to
pass into the kernel. Many of these problems are inherent to select()
itself: it can only handle descriptors numbered below FD_SETSIZE
(usually 1024), and performance gets worse as you increase the number
of fds, or when you have a sparsely filled fd_set.

> - In ruby 1.9.x, is rb_thread_blocking_region basically equivalent to rb_thread_select?

Not really. rb_thread_blocking_region is a way to run code outside
the 1.9 GIL. The use case here is primarily IO and external processes
(popen). In EM, for example, this is really useful because we can run
the blocking epoll/kqueue system calls while still allowing other ruby
threads to run at the same time.
