tmm1 · February 11, 2009 01:55
 >  - In ruby 1.8.x, what is the functional difference between rb_thread_schedule and rb_thread_select?

 rb_thread_schedule() is the guts of the thread scheduler: it traverses
 the linked list of threads (several times) to find the next one
 to switch into. The function is long (250 lines) and messy, and covers
 all the combinations of thread status (RUNNABLE, TO_KILL, STOPPED,
 KILLED) and wait state (FD, SELECT, TIME, JOIN, PID).

 If there are no threads doing i/o or waiting on a timeout,
 rb_thread_schedule() picks another thread from the list (considering
 thread priorities and states) and switches into it. Where there is
 i/o, it collects all the file descriptors associated with STOPPED
 threads that are WAIT_FD or WAIT_SELECT and runs select() on them. It
 also uses select() as a way to sleep, passing in the smallest timeout
 associated with any WAIT_TIME threads.
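
 As a rough illustration of that last step, here is a minimal sketch in plain C. It is not the 1.8 scheduler code: `struct toy_thread`, `collect_waits()` and `wait_for_any()` are hypothetical names made up for this example, and only read fds and sleep delays are modeled. It walks a linked list of threads, gathers the fds of the ones waiting on i/o, finds the smallest sleep delay, and hands both to a single select() call.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical, simplified thread record; the real 1.8 struct has many more fields. */
enum wait_kind { WAIT_NONE, WAIT_FD, WAIT_TIME };

struct toy_thread {
    enum wait_kind wait;      /* what the thread is blocked on */
    int fd;                   /* meaningful when wait == WAIT_FD */
    double delay;             /* seconds to sleep when wait == WAIT_TIME */
    struct toy_thread *next;  /* next thread in the list */
};

/*
 * Walk the list once, collecting read fds and the smallest delay.
 * Returns the highest fd seen (-1 if none). *min_delay stays negative
 * when no thread is sleeping.
 */
int collect_waits(const struct toy_thread *head, fd_set *readfds, double *min_delay)
{
    int max_fd = -1;

    assert(readfds != NULL && min_delay != NULL);
    FD_ZERO(readfds);
    *min_delay = -1.0;

    for (const struct toy_thread *t = head; t != NULL; t = t->next) {
        if (t->wait == WAIT_FD) {
            assert(t->fd >= 0 && t->fd < FD_SETSIZE);
            FD_SET(t->fd, readfds);
            if (t->fd > max_fd) max_fd = t->fd;
        } else if (t->wait == WAIT_TIME) {
            if (*min_delay < 0 || t->delay < *min_delay) *min_delay = t->delay;
        }
    }
    return max_fd;
}

/*
 * One select() call covers every waiting thread. With no fds it is just a
 * sleep; with no sleeping thread the timeout is NULL and it blocks until an
 * fd becomes readable.
 */
int wait_for_any(const struct toy_thread *head, fd_set *readfds)
{
    double delay;
    int max_fd = collect_waits(head, readfds, &delay);
    struct timeval tv, *tvp = NULL;

    if (delay >= 0) {
        tv.tv_sec = (time_t)delay;
        tv.tv_usec = (suseconds_t)((delay - (double)tv.tv_sec) * 1e6);
        tvp = &tv;
    }
    return select(max_fd + 1, readfds, NULL, NULL, tvp);
}
```

 After wait_for_any() returns, the fds still set in readfds tell you which waiting threads can be made runnable again; a return value of 0 means the shortest sleep has elapsed.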

 In 1.8, rb_thread_schedule() is called every 10 milliseconds. When you
 compile with --disable-pthread, ruby calls setitimer() as soon as an
 additional thread is created, and the kernel sends the process a
 SIGVTALRM every 10 milliseconds; ruby uses the signal handler to set
 rb_thread_pending = 1, which lets it know it needs to call
 rb_thread_schedule(). When you compile with --enable-pthread, a kernel
 thread is spawned instead, which sits in a loop, nanosleeping for 10
 milliseconds and firing SIGVTALRM on the main thread.

 rb_thread_select(), on the other hand, is simply the ruby version of
 select(). When you call select() from ruby, it invokes
 rb_thread_select(), which adds the file descriptors you passed in to
 the currently running thread and puts the thread in WAIT_SELECT. Then
 it simply goes on to invoke rb_thread_schedule(), which will take
 those fds along with any fds other threads care about and call
 select() on them all.

 Calling rb_thread_select() with no fds (like EM used to do in the
 epoll/kqueue case) is simply a roundabout way of calling
 rb_thread_schedule(). The function is more useful when you actually
 have a thread waiting on i/o, as is the case with mysqlplus, which calls
 rb_thread_select() on the mysql connection's file descriptor,
 effectively putting that thread in WAIT_SELECT and letting other
 threads run until the query's results are available.
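
 To make the single-fd case concrete, here is a minimal sketch of waiting for one descriptor to become readable. It is not mysqlplus code: `wait_readable()` and the `select_fn` typedef are hypothetical. The helper takes the select-like function as a parameter, on the assumption that rb_thread_select() accepts the same arguments as select(), so a 1.8 extension could pass rb_thread_select while a standalone program passes plain select().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Any function shaped like select(2). */
typedef int (*select_fn)(int, fd_set *, fd_set *, fd_set *, struct timeval *);

/*
 * Block until fd is readable or timeout_ms elapses (a negative timeout_ms
 * means wait forever). Returns 1 if readable, 0 on timeout, -1 on error.
 */
int wait_readable(select_fn sel, int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv, *tvp = NULL;
    int n;

    assert(sel != NULL);
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    n = sel(fd + 1, &readfds, NULL, NULL, tvp);
    if (n < 0) return -1;
    return (n > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

 With plain select() the whole process blocks for the duration of the wait; the point of going through rb_thread_select() in 1.8 is that only the calling ruby thread is parked while the scheduler keeps running the others.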

 >  - There is plenty of lore about rb_thread_select being very slow, any particular reason?

 The problem is that rb_thread_select() uses rb_thread_schedule(),
 which in turn uses select(). rb_thread_schedule() doesn't scale well
 when you have a lot of threads or a lot of file descriptors, since the
 function is invoked so often, has to traverse the list of threads
 constantly, and repeatedly builds up big lists of file descriptors to
 pass into the kernel. Many of these problems are inherent to
 select(): there's a max of (usually) 1024 fds it can handle, and the
 performance gets worse as you increase the number of fds, or if you
 have a sparsely filled fd_set.
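
 Both limits show up as soon as you build an fd_set by hand. The sketch below uses a hypothetical helper, `build_read_set()`, to compute the nfds argument for select(): any descriptor at or above FD_SETSIZE simply cannot be represented, and nfds is driven by the highest fd rather than by how many fds are set, so the kernel has to scan the whole range even when only two bits in it are on.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Fill `set` with the given fds and return the nfds argument select() needs
 * (highest fd + 1). Returns -1 if any fd does not fit in an fd_set.
 */
int build_read_set(const int *fds, size_t count, fd_set *set)
{
    int max_fd = -1;

    assert(set != NULL);
    FD_ZERO(set);

    for (size_t i = 0; i < count; i++) {
        if (fds[i] < 0 || fds[i] >= FD_SETSIZE) return -1;  /* select() can't handle it */
        FD_SET(fds[i], set);
        if (fds[i] > max_fd) max_fd = fds[i];
    }
    return max_fd + 1;
}
```

 For example, the set {0, 1, 2} yields nfds = 3, while the two-element set {0, FD_SETSIZE - 1} yields nfds = FD_SETSIZE, even though it contains fewer descriptors.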

 >  - In ruby 1.9.x, is rb_thread_blocking_region basically equivalent to rb_thread_select?

 Not really. rb_thread_blocking_region() is a way to run code outside
 the 1.9 GIL. The use case here is primarily IO and external
 processes (popen). In EM, for example, this is really useful because we
 can run the blocking epoll/kqueue system calls but still allow other
 ruby threads to run at the same time.