This is a proof-of-concept of a couple of concurrent data structures written in Ruby.
The implementations are heavily commented for those interested. There are benchmarks (with results) included below. The results are interesting, but, as always, take with a grain of salt.
AtomicLinkedQueue
is a lock-free queue, built on atomic CAS operations.
It doesn't use any mutexes.
TwoLockLinkedQueue
is a lock-based queue, but with separate locks for the
head + tail. This means there can be lots of contention when pushing to
the queue, but little when it comes to popping off the queue.
Both of these implementations are unbounded queues, with no blocking operations.
See the individual files below for more about their implementation.
For those unfamiliar with the atomic compare-and-swap operation, here's a brief outline of how it operates.
Create an Atomic instance that holds a value. (This comes from the atomic rubygem).
item = Atomic.new('value')
The workhorse method is #compare_and_set(old, new). You pass it the value that you think it currently holds, and the value you want to set it to.
If it does hold the expected value, it's set to your new value. Otherwise, it returns false. In this implementation, when that happens, the operation is re-started, over and over, until it succeeds.
This compare-and-set operation is hardware-supported and works across Ruby implementations thanks to the 'atomic' gem. The hardware support provides the atomic guarantee. Without this guarantee, it would be possible to read a value, then have another thread update the value before you can, then your thread would overwrite that value. This stuff is complicated.
I can't take credit for this implementation. This is an implementation of the pseudo-code from "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms"[1], with some hints from Java's implementation of java.util.concurrent.ConcurrentLinkedQueue[2].
- http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html
- http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/concurrent/ConcurrentLinkedQueue.java
These are the results of running the included benchmarks against Rubinius 2.0.0-rc1 and JRuby 1.7.3. Any results from MRI will be superficial due to the global lock, so I omitted it completely.
All of the benchmarks were run against Queue
(a fully synchronized queue from Ruby's std lib), TwoLockLinkedQueue
(implementation below using two different locks), and AtomicLinkedQueue
(the lock-free implementation).
The benchmark_concurrent_reads+writes.rb
benchmark allocates some threads to write to the queue, and others to read. So there's always contention for both pushing and popping.
The benchmark_separate_reads+writes.rb
benchmark first focuses on filling up the queue to capacity, then emptying it completely. This focuses all of the contention on writing, then all of it on reading.
$ rbx benchmark_concurrent_reads+writes.rb
user system total real
Queue - more writers than readers 4.178510 4.892144 9.070654 ( 6.677201)
Queue - more readers than writers 5.427958 4.914869 10.342827 ( 7.545760)
Queue - equal writers and readers 5.313148 6.802285 12.115433 ( 8.720809)
user system total real
TwoLockLinkedQueue - more writers than readers 5.151256 7.610410 12.761666 ( 6.458769)
TwoLockLinkedQueue - more readers than writers 5.395152 8.326897 13.722049 ( 6.568123)
TwoLockLinkedQueue - equal writers and readers 6.641767 10.623600 17.265367 ( 7.297145)
user system total real
AtomicLinkedQueue - more writers than readers 2.897964 0.061956 2.959920 ( 0.717638)
AtomicLinkedQueue - more readers than writers 2.814892 0.050590 2.865482 ( 0.596547)
AtomicLinkedQueue - equal writers and readers 4.175097 0.086688 4.261785 ( 0.891113)
user system total real
$ rbx benchmark_separate_reads+writes.rb
user system total real
Queue - fill, then empty - 10k 0.113160 0.120949 0.234109 ( 0.159911)
Queue - fill, then empty - 100k 1.035808 1.174422 2.210230 ( 1.596514)
Queue - fill, then empty - 1mm 11.258097 12.224407 23.482504 ( 17.325185)
user system total real
TwoLockLinkedQueue - fill, then empty - 10k 0.139143 0.172324 0.311467 ( 0.214725)
TwoLockLinkedQueue - fill, then empty - 100k 1.312984 1.671349 2.984333 ( 2.233421)
TwoLockLinkedQueue - fill, then empty - 1mm 12.175179 16.069279 28.244458 ( 22.541654)
user system total real
AtomicLinkedQueue - fill, then empty - 10k 0.071836 0.003300 0.075136 ( 0.009811)
AtomicLinkedQueue - fill, then empty - 100k 0.645546 0.011743 0.657289 ( 0.147805)
AtomicLinkedQueue - fill, then empty - 1mm 7.075495 0.108397 7.183892 ( 1.663006)
$ jruby benchmark_concurrent_reads+writes.rb
user system total real
Queue - more writers than readers 0.224000 0.000000 0.224000 ( 0.224000)
Queue - more readers than writers 8.529000 0.000000 8.529000 ( 8.529000)
Queue - equal writers and readers 0.263000 0.000000 0.263000 ( 0.262000)
user system total real
TwoLockLinkedQueue - more writers than readers 1.492000 0.000000 1.492000 ( 1.492000)
TwoLockLinkedQueue - more readers than writers 1.788000 0.000000 1.788000 ( 1.788000)
TwoLockLinkedQueue - equal writers and readers 2.205000 0.000000 2.205000 ( 2.205000)
user system total real
AtomicLinkedQueue - more writers than readers 1.086000 0.000000 1.086000 ( 1.086000)
AtomicLinkedQueue - more readers than writers 0.571000 0.000000 0.571000 ( 0.572000)
AtomicLinkedQueue - equal writers and readers 1.049000 0.000000 1.049000 ( 1.049000)
$ jruby benchmark_separate_reads+writes.rb
user system total real
Queue - fill, then empty - 10k 0.014000 0.000000 0.014000 ( 0.014000)
Queue - fill, then empty - 100k 0.065000 0.000000 0.065000 ( 0.065000)
Queue - fill, then empty - 1mm 0.744000 0.000000 0.744000 ( 0.744000)
user system total real
TwoLockLinkedQueue - fill, then empty - 10k 0.032000 0.000000 0.032000 ( 0.032000)
TwoLockLinkedQueue - fill, then empty - 100k 0.337000 0.000000 0.337000 ( 0.337000)
TwoLockLinkedQueue - fill, then empty - 1mm 4.640000 0.000000 4.640000 ( 4.640000)
user system total real
AtomicLinkedQueue - fill, then empty - 10k 0.016000 0.000000 0.016000 ( 0.016000)
AtomicLinkedQueue - fill, then empty - 100k 0.162000 0.000000 0.162000 ( 0.162000)
AtomicLinkedQueue - fill, then empty - 1mm 2.706000 0.000000 2.706000 ( 2.706000)