Don't trust these tests and results for life-or-death situations, because I can't guarantee they're very scientific. The example data I'm using is what I consider could be your average user data, but it's possibly not structured in an optimal way for these tests.
If you don't know what BSON is, it's the raw binary format that MongoDB uses internally to store all its data. It has some interesting advantages over JSON.
I've created four tests, each timing how long JSON and BSON take to generate their respective formats from a Ruby Hash, and how long they take to parse their data back into a Ruby Hash.
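The tests boil down to something like the following sketch. The sample data here is a hypothetical stand-in (the real tests load theirs from data.rb), and I've shown only the JSON half; the BSON tests are analogous, using the BSON module from the mongo gem.

```ruby
require "json"
require "benchmark"

# Hypothetical stand-in for the dataset that data.rb provides.
data = { "rows" => Array.new(100) { |i| { "id" => i, "name" => "user#{i}" } } }

json_str = nil
Benchmark.bm(10) do |bm|
  # "Make" test: Ruby Hash -> serialized string.
  bm.report("JSON make") { json_str = JSON.generate(data) }
  # "Read" test: serialized string -> Ruby Hash.
  bm.report("JSON read") { JSON.parse(json_str) }
end
```

Benchmark.bm prints user/system/real times per labeled block, which is all these tests need.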
Using Ruby 1.8.7, BSON and JSON are roughly just as fast at generating their respective formats with a dataset of 100 rows or less. As for parsing/reading back into a Ruby Hash, BSON is faster than JSON (0.003 seconds vs. 0.01). With a dataset of 10,000 rows, on the other hand, BSON takes on average 10 seconds to read its data back into a Ruby Hash, while JSON takes 0.2-0.5 seconds; writing this data, though, BSON does slightly faster.
With Ruby 1.9.1, however, the story looks different. BSON is about five times faster than JSON at generating their respective formats with both large (10,000 rows in 0.3 seconds) and small datasets. When parsing back into a Ruby Hash, though, BSON is about twice as slow as JSON with large datasets.
One oddity with Ruby 1.9.x, however, is that when the data.rb file contains 10,000 rows (2.3MB), requiring the file takes almost a minute, while with Ruby 1.8 it's instant.
In short, BSON is terribly slow to read compared to JSON, but faster to build. Also, the BSON data seems to be slightly larger in byte size than JSON data.
git clone git://gist.github.com/263161.git json_vs_bson_gist
You will need to install the mongo and mongo_ext gems to get the BSON module with optimal performance:
sudo gem install mongo mongo_ext
To run the tests, you will first need to generate the data.rb file that the tests use:
./make_data 1000
This will generate a dataset with 1000 "rows" in data.rb.
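I haven't reproduced make_data here, but a generator along these lines would produce an equivalent data.rb. The field names are my own guesses for illustration, not what the gist actually emits.

```ruby
# Hypothetical sketch of a make_data-style generator; field names are invented.
rows = Integer(ARGV[0] || "1000")
data = Array.new(rows) do |i|
  { "id" => i, "name" => "user_#{i}", "email" => "user_#{i}@example.com" }
end
# Write the dataset out as a Ruby literal that the tests can simply require.
File.open("data.rb", "w") { |f| f.puts "DATA = #{data.inspect}" }
```

Dumping the Hash as a Ruby literal is what makes the tests load it with a plain require, and is presumably also why requiring a 2.3MB data.rb is so slow on 1.9.x.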
Then, to run all the tests from the tests folder, run:
./run_tests
The time of each test, along with the size of the dataset used, is displayed and logged to results.txt.
When I first pushed this gist, I had a mistake in my tests and findings: I had managed to flip the two BSON tests around, so what I thought was the read test was actually the make/write test, and vice versa. I've corrected it and updated everything accordingly.
Alas, as the idiom goes, "the devil is in the details." To pull in the BSON C extension, make sure that you require "bson_ext". There's a huge difference between the pure-Ruby implementation and the C extension. Even with the C extension, the 10gen-supported Ruby driver is not what it could be. Analysis of the C extension shows some inefficiencies. For serialization to BSON, there are some extraneous mallocs (rb_str_new2 for symbols, rb_ary_* for extra key passing) that should ideally be eliminated, but they are painfully difficult to extract from the code as currently written. BSON deserialization is dominated by Ruby object creation, which includes mallocs; it could certainly benefit from optimization. It's worth taking a look at Moped to see how Ruby meta-programming can be used to simplify serialization/deserialization (having classes/objects operate on themselves). In all cases, whether JSON or BSON, extra object-creation overhead and mallocs are expensive and will probably dominate over other costs.
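A hedged illustration of the point above: before trusting any benchmark numbers, probe whether the bson_ext C extension is actually loadable, and fall back to pure Ruby if it isn't. This snippet is mine, not part of the gist.

```ruby
# Try to load the BSON C extension; rescue LoadError if the gem is absent,
# which means serialization will run on the (much slower) pure-Ruby path.
begin
  require "bson_ext"
  backend = "C extension"
rescue LoadError
  backend = "pure Ruby"
end
puts "BSON backend: #{backend}"
```

Printing which backend is in use up front makes it obvious when a slow result is just the pure-Ruby implementation, not BSON itself.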