The original-insert.php script creates 2,000 documents with the following schema:
{
    "_id" : ObjectId("52b3815de84df102428b4567"),
    "appver" : NumberLong(123456),
    "deploytype" : "production",
    "g" : NumberLong(123456),
    "guid" : "e91b59a9-eccd-5cbc-bfee-724e4394c842",
    "timeslice" : {
        "y" : NumberLong(13),
        "mo" : NumberLong(12),
        "d" : NumberLong(19),
        "h" : NumberLong(18),
        "m" : NumberLong(29)
    },
    "random" : "52b382730663e",
    "stats" : {
        "eventnames" : {
            "baz-baz-baz-baz-baz-baz-0" : NumberLong(0)
        },
        "events" : NumberLong(0)
    }
}
The stats.eventnames object has 20,000 keys, which share one of three prefixes. The sum of their values is stored in stats.events.
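A sketch of how such a document might be constructed follows. This is an illustration, not the actual script: the key count is shrunk from 20,000, and the prefix strings other than the "baz-..." one shown above are assumptions.

```php
<?php
// Hypothetical sketch of original-insert.php's document construction.
// The real script uses 20,000 keys; the count is shrunk for illustration.
$prefixes = [
    'foo-foo-foo-foo-foo-foo-',
    'bar-bar-bar-bar-bar-bar-',
    'baz-baz-baz-baz-baz-baz-',
];

$eventnames = [];
for ($i = 0; $i < 300; $i++) {
    // Keys share one of three long prefixes and differ only in the suffix.
    $eventnames[$prefixes[$i % 3] . $i] = 0;
}

$doc = [
    'appver'     => 123456,
    'deploytype' => 'production',
    'g'          => 123456,
    'guid'       => 'e91b59a9-eccd-5cbc-bfee-724e4394c842',
    'timeslice'  => ['y' => 13, 'mo' => 12, 'd' => 19, 'h' => 18, 'm' => 29],
    'random'     => uniqid(),
    // stats.events holds the sum of all eventnames values (zero at insert).
    'stats'      => ['eventnames' => $eventnames, 'events' => 0],
];
```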
For the update test (original-update.php), we find a document by appver, deploytype, g, guid, and timeslice. For the atomic update, we $set the random field to a new string, $inc five random keys in stats.eventnames, and finally $inc stats.events by 5. This test executes 10,000 updates (each of the 2,000 documents is processed 5 times).
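A sketch of how one update's criteria/newObj pair might be assembled (variable names are assumptions, and a small stand-in list replaces the document's 20,000 key names so the snippet is self-contained):

```php
<?php
// Stand-in for the document's 20,000 event key names.
$eventNames = ['evt-0', 'evt-1', 'evt-2', 'evt-3', 'evt-4', 'evt-5', 'evt-6'];

// Criteria: find the document by its five identifying fields.
$criteria = [
    'appver'     => 123456,
    'deploytype' => 'production',
    'g'          => 123456,
    'guid'       => 'e91b59a9-eccd-5cbc-bfee-724e4394c842',
    'timeslice'  => ['y' => 13, 'mo' => 12, 'd' => 19, 'h' => 18, 'm' => 29],
];

// newObj: $set a fresh random string, $inc five random event counters,
// and $inc stats.events by 5 so the stored sum stays consistent.
$newObj = ['$set' => ['random' => uniqid()], '$inc' => []];
foreach (array_rand($eventNames, 5) as $idx) {
    $newObj['$inc']['stats.eventnames.' . $eventNames[$idx]] = 1;
}
$newObj['$inc']['stats.events'] = 5;
```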
This benchmark is meant to stress MongoDB's walking of BSON objects with many keys at the same level. The fact that those keys share a common prefix is intentional to make the string comparisons for each key in the object more expensive (i.e. they cannot short-circuit as quickly).
The bucket-insert.php script creates 2,000 documents with the following schema:
{
    "_id" : ObjectId("52b3815de84df102428b4567"),
    "appver" : NumberLong(123456),
    "deploytype" : "production",
    "g" : NumberLong(123456),
    "guid" : "e91b59a9-eccd-5cbc-bfee-724e4394c842",
    "timeslice" : {
        "y" : NumberLong(13),
        "mo" : NumberLong(12),
        "d" : NumberLong(19),
        "h" : NumberLong(18),
        "m" : NumberLong(29)
    },
    "random" : "52b382730663e",
    "stats" : {
        "eventnames" : {
            "15" : {
                "baz-baz-baz-baz-baz-baz-0" : NumberLong(0)
            }
        },
        "events" : NumberLong(0)
    }
}
Unlike the original schema, keys that would appear in the stats.eventnames object are bucketed by the first two characters of their MD5 hash. The sum of the final values is still stored in stats.events.
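The bucketing rule can be sketched as a one-liner (the function name is an assumption):

```php
<?php
// A key's bucket is the first two hex characters of its MD5 hash,
// giving 256 possible buckets (~78 keys per bucket for 20,000 keys).
function bucketFor($eventName) {
    return substr(md5($eventName), 0, 2);
}
```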
The update test (bucket-update.php) performs the same updates as the original script, but the $inc operations target events by their bucket path. This test executes 10,000 updates (each of the 2,000 documents is processed 5 times).
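A hypothetical sketch of one such bucketed $inc: the dotted path now routes through the two-character MD5 bucket, so the server compares at most ~256 sibling keys per level instead of 20,000.

```php
<?php
$name   = 'baz-baz-baz-baz-baz-baz-0';
// Bucket = first two hex characters of the key's MD5 hash.
$bucket = substr(md5($name), 0, 2);

$newObj = [
    '$inc' => [
        // e.g. "stats.eventnames.15.baz-baz-baz-baz-baz-baz-0"
        "stats.eventnames.$bucket.$name" => 1,
        'stats.events'                   => 1,
    ],
];
```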
This benchmark is meant to measure an improvement by avoiding many string comparisons while iterating through BSON object keys. Conceivably, multiple levels of bucketing could be implemented (up to a point of diminishing returns).
While inserts are rather inefficient (PHP creates the documents on demand and does not use batch insertion), the query/newObj arguments for all update operations are generated in advance so that update operations can be spammed to MongoDB. A write concern of zero is also used to ensure PHP does not wait for a response from MongoDB before issuing the next update operation.
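The pre-generation step might look like the following sketch (names and the stand-in criteria are assumptions; the timed loop for the legacy ext-mongo driver is shown as a comment since it needs a running server):

```php
<?php
// Build every query/newObj pair before timing starts, so the timed loop
// does nothing but issue update commands.
$updates = [];
for ($i = 0; $i < 100; $i++) {           // real test: 10,000 pairs
    $updates[] = [
        ['guid' => "guid-$i"],           // stand-in criteria
        ['$inc' => ['stats.events' => 5]],
    ];
}

// Timed loop (legacy ext-mongo driver, assumed). 'w' => 0 is the
// fire-and-forget write concern: PHP does not wait for a server ack.
// $coll = (new MongoClient())->selectDB('test')->selectCollection('bench');
// foreach ($updates as $pair) {
//     $coll->update($pair[0], $pair[1], ['w' => 0]);
// }
```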
$ time php original-update.php
Updated 10000 documents in 18.351067 seconds (avg: 0.001835/document)
real 0m33.740s
user 0m12.820s
sys 0m0.428s
And mongostat output during the benchmark:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 *0 *0 *0 21 1|0 0 8.03g 16.3g 7.38g 0 test:0.0% 0 0|0 0|0 1k 96m 3 18:18:14
*0 *0 373 *0 7 3|0 0 8.03g 16.3g 7.41g 0 test:66.9% 0 0|0 0|1 211k 27m 3 18:18:15
*0 *0 514 *0 0 1|0 0 8.03g 16.3g 6.64g 0 test:98.3% 0 0|0 0|1 292k 2k 3 18:18:16
*0 *0 532 *0 0 1|0 0 8.03g 16.3g 6.96g 0 test:98.0% 0 0|0 0|1 302k 2k 3 18:18:17
*0 *0 527 *0 0 1|0 0 8.03g 16.3g 6.33g 0 test:98.5% 0 0|0 0|1 299k 2k 3 18:18:18
*0 *0 522 *0 0 1|0 0 8.03g 16.3g 6.25g 0 test:98.2% 0 0|0 0|1 296k 2k 3 18:18:19
*0 *0 508 *0 0 1|0 0 8.03g 16.3g 6.19g 0 test:98.4% 0 0|0 0|1 288k 2k 3 18:18:20
*0 *0 534 *0 0 1|0 0 8.03g 16.3g 6.5g 0 test:98.2% 0 0|0 0|1 303k 2k 3 18:18:21
*0 *0 520 *0 0 1|0 0 8.03g 16.3g 6.29g 0 test:98.3% 0 0|0 0|1 295k 2k 3 18:18:22
*0 *0 526 *0 0 1|0 0 8.03g 16.3g 6.31g 0 test:98.4% 0 0|0 0|1 298k 2k 3 18:18:23
*0 *0 503 *0 0 2|0 0 8.03g 16.3g 6.12g 0 test:98.2% 0 0|0 0|1 285k 2k 3 18:18:24
*0 *0 532 *0 0 1|0 0 8.03g 16.3g 6.44g 0 test:98.2% 0 0|0 0|1 302k 2k 3 18:18:25
*0 *0 524 *0 0 1|0 0 8.03g 16.3g 6.34g 0 test:98.6% 0 0|0 0|1 297k 2k 3 18:18:26
*0 *0 522 *0 0 1|0 0 8.03g 16.3g 6.34g 0 test:98.1% 0 0|0 0|1 296k 2k 3 18:18:27
*0 *0 513 *0 0 1|0 0 8.03g 16.3g 6.15g 0 test:98.4% 0 0|0 0|1 291k 2k 3 18:18:28
*0 *0 520 *0 0 1|0 0 8.03g 16.3g 6.34g 0 test:98.3% 0 0|0 0|1 295k 2k 3 18:18:29
*0 *0 524 *0 0 1|0 0 8.03g 16.3g 6.3g 0 test:98.3% 0 0|0 0|1 297k 2k 3 18:18:30
*0 *0 525 *0 0 1|0 0 8.03g 16.3g 6.24g 0 test:98.3% 0 0|0 0|1 298k 2k 3 18:18:31
*0 *0 472 *0 0 1|0 0 8.03g 16.3g 5.98g 0 test:90.8% 0 0|1 0|1 268k 2k 3 18:18:32
*0 *0 519 *0 0 2|0 0 8.03g 16.3g 6.29g 0 test:98.1% 0 0|0 0|1 295k 2k 3 18:18:33
*0 *0 290 *0 0 1|0 0 8.03g 16.3g 6.27g 0 test:56.5% 0 0|0 0|0 165k 2k 2 18:18:34
*0 *0 *0 *0 0 1|0 0 8.03g 16.3g 5.98g 0 .:1.3% 0 0|0 0|0 62b 2k 2 18:18:35
$ time php bucket-update.php
Updated 10000 documents in 1.469424 seconds (avg: 0.000147/document)
real 0m15.805s
user 0m11.916s
sys 0m0.508s
And mongostat output during the benchmark:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 *0 *0 *0 22 1|0 0 8.03g 16.3g 7.41g 0 test:0.1% 0 0|0 0|0 1k 101m 3 18:20:16
*0 *0 4210 *0 2 3|0 0 8.03g 16.3g 7.42g 0 test:71.6% 0 0|0 0|0 2m 4m 3 18:20:17
*0 *0 3542 *0 0 1|0 0 8.03g 16.3g 7.42g 0 test:87.8% 0 0|0 0|1 2m 2k 3 18:20:18
*0 *0 2248 *0 0 1|0 0 8.03g 16.3g 6.29g 0 test:46.4% 0 0|0 0|0 1m 2k 2 18:20:19
*0 *0 *0 *0 0 1|0 0 8.03g 16.3g 6.26g 0 .:0.2% 0 0|0 0|0 62b 2k 2 18:20:20