-
-
Save coffeemug/7942219 to your computer and use it in GitHub Desktop.
David, sorry you ran into these issues. The crash is a known bug, and will be fixed in the 1.12 release (along with the new cache implementation). See https://github.com/rethinkdb/rethinkdb/issues/1389 for more details. | |
As to the performance you're getting, could you give some additional info so we could track this down? | |
- Which OS/version are you running? | |
- Which RethinkDB version are you running? (you can tell by running `rethinkdb --version`) | |
- Which client driver are you using (and which version)? | |
- What query do you use to get out 100 documents and how are you measuing latency? | |
Would really appreciate your feedback so we could fix these issues. |
Additionally, when I am executing 4 of the above read queries per second, here is the output of vmstat @ 1s polling. Note that there is almost zero I/O load ( which is to be expected with a data set this small ), but significant user and system cpu. The first two and last two lines provided are vmstat reads BEFORE and AFTER the queries were being executed, for reference( 100% idle, 5x less context-switching )
0 0 0 1743360 28248 54244 0 0 0 0 41 72 0 0 100 0
0 0 0 1743360 28248 54244 0 0 0 0 44 75 0 0 100 0
0 0 0 1743424 28248 54244 0 0 0 0 218 165 5 2 93 0
0 0 0 1743296 28248 54244 0 0 0 0 283 221 10 2 88 0
1 0 0 1743424 28248 54244 0 0 0 0 279 203 9 2 89 0
0 0 0 1743360 28248 54244 0 0 0 0 274 191 9 2 89 0
0 0 0 1743300 28248 54244 0 0 0 0 311 199 10 2 89 0
0 0 0 1743116 28248 54248 0 0 0 0 435 315 13 4 83 0
0 0 0 1743148 28248 54248 0 0 0 0 298 218 11 1 88 0
0 0 0 1743148 28248 54248 0 0 0 0 255 196 9 3 89 0
0 0 0 1743148 28248 54248 0 0 0 0 301 207 11 2 87 0
0 0 0 1743116 28248 54248 0 0 0 0 306 220 11 2 87 0
0 0 0 1743212 28256 54240 0 0 0 28 265 193 8 2 90 0
0 0 0 1743212 28256 54248 0 0 0 0 46 80 0 0 100 0
0 0 0 1743212 28256 54248 0 0 0 0 43 67 0 0 100 0
We're set up a public test instance here: http://rdbtest.ties.com:8080/#dataexplorer
Running the following with profiling turned on shows the server time anywhere between 25ms and 35ms. This is nothing more than reading out 500 documents of 493 bytes each.
30ms to scan and move 246kb works out to be 8.2 MBps of memory bus bandwidth and 1120 instructions per byte ( this host has a bogomips of 4602 ). Correct me if my math is off, here.
Now obviously this doesn't take into account context switches, pipeline delays, cache misses, Amdahl's law, resource starvation by VM guests, or any other various OS esoterica, but even if we assume 90% overhead, this is still nearly two orders of magnitude slower than it ought to be.
I should like to reiterate that what we're seeing appears to be largely a CPU issue, not an I/O issue. I don't suppose you know of any way to inspect exactly what's going on inside the rethinkdb instance during the query?
Also of note is the fact that the vast vast majority of the time reported by the profiler output is spent in "Do range scan on primary index" - which suggests it's not even spending a lot of time reading the data so much as going through the index, which should only have 500 uuids.
Ok, one more update. We've installed another instance of rethink on a bare metal server with 8 xeon cores and 16GB ram. We also run Ubuntu 12.04 LTS on this host, rather than Arch. When running tests, we shut down everything else such that it was otherwise idle. Despite the greater resources and lack of virtualization, we are seeing the same behavior.
The ONLY difference we could see on the bare metal server is far more consistency in task duration for the parallel read operations. This is to be expected, but the total runtime and CPU load is not significantly different from the virtual guest environment of before.
We have also tested on a homebrew-built instance of the server running natively on a macbook pro with 16gb ram and otherwise idle, with similar results. It does not appear to make any difference what operating environment rethink is running on - the read performance is still abysmal.
The event "Do range scan on primary index" includes the time needed to read the values off of disk. Do you know if any IO does happen while the query is being performed? Or is all of the data in memory?
@jdoliner yeah check the vmstat I posted above. It's all cpu.
@wildattire -- thank you for submitting the detailed report, it helps immensely. Let's keep track of this in rethinkdb/rethinkdb#1766. We'll try to get to this as soon as we can.
I have some additional info. We don't need to give out our data in order to replicate this. It's easily reproducible with simple dummy data.
r.db('test').table('test').insert( { foo: 'lol', bar: [ { baz: 'wtf' }, { baz: 'the rain in spain falls mainly on the plain' } ], wut: 3.14159, children: [ { foo: 'lol', bar: [ { baz: 'wat' }, { baz: 'lorem ipsum blah blah blah' } ], wut: 2.71828, children: [ { } ] }, { foo: 'lol', bar: [ { baz: 'wat' }, { baz: 'mary had a little lamb whose fleece was white as snow' } ], wut: 0.0000001, children: [ { } ] } ] } )
[
{
"description": "Evaluating table." ,
"duration(ms)": 0.126713 ,
"sub_tasks": [
{
"description": "Evaluating db." ,
"duration(ms)": 0.095728 ,
"sub_tasks": [
{
"description": "Evaluating datum." ,
"duration(ms)": 0.000491 ,
"sub_tasks": [ ]
}
]
} ,
{
"description": "Evaluating datum." ,
"duration(ms)": 0.00016 ,
"sub_tasks": [ ]
}
]
} ,
{
"description": "Perform read." ,
"duration(ms)": 29.232632 ,
"sub_tasks": [
{
"parallel_tasks": [
[
{
"description": "Perform read on shard." ,
"duration(ms)": 17.953453 ,
"sub_tasks": [
{
"description": "Do range scan on primary index." ,
"duration(ms)": 17.947923 ,
"sub_tasks": [ ]
}
]
}
] ,
[
{
"description": "Perform read on shard." ,
"duration(ms)": 17.948993 ,
"sub_tasks": [
{
"description": "Do range scan on primary index." ,
"duration(ms)": 17.947555 ,
"sub_tasks": [ ]
}
]
}
] ,
[
{
"description": "Perform read on shard." ,
"duration(ms)": 8.303449 ,
"sub_tasks": [
{
"description": "Do range scan on primary index." ,
"duration(ms)": 8.280298 ,
"sub_tasks": [ ]
}
]
}
] ,
[
{
"description": "Perform read on shard." ,
"duration(ms)": 5.107515 ,
"sub_tasks": [
{
"description": "Do range scan on primary index." ,
"duration(ms)": 5.104562 ,
"sub_tasks": [ ]
}
]
}
]
]
}
]
}
]