@coffeemug
Created December 13, 2013 10:01
David, sorry you ran into these issues. The crash is a known bug, and will be fixed in the 1.12 release (along with the new cache implementation). See https://github.com/rethinkdb/rethinkdb/issues/1389 for more details.
As to the performance you're getting, could you give some additional info so we could track this down?
- Which OS/version are you running?
- Which RethinkDB version are you running? (you can tell by running `rethinkdb --version`)
- Which client driver are you using (and which version)?
- What query do you use to get out 100 documents and how are you measuring latency?
Would really appreciate your feedback so we could fix these issues.
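For the latency question, one way to report comparable numbers is to time the query from the client over several runs and quote the min, median, and max. This is only a generic sketch: `run_query` is a hypothetical zero-argument callable standing in for whatever driver call executes the query, and nothing here is specific to RethinkDB.

```python
import statistics
import time


def measure_latency_ms(run_query, runs=20):
    """Time a zero-argument callable `runs` times; return stats in milliseconds.

    `run_query` is a hypothetical placeholder for the real driver call.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Client-side timing includes network and driver overhead, so it will normally be higher than the server time shown by the profiler.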
@wildattire

We've set up a public test instance here: http://rdbtest.ties.com:8080/#dataexplorer

Running the following with profiling turned on shows the server time anywhere between 25ms and 35ms. This is nothing more than reading out 500 documents of 493 bytes each.

30 ms to scan and move 246 KB works out to be 8.2 MB/s of memory bus bandwidth and 1120 instructions per byte (this host has a bogomips of 4602). Correct me if my math is off, here.
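Since the post asks for a check of the math, here is a quick back-of-envelope sketch. It assumes 30 ms as the midpoint of the reported range and treats bogomips as millions of instructions per second, which is only a rough proxy. The `cores` parameter is an assumption added for illustration.

```python
DOCS = 500
DOC_BYTES = 493
SERVER_TIME_S = 0.030   # midpoint of the reported 25-35 ms
BOGOMIPS = 4602         # treated as millions of instructions/second (rough proxy)


def throughput_mb_per_s(total_bytes, seconds):
    """Bytes moved per second, in decimal megabytes."""
    return total_bytes / seconds / 1e6


def instructions_per_byte(total_bytes, seconds, mips, cores=1):
    """Instruction budget available per byte; `cores` is an illustrative assumption."""
    return mips * 1e6 * seconds * cores / total_bytes


total = DOCS * DOC_BYTES
print(total)                                                      # payload in bytes
print(throughput_mb_per_s(total, SERVER_TIME_S))                  # MB/s
print(instructions_per_byte(total, SERVER_TIME_S, BOGOMIPS))      # one core
print(instructions_per_byte(total, SERVER_TIME_S, BOGOMIPS, 2))   # two cores
```

Under these assumptions the payload is 246,500 bytes and the throughput is about 8.2 MB/s, matching the post. The single-core budget comes out near 560 instructions per byte. The quoted 1120 corresponds to twice that, for example if two cores are counted, which the post does not state.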

Now obviously this doesn't take into account context switches, pipeline delays, cache misses, Amdahl's law, resource starvation by VM guests, or any other various OS esoterica, but even if we assume 90% overhead, this is still nearly two orders of magnitude slower than it ought to be.

I should like to reiterate that what we're seeing appears to be largely a CPU issue, not an I/O issue. I don't suppose you know of any way to inspect exactly what's going on inside the rethinkdb instance during the query?

Also of note is that the vast majority of the time reported by the profiler output is spent in "Do range scan on primary index", which suggests it's not even spending much time reading the data so much as going through the index, which should only have 500 UUIDs.

@wildattire

Ok, one more update. We've installed another instance of rethink on a bare metal server with 8 Xeon cores and 16 GB RAM. This host runs Ubuntu 12.04 LTS rather than Arch. When running tests, we shut down everything else so that it was otherwise idle. Despite the greater resources and lack of virtualization, we are seeing the same behavior.

The ONLY difference we could see on the bare metal server is far more consistency in task duration for the parallel read operations. This is to be expected, but the total runtime and CPU load are not significantly different from what we saw in the virtual guest environment.

We have also tested on a Homebrew-built instance of the server running natively on an otherwise idle MacBook Pro with 16 GB RAM, with similar results. It does not appear to make any difference what operating environment rethink is running on - the read performance is still abysmal.

@jdoliner

The event "Do range scan on primary index" includes the time needed to read the values off of disk. Do you know if any IO does happen while the query is being performed? Or is all of the data in memory?

@wildattire

@jdoliner yeah check the vmstat I posted above. It's all cpu.
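For anyone wanting to repeat the CPU-versus-I/O check without vmstat, a rough sketch on Linux is to compare two snapshots of the aggregate `cpu` line in `/proc/stat` taken around the query. This assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are made up for illustration.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line from /proc/stat into a dict of tick counts."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, (int(p) for p in parts[1:1 + len(FIELDS)])))


def cpu_vs_iowait(before, after):
    """Return busy and iowait percentages between two 'cpu' line snapshots."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in a}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    busy = total - delta["idle"] - delta["iowait"]
    return {
        "busy_pct": 100.0 * busy / total,
        "iowait_pct": 100.0 * delta["iowait"] / total,
    }


def sample_proc_stat(interval_s=1.0):
    """Linux only: take two /proc/stat snapshots `interval_s` apart and compare them."""
    def read_cpu_line():
        with open("/proc/stat") as f:
            return f.readline()
    first = read_cpu_line()
    time.sleep(interval_s)
    return cpu_vs_iowait(first, read_cpu_line())
```

A high `busy_pct` with `iowait_pct` near zero while the query runs would be consistent with the query being CPU-bound.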

@coffeemug
Author

@wildattire -- thank you for submitting the detailed report, it helps immensely. Let's keep track of this in rethinkdb/rethinkdb#1766. We'll try to get to this as soon as we can.
