`count()` is O(n).
This can send a new developer running for the hills: counting seems like a trivial problem, but it is not. While we hope this gets addressed in the future (even in a non-ideal way), there are workarounds.
Relevant Issues:
- rethinkdb/rethinkdb#5894
- rethinkdb/rethinkdb#2411
- rethinkdb/rethinkdb#3949
- rethinkdb/rethinkdb#3384
- rethinkdb/rethinkdb#1271
- Use the table's info command if an estimate is enough: `r.db('DB').table('TABLE').info()('doc_count_estimates').nth(0)`
- Upgrade your cluster: a sharded cluster with strong servers (SSD, memory, etc.) helps a lot. You can also increase `--cache-size`.
- Add a table that saves your counts. You can:
  - increment it on every insert
  - use a changefeed, preferably with `squash`
  - just save the count result now and again
  - add a "position/i/inserted" field to the table and maintain it in memory on inserts. That way the last record sorted by index has the count as its "position/i/inserted" property.
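The counts-table idea can be sketched roughly as follows, using a changefeed with `squash` to keep a counter document current. This is only a sketch under assumptions: `r` is the RethinkDB JavaScript driver and `conn` an open connection, both passed in by the caller; the `counts` table and the helper names (`countDelta`, `bumpCounter`, `trackCount`, `readCount`) are hypothetical and not part of RethinkDB.

```javascript
// Sketch only. `r` is the RethinkDB driver module and `conn` an open
// connection; the `counts` table name is a hypothetical choice.

// Pure helper: how much a single changefeed event changes the row count.
function countDelta(change) {
  const had = change.old_val != null; // document existed before the change
  const has = change.new_val != null; // document exists after the change
  if (!had && has) return 1;   // insert
  if (had && !has) return -1;  // delete
  return 0;                    // update (or nothing to count)
}

// Atomically add `delta` to the counter document for `tableName`,
// creating the document if it does not exist yet.
function bumpCounter(r, conn, tableName, delta) {
  return r.table('counts').get(tableName).replace(function (doc) {
    return r.branch(
      doc.eq(null),
      { id: tableName, count: delta },
      doc.merge({ count: doc('count').add(delta) })
    );
  }).run(conn);
}

// Follow the table's changefeed and apply each insert/delete to the counter.
function trackCount(r, conn, tableName) {
  return r.table(tableName).changes({ squash: true }).run(conn)
    .then(function (cursor) {
      cursor.each(function (err, change) {
        if (err) { console.error(err); return; }
        const delta = countDelta(change);
        if (delta !== 0) {
          bumpCounter(r, conn, tableName, delta).catch(console.error);
        }
      });
      return cursor; // caller can close() it to stop tracking
    });
}

// Reading the count is then a single primary-key lookup instead of a scan.
function readCount(r, conn, tableName) {
  return r.table('counts').get(tableName)('count').default(0).run(conn);
}
```

Note that a changefeed only reports changes made after it starts, so the counter would need to be seeded once (for example with a one-off `count()`) before `trackCount` is started, and re-checked if the feed is ever interrupted.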
To the best of my knowledge, if the bulk of your work is processing and analyzing tables with millions of rows, RethinkDB is probably not your best solution. You could also combine it with another DB.