- It supports real primary key constraints, as compared to Google BigQuery or Amazon Redshift. Redshift allows you to specify primary key constraints, but only uses them in the query planner. If your row value is not actually unique, Redshift will give you incorrect `distinct` results.
- There are no multi-row transactions. 1 mutation = 1 transaction.
- Reads are `scan`s, unless you're doing something like an equality predicate on a primary key. From @toddlipcon:
  > ...if you put an equality predicate on the primary key, it doesn't actually "scan" data, it just goes to the correct row. One of our community contributors has been working on a Get API to make it a bit easier to do random reads (and will go through a more optimized code path on the backend).
- Two types of predicates: Equality (col value == scalar) and ranges
- User-defined partitioning schemes for request routing, with lots of flexibility in partitioning schemes.
- The Kudu team made some small improvements to the Raft algorithm
- Storage layout is decoupled from higher level APIs (yay!). I recently talked about this! https://github.com/wrobstory/ds4ds_2015
- `VACUUM`-like flushes from in-memory `MemRowSet`s to `DiskRowSet`s are automatically managed.
- `MemRowSet`s are concurrent, locking B-trees
- `DiskRowSet`s are sorted
- `DiskRowSet`s support dictionary, bitshuffle, front coding, in addition to LZ4/gzip/bzip compression.
- `DiskRowSet`s are considered immutable once encoded, so they are using a Delta store similar to many other columnar database systems.
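
To make the storage notes above a bit more concrete, here is a toy Python sketch of the ideas: a unique primary key, an in-memory row set that gets flushed to a sorted immutable one, a delta store for updates to flushed rows, and the difference between a primary-key lookup and a scan with a range predicate. This is not Kudu's API or implementation. `ToyTablet`, `flush_threshold`, and every method name are made up for illustration, a plain dict stands in for the B-tree, and there is no encoding, compression, or concurrency.

```python
import bisect


class ToyTablet:
    """Toy model only: all names are hypothetical, not Kudu's API."""

    def __init__(self, flush_threshold=3):
        self.mem_rows = {}       # stand-in for a MemRowSet: key -> row dict
        self.disk_rowsets = []   # stand-ins for DiskRowSets: (sorted keys, rows)
        self.deltas = {}         # delta store: key -> changed columns
        self.flush_threshold = flush_threshold

    def _find_on_disk(self, key):
        # Each flushed row set is sorted by key, so binary search finds a row.
        for keys, rows in self.disk_rowsets:
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return rows[i]
        return None

    def insert(self, key, row):
        # The primary key constraint is enforced, not just declared.
        if key in self.mem_rows or self._find_on_disk(key) is not None:
            raise KeyError(f"duplicate primary key: {key!r}")
        self.mem_rows[key] = dict(row)
        if len(self.mem_rows) >= self.flush_threshold:
            self.flush()  # flushes are managed automatically

    def flush(self):
        # Freeze the in-memory rows into a sorted, immutable row set.
        if not self.mem_rows:
            return
        keys = tuple(sorted(self.mem_rows))
        rows = tuple(self.mem_rows[k] for k in keys)
        self.disk_rowsets.append((keys, rows))
        self.mem_rows = {}

    def update(self, key, changes):
        if key in self.mem_rows:
            self.mem_rows[key].update(changes)
        elif self._find_on_disk(key) is not None:
            # Flushed rows are never rewritten; changes go to the delta store.
            self.deltas.setdefault(key, {}).update(changes)
        else:
            raise KeyError(key)

    def get(self, key):
        # Equality predicate on the primary key: go to the row, no scan.
        if key in self.mem_rows:
            return dict(self.mem_rows[key])
        base = self._find_on_disk(key)
        if base is None:
            return None
        return {**base, **self.deltas.get(key, {})}

    def scan(self, column, lower=None, upper=None):
        # Range predicate: lower <= row[column] < upper, checked on every row.
        keys = list(self.mem_rows)
        for disk_keys, _ in self.disk_rowsets:
            keys.extend(disk_keys)
        matches = []
        for key in sorted(keys):
            row = self.get(key)
            value = row[column]
            if lower is not None and value < lower:
                continue
            if upper is not None and value >= upper:
                continue
            matches.append((key, row))
        return matches
```

With `flush_threshold=2`, inserting keys 1 and 2 triggers a flush, a later update to key 1 lands in `deltas` while the flushed base row stays untouched, and `get(1)` merges the two on read.
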
Hey, thanks for writing up these notes. Todd from the Kudu team here. A couple corrections (is it possible to pull-request a gist? :) )