Here’s a way to implement Consistent Vote Counting using Cassandra that doesn’t depend on vector clocks or an atomic increment operation.
On a vote, use the object id as key, and insert into the votes CF a column named for the current timestamp. This can be a TimeUUID, or a microseconds count if you’re stingy. Note that you can’t just use the column’s timestamp for this trick: it depends on doing a slice query on the column name. (There’s no corresponding way to query against columns’ timestamps.)
First, get the cached count:
cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])Next, do a get_slice from the
counted_until timestamp onward:newer_votes = db.get('votes', 'back in black', :start => counted_until.to_i)The number of votes is the
cached_count plus the number of newer_votesvotes_count = cached_count.to_i + newer_votes.length
Choose a time in the distant past: far enough back you’re comfortable the database is consistent.
new_counted_until = 1.day.ago.to_iRead the current
cached_count and counted_until;cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])Then count the columns between
counted_until and new_counted_until. Note that this time we’re specifying an endpoint:countable_votes = db.get('votes', 'back in black', :start => counted_until.to_i, :finish => new_counted_until)Insert the new
cached_count and counted_until into the cached_counts CF:insert('cached_counts', 'back in black', { "cached_count" => 4, :counted_until => new_counted_until })
Say that we begin with the state
votes:
"back in black" => { 201005211200 => '1', 201005201159 => '1', 201005201157 => '1', 201005011900 => '1', 201004190600 => '1' },
cached_counts:
"back in black" => { 'cached_count' => 2, 'counted_until' => 201005011931 },
Alice starts the update process at 12pm, and slices from 201005011931 to 2010052111158. She will get a new count of 3. Meanwhile at 12:05 Bob started an update and sliced from 201005011931 to 201005211204. He will get a new count of 5.
It doesn’t matter which order they commit: the assertions
"back in black" => { 'cached_count' => 3, 'counted_until' => 2010052111158 }and
"back in black" => { 'cached_count' => 5, 'counted_until' => 201005211204 }are both correct. The start and end timestamps of the get_slice are distant enough that both queries are in themselves consistent. Since we include the endpoint with the assertion, the cached_count is also correct.
Note: If you like, you can avoid the extra counted_until column by instead using it as the value for the cached_count column’s timestamp. Since the ruby gem doesn’t yet expose the timestamp on reads, however, I’ve just made it a separate column.