Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save mrflip/416666 to your computer and use it in GitHub Desktop.
Save mrflip/416666 to your computer and use it in GitHub Desktop.
votes CF
"back in black" => { 201005211200 => '1', 201005201159 => '1', 201005201157 => '1', 201005011900 => '1', 201004190600 => '1' },
"black album" => { 201005021800 => '1', 201005010600 => '1' },
"black star" => { 201005011000 => '1' }
cached_counts CF
"back in black" => { 'cached_count' => 2, 'counted_until' => 201005011931 },
"black album" => { 'cached_count' => 1, 'counted_until' => 201005010600 }

Consistent Vote Counting in Cassandra

Here’s a way to implement Consistent Vote Counting using Cassandra that doesn’t depend on vector clocks or an atomic increment operation.

Recording a vote

On a vote, use the object id as key, and insert into the votes CF a column named for the current timestamp. This can be a TimeUUID, or a microseconds count if you’re stingy. Note that you can’t just use the column’s timestamp for this trick: it depends on doing a slice query on the column name. (There’s no corresponding way to query against columns’ timestamps.)

To read the number of votes:

First, get the cached count:

cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])

Next, do a get_slice from the counted_until timestamp onward:
newer_votes = db.get('votes', 'back in black', :start => counted_until.to_i)

The number of votes is the cached_count plus the number of newer_votes
votes_count = cached_count.to_i + newer_votes.length

To update the cached count:

Choose a time in the distant past: far enough back you’re comfortable the database is consistent.

new_counted_until = 1.day.ago.to_i

Read the current cached_count and counted_until;
cached_count, counted_until = db.get_columns('cached_counts', 'back in black', ['cached_count', 'counted_until'])

Then count the columns between counted_until and new_counted_until. Note that this time we’re specifying an endpoint:
countable_votes = db.get('votes', 'back in black', :start => counted_until.to_i, :finish => new_counted_until)

Insert the new cached_count and counted_until into the cached_counts CF:
insert('cached_counts', 'back in black', { "cached_count" => 4, :counted_until => new_counted_until })

Consistency

Say that we begin with the state


votes:
  "back in black" => { 201005211200 => '1', 201005201159 => '1', 201005201157 => '1',  201005011900 => '1', 201004190600 => '1' },
cached_counts:
  "back in black" => { 'cached_count' => 2, 'counted_until' => 201005011931 },

Alice starts the update process at 12pm, and slices from 201005011931 to 2010052111158. She will get a new count of 3. Meanwhile at 12:05 Bob started an update and sliced from 201005011931 to 201005211204. He will get a new count of 5.

It doesn’t matter which order they commit: the assertions

  "back in black" => { 'cached_count' => 3, 'counted_until' => 2010052111158 }

and
  "back in black" => { 'cached_count' => 5, 'counted_until' => 201005211204 }

are both correct. The start and end timestamps of the get_slice are distant enough that both queries are in themselves consistent. Since we include the endpoint with the assertion, the cached_count is also correct.

Note: If you like, you can avoid the extra counted_until column by instead using it as the value for the cached_count column’s timestamp. Since the ruby gem doesn’t yet expose the timestamp on reads, however, I’ve just made it a separate column.

@enthal
Copy link

enthal commented Nov 5, 2012

Interesting! But:
If it "depend on vector clocks" does it instead depend on a consistent distributed (i.e., not partition-tolerant) timestamp facility? If clocks get sufficiently out of sync for some given access rate, isn't it possible to drop votes? Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment