Dynotype

A write-efficient implementation of large (consistent) sets on DynamoDB.

Why?

DynamoDB has built-in support for sets, but if your sets are large (>1000 members) you quickly run into two issues:

  1. When calculating the cost of an item update, AWS charges based on the size of the whole item, not the size of the update. This makes even small additions to large sets expensive.

  2. AWS enforces a maximum item size of 64KB. Once a set passes that size you have to devise a scheme to store it across multiple items.

To avoid these two issues, a set is represented as an ordered series of items (a write-only log) containing addition and removal events for the set.
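
For illustration, the log items might look roughly like the sketch below, with the set's identifier as the hash key and a monotonically increasing version number as the range key. The attribute names are assumptions made for this sketch, not Dynotype's actual schema.

# Hypothetical event-log layout: hash key = set id, range key = version.
log = [
  { "set_id" => 10, "version" => 1, "add"    => [1, 2, 3] },
  { "set_id" => 10, "version" => 2, "remove" => [2] },
  { "set_id" => 10, "version" => 3, "add"    => [4] }
]
# Replaying the items in version order yields the membership {1, 3, 4}.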

What do you mean by Consistent?

By consistent I mean that there is a strict ordering over all set operations and that every operation obeys the following two rules:

  • A member is only added to the set if it is not currently a member.
  • A member is only removed from a set if it is already a member.

An update to a set will fail if it would break either of those two rules. Additionally, you can request that an update fail if the set has been modified since you last read it.
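
A minimal sketch of how those two rules might be checked when applying an update to an in-memory copy of the set; the helper and error messages are illustrative, not part of Dynotype's API.

require "set"

# Hypothetical helper: apply a change to an in-memory set, raising if either
# consistency rule would be broken.
def apply_update(members, changes)
  adds    = changes[:add]    || []
  removes = changes[:remove] || []
  adds.each    { |m| raise "already a member: #{m}" if members.include?(m) }
  removes.each { |m| raise "not a member: #{m}" unless members.include?(m) }
  members + adds - removes
end

members = Set.new([1, 2, 3])
apply_update(members, :add => [4]) # => #<Set: {1, 2, 3, 4}>
apply_update(members, :add => [1]) # raises "already a member: 1"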

How do I use it?

# Setup the DynamoDB table
table = DynoType::ConsistentSetTable.new("table_name",
                                        :key_type => :number) # or :string
table.create_unless_exists

# Create a new set
user_id = 10
s = table.set_for(user_id)
s.update(:add => [1,2,3]) # writes the changes to DynamoDB
s.include?(1) # true
s.include?(5) # false
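
Removing members and guarding against concurrent modification presumably follow the same pattern; the :remove key and the :fail_if_modified option below are assumptions for illustration, not the confirmed API.

# Remove members (assumed to mirror :add)
s.update(:remove => [2])
s.include?(2) # false

# Hypothetical option: fail the update if the set changed since the last read.
s.update(:add => [7], :fail_if_modified => true)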

Caveats

  • Before adding or removing members from the set you must load the entire set into memory from the store. This is required so we can ensure that the write will not break the consistency rules above.

  • A single set modification cannot be larger than 64KB, so atomic modifications to a set are limited to that size.

  • This implementation may perform poorly if you have many writers. Dynotype uses a form of optimistic locking, so if many processes are trying to update the same set you may see many failed attempts for each successful write (see the retry sketch after this list). Dynotype is likely not the best solution in that case.

  • Each set is represented by one hash key, so you may hit per-key throughput limits if a set is very large or receives many updates. Future versions of Dynotype may support a partitioned-set option that spreads a single set across many hash keys.
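
If a handful of concurrent writers is unavoidable, one mitigation is to retry a failed update after reloading the set, with a small backoff. This is a sketch under the assumption that a conflicting write surfaces as an exception; the exact error class is not specified here.

# Sketch: retry a contended update with exponential backoff. The rescued
# error class is a placeholder for whatever Dynotype raises on a conflict.
def update_with_retry(table, user_id, changes, attempts = 5)
  attempts.times do |i|
    begin
      set = table.set_for(user_id)  # reload the latest state of the set
      return set.update(changes)
    rescue StandardError            # placeholder for the conflict error
      raise if i == attempts - 1
      sleep(0.1 * (2 ** i))         # back off before retrying
    end
  end
end

update_with_retry(table, 10, :add => [42])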

Compaction Algorithm

Compaction is implemented in such a way that multiple processes can simultaneously attempt compaction and a new process can pick up a compaction attempt where another process left off.

  1. Write a compaction event to the event log.
  2. Start writing the new compacted snapshot of the data type to the event log.
  3. Delete all versions up to and including the compaction event.
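
The three steps can be illustrated against a plain in-memory event log. This is a simulation of the procedure, not Dynotype's implementation, and the event shapes are assumptions.

require "set"

# In-memory stand-in for a set's event log, ordered by version.
log = [
  { :version => 1, :add => [1, 2, 3] },
  { :version => 2, :remove => [2] },
  { :version => 3, :add => [4] }
]

# 1. Write a compaction event to the event log.
compaction_version = log.last[:version] + 1
log << { :version => compaction_version, :compaction => true }

# 2. Write a snapshot of the current membership after the compaction event.
members = Set.new
log.each do |e|
  members += e[:add]    || []
  members -= e[:remove] || []
end
log << { :version => compaction_version + 1, :add => members.to_a.sort }

# 3. Delete all versions up to and including the compaction event.
log.reject! { |e| e[:version] <= compaction_version }

# The log now holds a single snapshot item equivalent to the original set.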

There are a couple of constraints that ensure all processes seeing the compaction will work toward the same final state:

  • There can only be one compaction event in a log at a given time.
  • Given an event log for a DataType, all processes will generate exactly the same snapshot.