Skip to content

Instantly share code, notes, and snippets.

@reiddraper
Created January 5, 2012 14:37
Show Gist options
  • Select an option

  • Save reiddraper/1565520 to your computer and use it in GitHub Desktop.

Select an option

Save reiddraper/1565520 to your computer and use it in GitHub Desktop.

Some thoughts...

  1. What about putting the type in the Content-Type field? Something like x-crdt-g-set-v1
  2. What's the rationale for representing things as a hash with lists as values instead of two hashes (in the OR and LWW set for example)? This to me seems closer to how it'll be represented in code (or at least how I've chosen to do with knockbox).

On JSON vs. Protobuf:

JSON has the advantage of easily being used in the browser, so you could actually use the browser's timestamp for some of the operations. This can be useful if for a single key, the single browser is the main actor doing writes. JSON also works easily with javascript mapreduce. If size is a concern, I'd be curious to see the size difference between compressed (snappy maybe?) JSON and Protobuf. I'm not saying Protobuf is a bad choice, just things to consider.

@seancribbs
Copy link

@reiddraper I like the media-type idea. Some possibilities I thought of: application/x-crdt-g-set-v1+json (following the rss+xml style), application/json;crdt="g-set";version="1" (using sub-type parameters)

@reiddraper
Copy link
Author

@seancribbs +1 to subtype params, and this will make it easier to apps that simply expect application/json and ignore the rest of the media-type.

@seancribbs
Copy link

The other nice thing about subtype params is that if Webmachine's conneg can be slightly modified, the client can ask for specific versions (or no version at all) and have the proper representation returned. Webmachine-Ruby already does this "best-fit" negotiation on media types.

@aphyr
Copy link

aphyr commented Jan 5, 2012

What about putting the type in the Content-Type field? Something like x-crdt-g-set-v1

I was aiming for this originally, but I think composability may be more important. If we place type hints in the structure, you can walk any json object recognizing crdt types for merge:

{
'name': {'type': 'lww-register', ...},
'following': {'type': '2p-set', ...},
...
}

So, I would vote for optional subtype params for indicating the JSON object can be parsed with CRDT types inside.

What's the rationale for representing things as a hash with lists as values instead of two hashes (in the OR and LWW set for example)? This to me seems closer to how it'll be represented in code (or at least how I've chosen to do with knockbox).

  1. Can't use hashes to store arbitrary keys.
  2. Symmetry between OR and LWW-sets.
  3. Cuts serialized size significantly for OR sets.
  4. Constant-space parsing, doesn't require building two intermediate maps to efficiently resolve member?. Cuts in-memory size too, I suspect. The in-memory structure I'd use is something like

{
element [[insert-tag1, insert-tag2], [remove-tag1]]
}

Might need to adapt to sets for large numbers of inserts and deletes, but that's a metaprogramming issue.

  1. Allows for interesting tricks, like solving member? and subset? in O(n) and constant space. Think mapreduce queries.
  2. Leaving myself wiggle room for order-preserving lists.

JSON vs Protobuf:

Why not both? I plan to export a .proto file for these types as well; you can drop them into your objects wherever. Downside is you can't type the sub-object, so I guess we'd have to leave the elements, tags, etc. as binary. We also have to standardize issues like string equality, sort orders, etc., instead of just "do whatever ecmascript does".

@reiddraper
Copy link
Author

I was aiming for this originally, but I think composability may be more important. If we place type hints in the structure, you can walk any json object recognizing crdt types for merge:

Yeah this is a good point. +1 for the optional subtype params

Can't use hashes to store arbitrary keys.

Maybe I'm misremembering JSON/Javascript, but I believe hashes can have arbitrary Strings as keys, just like the values in your lists. Looking here briefly seems to support my understanding.

respresentation

I'm not convinced of any real speed differences in representation. One method requires two hash lookups (O(1)), and the other requires one hash lookup and accessing a vector (in javascript accessing a vector is really a hash with integers as keys, iirc). Maybe I'm totally off base here though?

Having both protobuf and JSON is hard to argue with :)

@aphyr
Copy link

aphyr commented Jan 6, 2012

Sets should be able to store arbitrary elements, not just strings. As all JSON structures have an equality relation, sets like:

{ type: '2p-set', a: [{foo: 1}, {foo: 2}], r: [] }

are well-defined.

What I like about storing the elements in LWW/OR sets this way (in addition to space savings, which can be sizeable) is that you can build either in-memory representation efficiently. You'll be paying for the iteration over all keys regardless, so you can build two maps, multi-keyed maps, or a map/vector representation. My gut feeling is that a vector of tags is probably the fastest for small numbers of changes to each key, since you benefit from cache coherency... but I'll probably use all three in my Ruby implementation.

@reiddraper
Copy link
Author

Sets should be able to store arbitrary elements, not just strings. As all JSON structures have an equality relation, sets like:

Good point. This alone may have me convinced to use lists.

I'll hold off on speculating more about performance :)

@seancribbs
Copy link

seancribbs commented Jan 6, 2012 via email

@reiddraper
Copy link
Author

I can't immediately think of any downsides to requiring strings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment