Decay KV-Storage Design Document

Decay Erlang Key-Value storage

Why write yet another NoSQL product?

Because I can! Well, to be completely honest, I was looking for a tool that fully suits my needs, without success. I need a nice-looking KV storage with a small codebase that is scalable, damn fast™ and can be embedded in Erlang applications. Believe it or not, the closest thing I found matching those criteria is the early versions of the fission KV storage by VeryPositive. The production version of fission that is possibly coming up soon, however, is again not quite what I need, since it is supposed to be a gargantuan transaction-based monstrosity. Maybe it will still turn out pretty small and fairly usable, but I'm scared of riak2. Nonetheless, there is no reason not to follow their development process and contribute to it, for example by wrapping it into rebar and writing reltool configurations at the very least, even if I don't understand a thing in their train of thought (as usual, hehe). So yeah, I want to write another project that will be handy primarily for developing scalable applications with huge amounts of user-generated content.

Philosophy of decay

Decay aims to be a damn fast™, persistent, consistent and fault-*in*tolerant key-value storage. If bad things happen to a physical or virtual node, the information stored on that node becomes unavailable. So, basically, there won't be any redundancy in decay; read more about this in Technical specs. Still, for systems that require some information to always be available because it is extremely important (for example, authentication data), I plan to add support for redundancy of data that is marked as important. In that case the data is copied and updated on a number of nodes large enough for the system to supply applications with the must-have data reliably.
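As a rough illustration of the mark-as-important idea only: the decay module, its put and get functions and the important option below are all hypothetical, nothing like them exists yet.

    %% Hypothetical usage sketch; decay:put/2,3, decay:get/1 and the
    %% 'important' option are invented for illustration.

    %% Ordinary data lives on exactly one node, with no redundancy:
    ok = decay:put({user, 42, last_seen}, erlang:timestamp()).

    %% Data marked as important would be copied to enough nodes to stay
    %% readable even when individual nodes die:
    Hash = crypto:hash(sha256, <<"hunter2">>).
    ok = decay:put({user, 42, password_hash}, Hash, [important]).

    %% Reads would look the same either way:
    {ok, Hash} = decay:get({user, 42, password_hash}).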

Technical specs

Pretty much like service providers in the resource discovery case, decay will make use of so-called data providers, managed transparently by a tool based on resource discovery. Decay nodes will store, update and serve Erlang terms. The distribution of data will be based on a linear order over the set of all possible terms; the set of all Erlang terms already has a natural linear order defined by the language specification. If a decay user wants to, he or she can plug in another linear order together with a slicing function for data fragmentation. When a new node enters a decay cluster, the elder nodes assign it a subset of Erlang terms to manage and ship the relevant data within the defined boundaries to the newcomer. If a node shuts down gracefully, its data is redistributed among the survivors. That simple. A sketch of the range-ownership idea follows below.

The persistence model will be as follows: once per configurable interval we dump the term storage into a compressed binary file, and between those dumps we log every request handled by the node. After a node restart (if it was shut down non-gracefully) we first restore the binary dump and then replay the logged commands one by one to get back to the point where something bad happened.
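To make the range-assignment idea concrete, here is a minimal sketch of ownership on top of the standard Erlang term order. It is illustration only: the module name decay_ranges_sketch and the functions owner/2 and split/3 are made up and are not existing decay code.

    %% Sketch: each node owns a half-open range [From, To) under the
    %% built-in term order; a lookup walks the range table.
    -module(decay_ranges_sketch).
    -export([owner/2, split/3]).

    %% Ranges is a list of {From, To, Node}; the atoms '-infinity' and
    %% 'infinity' mark the open ends of the keyspace.
    owner(_Key, []) ->
        undefined;
    owner(Key, [{From, To, Node} | Rest]) ->
        case in_range(Key, From, To) of
            true  -> {ok, Node};
            false -> owner(Key, Rest)
        end.

    in_range(Key, From, To) ->
        ((From =:= '-infinity') orelse (From =< Key)) andalso
        ((To =:= 'infinity') orelse (Key < To)).

    %% When a newcomer joins, an elder splits its own range at Pivot and
    %% hands the upper half to the new node.
    split({From, To, Elder}, Pivot, NewNode) ->
        [{From, Pivot, Elder}, {Pivot, To, NewNode}].

A user-supplied linear order would simply replace the =< and < comparisons here, and the slicing function would choose the Pivot.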

Things to do after decay 0.1.X-rc is released

Extend decay with transactions

Transactions are good and fit the architectural design, so why not?

Fork decay to slow-decay

It will share the codebase with decay 0.1.X but will be fault-tolerant. Not sure if I need it for my projects, but it'd be interesting to play with the CAP properties of the system (given that the codebase is small and flexible).

MapReduce

If I don't come up with a map-reduce implementation for 0.1.X-rc, then that will be the thing to think about later on.

Backup nodes

As somebody from [email protected] pointed out, the system won't be that scalable given the nature of huge distributed clusters, where not only hardware failures but also network failures occur. Thus it would be a good idea to have so-called backup nodes: an option for a virtual or physical node to keep binary blobs of data on disk, with the possibility to load specific data ranges in case of any kind of failure in order to replace the node that failed. This would add some failure tolerance to the system.
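Purely as an illustration of what a backup node could do (none of this is decay code; the module name and file layout are invented), a node might snapshot a key range into a compressed blob on disk and a replacement node might load it back later:

    %% Sketch only: decay_backup_sketch is a made-up module name.
    -module(decay_backup_sketch).
    -export([dump_range/2, load_range/1]).

    %% Write a list of {Key, Value} pairs belonging to one range
    %% as a single compressed binary blob on disk.
    dump_range(Path, Pairs) when is_list(Pairs) ->
        Blob = term_to_binary(Pairs, [compressed]),
        file:write_file(Path, Blob).

    %% Read the blob back when a replacement node takes over the range.
    load_range(Path) ->
        case file:read_file(Path) of
            {ok, Blob} -> {ok, binary_to_term(Blob)};
            Error      -> Error
        end.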

Closing words

I'll do actual stuff when I have some time, hopefully pretty soon. I'd be glad to hear criticism and any feedback in general from geeks around. Also, it'd be great to collaborate by forking this document on GitHub.

Thanks to andoriyu for the productive discussion at e@cjr (see the logs for the night/early morning of May 30).
