ox · August 12, 2013 19:20
diff --git a/gistfile1.txt b/gistfile1.txt
 The polar opinions on various databases always crack me up. If you ask a programmer about a given database they'll either love it for solving all of their problems or hate it for causing them. Most of the time the issue is actually caused by the way the programmer decided to originally model their data. This is a proposal for a program that will automatically handle the way associations are stored in a key-value store and hopefully end some of the mindless debates about how #webscale your favorite #noSQL DB is.

 Example Schema
 Throughout this article I’ll use the below schema for a user as an example:

 {
 id: String,
 name: String,
 friends: [User] 
 }

 Optimizing for Reads
 If we were to optimize the above user schema for reads, we would store entire user objects in a user’s array of friends. That way when you requested a user object, you could also get their friends with only one request.

 So what’s the problem? Imagine if Robert is friends with 1000 people. If Robert decides he likes the nickname Bob more, you would have to change his name in all of his 1000 friends’ arrays. Not very efficient.

 Optimizing for Updates
 In order to avoid the crazy update situation, we can make user.friends an array of user ids. Now, if we want to get information on all of a user’s friends, we can make a batch get-request for all of the users with ids in the friends array. This mimics how a SQL join works, except that SQL is generally faster at doing this.

 Optimizing the Optimization
 I’d like to propose a more intelligent approach to storing associations in key-value stores—optimizing some records for reads and some for updates, depending on the frequency at which they are accessed.

 Counting Links
 In the example of a user with friends, you probably don’t want to optimize for reads if the user is in more than X user’s list of friends. At some point, a user will be linked to enough other users where it won’t be worth duplicating their data in all of their friends’ arrays—instead, the data should be optimized for updates. I could see a programmer setting a max X per schema.

 Query Type Frequency
 A simple heuristic for what optimization to make (that incorporates the number of links) is to just count the number of times updates and reads are made for a given record — just optimize for whichever is more frequent.

 One problem with this could be if the number of reads and updates are roughly equivalent and changes non-deterministically. In that case, the system will constantly be changing the way data is saved, resulting in unnecessary database requests.

 Manual Declaration
 Due to scenarios like the one above, it would be nice to allow the programmer to decide how a record should be stored when they create/update it. It would also be interesting to incorporate this as an option at the schema level in an ORM.

 Implementation
 In order to implement this system, you would need:

 a delayed jobs queue that can handle a task to calculate whether or not the way a record is stored should be changed each time an update is made to a record
 a script to change the way a record is stored from embedded doc to a list of ids
 Conclusion
 Like all software implementations, there are going to be other limitations that this proposal is overlooking. Regardless, I think we can more intelligently store associations in key value stores.
	The polar opinions on various databases always crack me up. If you ask a programmer about a given database they'll either love it for solving all of their problems or hate it for causing them. Most of the time the issue is actually caused by the way the programmer decided to originally model their data. This is a proposal for a program that will automatically handle the way associations are stored in a key-value store and hopefully end some of the mindless debates about how #webscale your favorite #noSQL DB is.

	Example Schema
	Throughout this article I’ll use the below schema for a user as an example:

	{
	id: String,
	name: String,
	friends: [User]
	}

	Optimizing for Reads
	If we were to optimize the above user schema for reads, we would store entire user objects in a user’s array of friends. That way when you requested a user object, you could also get their friends with only one request.

	So what’s the problem? Imagine if Robert is friends with 1000 people. If Robert decides he likes the nickname Bob more, you would have to change his name in all of his 1000 friends’ arrays. Not very efficient.

	Optimizing for Updates
	In order to avoid the crazy update situation, we can make user.friends an array of user ids. Now, if we want to get information on all of a user’s friends, we can make a batch get-request for all of the users with ids in the friends array. This mimics how a SQL join works, except that SQL is generally faster at doing this.

	Optimizing the Optimization
	I’d like to propose a more intelligent approach to storing associations in key-value stores—optimizing some records for reads and some for updates, depending on the frequency at which they are accessed.

	Counting Links
	In the example of a user with friends, you probably don’t want to optimize for reads if the user is in more than X user’s list of friends. At some point, a user will be linked to enough other users where it won’t be worth duplicating their data in all of their friends’ arrays—instead, the data should be optimized for updates. I could see a programmer setting a max X per schema.

	Query Type Frequency
	A simple heuristic for what optimization to make (that incorporates the number of links) is to just count the number of times updates and reads are made for a given record — just optimize for whichever is more frequent.

	One problem with this could be if the number of reads and updates are roughly equivalent and changes non-deterministically. In that case, the system will constantly be changing the way data is saved, resulting in unnecessary database requests.

	Manual Declaration
	Due to scenarios like the one above, it would be nice to allow the programmer to decide how a record should be stored when they create/update it. It would also be interesting to incorporate this as an option at the schema level in an ORM.

	Implementation
	In order to implement this system, you would need:

	a delayed jobs queue that can handle a task to calculate whether or not the way a record is stored should be changed each time an update is made to a record
	a script to change the way a record is stored from embedded doc to a list of ids
	Conclusion
	Like all software implementations, there are going to be other limitations that this proposal is overlooking. Regardless, I think we can more intelligently store associations in key value stores.