albertoperdomo · November 21, 2011 16:23
diff --git a/mongify.txt b/mongify.txt
 This is a case study of how we would model parts of the Spotify app in MongoDB.

 MODEL
 =====

 users
 -----
 username
 name
 plain_txt_password
 plan
 list_ids: [id, id, id]
 friend_ids: [id, id, id]

 tracks
 ------
 id
 title
 album_id
 artist_ids: [1, 2, 8]

 albums
 ------
 id
 title
 artist_id
 published_at
 produced_by
 track_count
 awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]

 artists
 -------
 id
 name
 bio
 picture: [{path: ..., filename: ..., version: "thumb", ...}]
 active: true
 overdose: true

 lists
 -----
 id
 user_id
 title
 public: true/false
 track_ids: [id, id, id, id]
 subscriber_count
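Because a list stores only track_ids, displaying it means fetching the referenced tracks, for example with an $in query on the tracks' ids. An $in query does not return documents in the order of the array, so the application has to restore the list order itself. Below is a minimal in-memory sketch of that step. It assumes tracks are looked up by the `id` field from the model above, and `resolveListTracks` and the sample data are hypothetical:

```javascript
// Hypothetical in-memory stand-in for the tracks collection.
const tracks = [
  { id: 1, title: "Track one", album_id: 10, artist_ids: [1] },
  { id: 2, title: "Track two", album_id: 10, artist_ids: [1, 2] },
  { id: 3, title: "Track three", album_id: 11, artist_ids: [8] },
];

// A list only stores references to tracks, in the order the user chose.
const list = {
  id: 100,
  user_id: 7,
  title: "Road trip",
  public: true,
  track_ids: [3, 1, 2],
  subscriber_count: 0,
};

// Resolve a list's track_ids into track documents, preserving list order.
// Fetching by id (like an $in query) may return tracks in any order,
// so we index them by id and then walk track_ids.
function resolveListTracks(list, trackDocs) {
  const wanted = new Set(list.track_ids);
  const byId = new Map();
  for (const t of trackDocs) {
    if (wanted.has(t.id)) byId.set(t.id, t);
  }
  // Skip ids whose track no longer exists instead of returning undefined.
  return list.track_ids.filter((id) => byId.has(id)).map((id) => byId.get(id));
}

const ordered = resolveListTracks(list, tracks);
console.log(ordered.map((t) => t.title)); // [ 'Track three', 'Track one', 'Track two' ]
```

Indexing by id keeps the reordering linear in the number of tracks, rather than searching the fetched array once per list entry.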

 plays (this collection should live in a separate database on a different host, because it will receive frequent, high-volume writes)
 -----
 user_id
 track_id
 played_at
 duration

 SAMPLE QUERIES
 ==============

 Your n favorite songs (the ones with the most plays by you)
 -----------------------------------------------------------

 One option is to use group, but it has limitations: we cannot sort the results by number of plays or limit the number of items returned.

 Group query:

 db.plays.group({ cond: { user_id: xxx }, key: { track_id: 1 }, reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 } })

 result:

 [ { track_id: ..., n: ..., last_played_at: ... }, ... ]
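The shell's group helper passes each matching document to reduce together with the running aggregate for that document's key. Each aggregate starts from a copy of `initial`. The plain JavaScript sketch below imitates that behaviour in memory so the output shape is easy to see. It is an illustration, not MongoDB's implementation, and `groupLike` and the sample plays are hypothetical. It also shows why the result still has to be sorted and trimmed on the client.

```javascript
// Hypothetical sample plays (played_at as plain numbers).
const plays = [
  { user_id: 7, track_id: 1, played_at: 100 },
  { user_id: 7, track_id: 2, played_at: 101 },
  { user_id: 7, track_id: 1, played_at: 102 },
  { user_id: 7, track_id: 2, played_at: 100 },
  { user_id: 7, track_id: 1, played_at: 200 },
  { user_id: 8, track_id: 3, played_at: 150 }, // another user, filtered out by cond
];

// In-memory imitation of group({ cond, key, initial, reduce }) for a single key field.
function groupLike(docs, { cond, keyField, initial, reduce }) {
  const groups = new Map();
  for (const doc of docs) {
    if (!cond(doc)) continue;
    const k = doc[keyField];
    if (!groups.has(k)) {
      // Each group starts from its own copy of `initial` (shallow is enough here),
      // plus the key field itself.
      groups.set(k, { [keyField]: k, ...initial });
    }
    reduce(doc, groups.get(k));
  }
  // One document per key, in first-seen order: no sorting or limiting happens.
  return [...groups.values()];
}

const result = groupLike(plays, {
  cond: (doc) => doc.user_id === 7,
  keyField: "track_id",
  initial: { n: 0, last_played_at: 0 },
  reduce: function (doc, prev) {
    prev.n++;
    if (doc.played_at > prev.last_played_at) prev.last_played_at = doc.played_at;
  },
});
console.log(result);
// Expected: track 1 with n 3 and last_played_at 200, track 2 with n 2 and last_played_at 101
```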

 map reduce query
 ----------------

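 // Map: for each play, emit the track id with a count of 1 and the play time.
 var map = function () {
 	emit(this.track_id, { n: 1, t: this.played_at });
 };

 //// Prettier implementation: sum the counts and keep the latest play time.
 var reduce = function (key, values) {
 	var n_res = 0;
 	var max_t = 0;
 	values.forEach(function (v) {
 		n_res += v.n;
 		if (v.t > max_t) max_t = v.t;
 	});
 	return { n: n_res, t: max_t };
 };

 ///// Implementation with better performance: accumulate into the first value
 ///// instead of allocating a new object. Pass only one of the two reduce functions.
 var reduceFast = function (key, values) {
 	var ac = values[0];
 	for (var i = 1; i < values.length; i++) {
 		ac.n += values[i].n;
 		var v_t = values[i].t;
 		if (v_t > ac.t) ac.t = v_t;
 	}
 	return ac;
 };

 // Only this user's plays; the results are written to the "fooo" collection.
 db.plays.mapReduce(map, reduce, { out: "fooo", query: { user_id: xxx } });

 db.fooo.find().sort(..)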

 How this map reduce works:

 Sample data:
 track_id  played_at
 --------  ---------
 1         100
 2         101
 1         102
 2         100
 1         200

 emit(1, { n: 1, t: 100 })
 emit(2, { n: 1, t: 101 })
 emit(1, { n: 1, t: 102 })
 emit(2, { n: 1, t: 100 })
 emit(1, { n: 1, t: 200 })


 reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
 	=> { n: 3, t: 200 }

 reduce(2, [ { n: 1, t: 101 }, { n: 1, t: 100 } ])
 	=> { n: 2, t: 101 }

 ...
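The walkthrough can be replayed in plain JavaScript. The sketch below is a toy driver, not how MongoDB executes mapReduce. It collects emitted values per key, skips reduce for keys with a single value (MongoDB does the same), and returns `{ _id, value }` documents shaped like the output collection. It also checks the re-reduce property. MongoDB may call reduce more than once for the same key and pass an earlier reduce result back in as one of the values, so reduce must return the same `{ n, t }` shape that map emits. `runMapReduce` and `activeCollector` are hypothetical names.

```javascript
// map and reduce as in the query above ("prettier" reduce variant).
function map() {
  emit(this.track_id, { n: 1, t: this.played_at });
}

function reduce(key, values) {
  let n_res = 0;
  let max_t = 0;
  values.forEach(function (v) {
    n_res += v.n;
    if (v.t > max_t) max_t = v.t;
  });
  return { n: n_res, t: max_t };
}

// `map` calls a free `emit`, as in MongoDB; here it forwards to the active collector.
let activeCollector = null;
function emit(key, value) {
  activeCollector(key, value);
}

// Toy in-memory driver: group emitted values by key,
// then reduce each key that received more than one value.
function runMapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map();
  activeCollector = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc); // `this` is the current document
  activeCollector = null;

  // Shaped like the documents of mapReduce's output collection: { _id, value }.
  const out = [];
  for (const [key, values] of emitted) {
    const value = values.length === 1 ? values[0] : reduceFn(key, values);
    out.push({ _id: key, value });
  }
  return out;
}

// Sample data from the walkthrough above.
const plays = [
  { track_id: 1, played_at: 100 },
  { track_id: 2, played_at: 101 },
  { track_id: 1, played_at: 102 },
  { track_id: 2, played_at: 100 },
  { track_id: 1, played_at: 200 },
];

const output = runMapReduce(plays, map, reduce);
console.log(JSON.stringify(output));
// [{"_id":1,"value":{"n":3,"t":200}},{"_id":2,"value":{"n":2,"t":101}}]

// Re-reduce: feeding a partial result back into reduce gives the same answer.
const all = [{ n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 }];
const once = reduce(1, all);
const twice = reduce(1, [reduce(1, all.slice(0, 2)), all[2]]);
console.log(once, twice); // { n: 3, t: 200 } { n: 3, t: 200 }
```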

 Ideally, this query should run in the background, with the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval.
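A sketch of the caching step, assuming the background job has the mapReduce output as `{ _id: track_id, value: { n, t } }` documents. It sorts by play count, breaks ties by the most recent play, keeps the top entries and builds a `$set` update that could then be applied with something like `db.users.update({ _id: userId }, update)`. The `top_tracks` field, `userId`, and the helper names are hypothetical and not part of the model above.

```javascript
// Hypothetical mapReduce output for one user: { _id: track_id, value: { n, t } }.
const results = [
  { _id: 1, value: { n: 3, t: 200 } },
  { _id: 2, value: { n: 2, t: 101 } },
  { _id: 5, value: { n: 3, t: 150 } },
  { _id: 9, value: { n: 1, t: 300 } },
];

// Pick the `limit` most played tracks; ties go to the most recently played one.
// Sorting a copy leaves the input array untouched.
function topTracks(docs, limit) {
  return [...docs]
    .sort((a, b) => b.value.n - a.value.n || b.value.t - a.value.t)
    .slice(0, limit)
    .map((d) => ({ track_id: d._id, plays: d.value.n, last_played_at: d.value.t }));
}

// Build the $set update that would cache the list on the user document.
// `top_tracks` is a hypothetical field name.
function cacheUpdate(docs, limit) {
  return { $set: { top_tracks: topTracks(docs, limit) } };
}

console.log(JSON.stringify(cacheUpdate(results, 2)));
// {"$set":{"top_tracks":[{"track_id":1,"plays":3,"last_played_at":200},{"track_id":5,"plays":3,"last_played_at":150}]}}
```

Storing a small, precomputed array on the user document turns "your n favorite songs" into a single document read instead of a mapReduce run.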
diff --git a/posts_and_comments.txt b/posts_and_comments.txt
 Option A: comments reference a blogpost, as is typical in ActiveRecord
 ======================================================================

 posts
 -----
 id
 title
 body
 author_id -> user

 comments
 --------
 id
 text
 author_id -> user
 post_id -> post
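With Option A, showing a post together with its comments takes two queries. The first fetches the post. The second fetches the comments filtered by post_id, roughly `db.posts.findOne({ _id: postId })` followed by `db.comments.find({ post_id: postId })`. The application then stitches the two results together. The in-memory sketch below illustrates that application-side join. `postWithComments` and the sample documents are hypothetical.

```javascript
// Hypothetical sample documents following Option A: comments point at their post.
const posts = [
  { id: 1, title: "Hello", body: "First post", author_id: 10 },
  { id: 2, title: "Again", body: "Second post", author_id: 11 },
];
const comments = [
  { id: 100, text: "Nice!", author_id: 11, post_id: 1 },
  { id: 101, text: "Thanks", author_id: 10, post_id: 1 },
  { id: 102, text: "Hmm", author_id: 12, post_id: 2 },
];

// Two lookups, like two queries: first the post, then its comments by post_id.
function postWithComments(postId) {
  const post = posts.find((p) => p.id === postId);
  if (!post) return null;
  const own = comments.filter((c) => c.post_id === postId);
  return { ...post, comments: own };
}

const view = postWithComments(1);
console.log(view.title, view.comments.map((c) => c.text)); // Hello [ 'Nice!', 'Thanks' ]
```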

 Option B: comments are embedded in the blogpost document
 ========================================================

 posts
 -----
 id
 title
 body
 author_id -> user
 comments: [{_id: ..., text: ..., author: ..., approved: true}, {...}]


 Each comment gets its own _id so that it can be handled individually, as if it were a separate entity. For example, in an approval process we find the blogpost by the comment's ID and set that comment's approved attribute.

 db.posts.find({"comments._id": xxx})
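Setting the approved attribute does not require rewriting the whole comments array. In MongoDB, an update such as `db.posts.update({ "comments._id": xxx }, { $set: { "comments.$.approved": true } })` uses the positional `$` operator to modify only the array element matched by the query. The plain JavaScript sketch below imitates that behaviour on in-memory documents to show which element changes. `approveComment` and the sample data are hypothetical.

```javascript
// Hypothetical posts following Option B: comments are embedded, each with its own _id.
const posts = [
  {
    _id: 1,
    title: "Hello",
    comments: [
      { _id: "c1", text: "Nice!", author: "ana", approved: false },
      { _id: "c2", text: "Spam", author: "bot", approved: false },
    ],
  },
  { _id: 2, title: "Again", comments: [{ _id: "c3", text: "Hmm", author: "bo", approved: false }] },
];

// Imitates update({ "comments._id": id }, { $set: { "comments.$.approved": true } }):
// find the first post containing the comment, then change only that array element.
function approveComment(docs, commentId) {
  for (const post of docs) {
    const index = post.comments.findIndex((c) => c._id === commentId);
    if (index !== -1) {
      post.comments[index].approved = true;
      return true; // like a single-document update: stop at the first matching post
    }
  }
  return false;
}

approveComment(posts, "c1");
console.log(posts[0].comments.map((c) => c.approved)); // [ true, false ]
```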

 The query can also specify which parts of the document should not be returned (a projection), which reduces the overhead when we are only interested in certain parts, e.g. the comment itself.
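In MongoDB the second argument to find is the projection. `db.posts.find({ "comments._id": xxx }, { body: 0 })` leaves out the body, and `{ "comments.$": 1 }` returns only `_id` and the first comment that matched the query. The sketch below imitates those two projections on an in-memory document. `excludeFields` and `matchedCommentOnly` are hypothetical helpers, not MongoDB APIs.

```javascript
// Hypothetical Option B post with embedded comments.
const post = {
  _id: 1,
  title: "Hello",
  body: "A long body we may not need",
  author_id: 10,
  comments: [
    { _id: "c1", text: "Nice!", approved: true },
    { _id: "c2", text: "Thanks", approved: false },
  ],
};

// Like projection { body: 0 }: copy the document without the listed top-level fields.
function excludeFields(doc, fields) {
  const copy = { ...doc };
  for (const f of fields) delete copy[f];
  return copy;
}

// Like projection { "comments.$": 1 } for the query { "comments._id": id }:
// keep only _id and the first matching comment (null stands for "no document matched").
function matchedCommentOnly(doc, commentId) {
  const match = doc.comments.find((c) => c._id === commentId);
  return match ? { _id: doc._id, comments: [match] } : null;
}

console.log(Object.keys(excludeFields(post, ["body"]))); // [ '_id', 'title', 'author_id', 'comments' ]
console.log(JSON.stringify(matchedCommentOnly(post, "c2")));
// {"_id":1,"comments":[{"_id":"c2","text":"Thanks","approved":false}]}
```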