Skip to content

Instantly share code, notes, and snippets.

@albertoperdomo
Created November 21, 2011 16:23
Show Gist options
  • Save albertoperdomo/1383129 to your computer and use it in GitHub Desktop.
Save albertoperdomo/1383129 to your computer and use it in GitHub Desktop.
Examples with MongoDB
This is a case study of how we would model parts of the spotify app in MongoDB.
MODEL
=====
users
-----
username
name
plain_txt_password
plan
list_ids: [id, id, id]
friend_ids: [id, id, id]
tracks
------
id
title
album_id
artist_ids [1,2,8]
albums
------
id
title
artist_id
published_at
produced_by
track_count
awards:[{id: new ObjectId(), title: "best pitbull featuring version", year: 2010, awarded_by: "MTV"}]
artists
-------
id
name
bio
picture: [{path: ..., filename: ..., version: "thumb", ...}]
active: true
overdose: true
lists
-----
id
user_id
title
public: true/false
track_ids: [id, id, id, id]
subscriber_count
plays (this collection should be hosted on a different host & database, because it will have a lot of frequent high volume writes)
-----
user_id
track_id
played_at
duration
SAMPLE QUERIES
==============
Your n favorite songs (the ones with the most plays by you)
-----------------------------------------------------------
One possibility is to use group but it has some limitations. We cannot sort by number of plays or limit the number of items we want to retrieve.
Group query:
db.plays.group({ query: { user_id: xxx }, key: 'track_id', reduce: function(doc, prev) { prev.n++; if (doc.played_at > prev.last_played_at) { prev.last_played_at = doc.played_at } }, initial: { n: 0, last_played_at: 0 })
result:
[ { _id: ..., value: { n: ..., last_played_at: ...}
map reduce query
----------------
db.plays.mapReduce(
function () { emit( this.track_id, { n: 1, t: this.played_at }) },
function (key, values) {
//// Prettier implementation
var n_res = 0;
var max_t = 0;
values.forEach(function(v){
n_res+=v.n;
if(v.t > max_t) max_t = v.t;
})
return {n: n_res, t: max_t};
///// Implementation with better performance
var ac = values[0];
for(var i = 1; i < values.length; i++) {
ac.n += values[i].n;
var v_t = values[i].t;
if(v_t > ac.t) ac.t = v_t;
}
return ac;
},
out: "fooo"
query: { user_id: xxx })
db.foo.find().sort(..)
How this map reduce works:
Sample data:
track_id played_at
--------------
1 100
2 101
1 102
2 100
1 200
emit(1, { n: 1, t: 100 })
emit(2, { n: 1, t: 101 })
emit(1, { n: 1, t: 102 })
emit(2, { n: 1, t: 100 })
emit(1, { n: 1, t: 200 })
reduce(1, [ { n: 1, t: 100 }, { n: 1, t: 102 }, { n: 1, t: 200 } ])
=> { n: 3, t: 200 }
reduce(2, [{ n: 1, t: 100 }, { n: 1, t: 101 }])
=> { n: 2, t: 101 }
...
Ideally this query should run in background and the results (e.g. your 10 most played songs) stored in the user's document for fast retrieval
Option A: comments are referencing a blogpost, like typically in ActiveRecord
=============================================================================
posts
-----
id
title
body
author_id -> user
comments
--------
id
text
author_id -> user
post_id -> post
Option B: comments are embedded in the blogpost document
========================================================
posts
-----
id
title
body
author_id -> user
comments: [{_id: ...., text: ..., author: ...., approved: true}, {...}]
An ID attribute is provided for the comment so that they can be handled invidually, as if they would be a separate entity, e.g. for approval process: Search blogpost by comment ID and set approved attribute.
posts.find({"comments._id": xxx})
It's also possible to define in the query params which parts of the document we do not want to retrieve, so we reduce the overhead when we are only interested in certain parts, e.g. the comment itself.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment