Francesca Krihely FrancescaK

@FrancescaK
FrancescaK / gist:9371062
Last active August 29, 2015 13:57
STEP Add Tags
mongos> sh.addShardTag('shard0000', 'recent')
mongos> sh.addShardTag('shard0001', 'tier 2')
mongos> sh.addShardTag('shard0002', 'tier 2')
FrancescaK / gist:9371048
Last active August 29, 2015 13:57
STEP connect, add tags, insert data, verify sharding status, set up shards
MongoDB shell version: 2.4.6
connecting to: 127.0.0.1:30004/test
//
mongos> sh.status()
--- Sharding Status ---
sharding version: {
"_id" : 1,
"version" : 3,
"minCompatibleVersion" : 3,
"currentVersion" : 4,
sh.setBalancerState(false)
use config
// adjust lower bound of the "current" range
db.tags.update( { "tag" : "current" }, { "$set" : { "min._id" : 201208140000032543 } } )
// adjust upper bound of the "archive" range
db.tags.update( { "tag" : "archive" }, { "$set" : { "max._id" : 201208140000000000 } } )
sh.setBalancerState(true)
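One thing to watch with bounds this large: the mongo shell parses bare numeric literals as 64-bit floating-point values, which hold integers exactly only up to 2^53 - 1. The check below is a plain JavaScript sketch (runnable in Node.js, with no MongoDB calls) showing that the "current" lower bound used above is not exactly representable as a double, while the "archive" upper bound happens to be. The variable names are illustrative only.

```javascript
// Bounds copied from the update statements above.
const currentMin = 201208140000032543; // parsed as a double
const archiveMax = 201208140000000000; // parsed as a double

// Compare each parsed double with the exact integer written in the source.
const currentIsExact = BigInt(currentMin) === 201208140000032543n;
const archiveIsExact = BigInt(archiveMax) === 201208140000000000n;

console.log(Number.isSafeInteger(currentMin)); // false
console.log(currentIsExact); // false
console.log(archiveIsExact); // true
```

If the `_id` values are stored as 64-bit integers, writing the bound as `NumberLong("201208140000032543")` in the shell is one way to avoid the rounding. Whether that applies depends on how the collection was populated, which the snippet does not show.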
FrancescaK / add shards and a shard key
Last active December 30, 2015 23:29
Adding Shards and a shard key to the archive and recent clusters
// tag two shards
sh.addShardTag("shard0000", "archive")
sh.addShardTag("shard0001", "recent")
// shard database and collection, use "_id" as shard key
sh.enableSharding("mydb")
sh.shardCollection("mydb.mydata", { "_id" : 1 })
// define tag ranges
// first tag range: from the beginning of time until 1 Jan 2013, everything goes to "archive"
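To make the idea of a tag range concrete, here is a small plain-JavaScript sketch (runnable in Node.js, not shell code) of how a shard key value maps to a tag. It assumes the usual range convention of an inclusive lower bound and an exclusive upper bound. The `ranges` array, the cutoff value `20130101`, and the `tagFor` helper are hypothetical and are not taken from the gist, whose remaining lines are cut off here.

```javascript
// Hypothetical tag ranges: [min, max) -> tag. The cutoff is illustrative only.
const ranges = [
  { min: -Infinity, max: 20130101, tag: "archive" },
  { min: 20130101, max: Infinity, tag: "recent" },
];

// Return the tag whose range contains the key, or null if no range matches.
function tagFor(key) {
  const hit = ranges.find((r) => key >= r.min && key < r.max);
  return hit ? hit.tag : null;
}

console.log(tagFor(20121231)); // archive
console.log(tagFor(20130101)); // recent
```

In the shell itself, ranges of this kind are declared with `sh.addTagRange(namespace, minimum, maximum, tag)`.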
FrancescaK / pagerank_port.js
Created September 3, 2013 14:39
The **10** airports with the highest PageRank are:
1.{ pg: 0.06370586088275128,
airportCode: "ATL",
airportState: "Georgia",
airportStateId: "GA",
airportCity: "Atlanta, GA" },
2.{ pg: 0.04987817077679942,
airportCode: "ORD",
airportState: "Illinois",
airportStateId: "IL",
airportCity: "Chicago, IL" },
FrancescaK / reduce_function.js
Created September 3, 2013 14:37
The reduce function has two duties: 1. Collect the `prs` and `prevpg` information for each node; 2. Accumulate the total PageRank score sent to each node.
var reduce = function(airportId, values) {
var pg = 0
, diff = 0
, prs = {}
, prevpg = 0
, beta = 0.9
, totalNodes = 0;
for (var i in values) {
// Retrieve the previous pagerank and the probability matrix
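The two duties can be sketched in plain JavaScript as follows. This is not the gist's code, which is cut off above. `reduceSketch` is a hypothetical name, and the sketch assumes that the value carrying a non-empty `prs` is the node's own record while the other values are contributions from neighbours. It leaves out the damping factor (`beta`), `diff`, and `totalNodes` handling, since those lines are not shown.

```javascript
// Combine all values emitted for one airport.
function reduceSketch(airportId, values) {
  let pg = 0;
  let prs = {};
  let prevpg = 0;
  for (const v of values) {
    // Duty 1: keep the outgoing probabilities and previous PageRank
    // from the value that carries them (assumed: the one with non-empty prs).
    if (Object.keys(v.prs).length > 0) {
      prs = v.prs;
      prevpg = v.prevpg;
    }
    // Duty 2: add up the PageRank sent to this node.
    pg += v.pg;
  }
  return { totalNodes: 0.0, pg, prs, diff: 0.0, prevpg };
}

// Hypothetical input: two contributions plus the node's own record.
const result = reduceSketch("B", [
  { totalNodes: 0.0, pg: 0.375, prs: {}, diff: 0.0, prevpg: 0.0 },
  { totalNodes: 0.0, pg: 0.125, prs: {}, diff: 0.0, prevpg: 0.0 },
  { totalNodes: 0.0, pg: 0.0, prs: { A: 1 }, diff: 0.0, prevpg: 0.25 },
]);
console.log(result.pg, result.prevpg); // 0.5 0.25
```

Because the output has the same shape as the inputs, the sketch can be applied again to its own result together with further contributions, which matters when a reducer is invoked more than once for the same key.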
FrancescaK / pagerank_calc.js
Created September 3, 2013 14:34
Next, we wrote some JavaScript code to calculate PageRank on the graph stored in the database. The goal was to create a new collection `fpg_i` for every *i*th iteration of PageRank. Each iteration is a call to oneiteration() in [iteration.js](https://github.com/10gen-interns/big-data-exploration/blob/master/PageRank-Flights/src/iteration.js) co…
var map = function() {
// For each node that is reachable from this node, give it the
// appropriate portion of my pagerank
for (var toNode in this["value"]["prs"]) {
emit(toNode, {totalNodes : 0.0
, pg : this["value"]["prs"][toNode] * this["value"]["pg"]
, prs : {}
, diff : 0.0
, prevpg : 0.0});
}
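The map step can be tried outside MongoDB with a stand-in for `emit`. The sketch below is plain JavaScript; `runMap` and the sample node are hypothetical, and it covers only the part of the mapper visible above (passing each neighbour its share of the node's PageRank), not whatever follows in the truncated gist.

```javascript
// Apply the visible part of the mapper to one node and collect what it emits.
function runMap(node) {
  const emitted = [];
  const emit = (key, value) => emitted.push({ key, value });
  for (const toNode in node.value.prs) {
    emit(toNode, {
      totalNodes: 0.0,
      pg: node.value.prs[toNode] * node.value.pg, // share sent to toNode
      prs: {},
      diff: 0.0,
      prevpg: 0.0,
    });
  }
  return emitted;
}

// Hypothetical node: PageRank 0.5, split 3:1 between two neighbours.
const out = runMap({ _id: "A", value: { pg: 0.5, prs: { B: 0.75, C: 0.25 } } });
console.log(out.map((e) => [e.key, e.value.pg])); // [ [ 'B', 0.375 ], [ 'C', 0.125 ] ]
```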
FrancescaK / track_edge.js
Created September 3, 2013 14:33
For each document, we create (or modify) at least one node that keeps track of this "edge"
{
"_id" : "12478",
"value" : {
"pg" : (1 / NUM_OF_AIRPORTS_IN_DB),
"prs" : {
"12892" : (NUM_OF_FLIGHTS_FROM_12478_TO_12892 / NUM_OF_FLIGHTS_FROM_12478),
...
}
}
}
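As a rough illustration of how node documents of this shape could be derived, the sketch below builds them from an in-memory array of flight records in plain JavaScript. The `buildNodes` helper and the sample flights are hypothetical; the field names `origAirportId` and `destAirportId` are the ones used in the Flights collection. How the original code performs this step is not shown here, and this sketch only creates nodes for airports that have at least one departure.

```javascript
// Build one node per origin airport from a list of flight records.
// pg starts uniform (1 / number of airports seen); prs[dest] is the share
// of this airport's departures that go to dest.
function buildNodes(flights) {
  const counts = new Map();   // origin id -> Map(dest id -> flight count)
  const airports = new Set(); // every airport seen as origin or destination
  for (const f of flights) {
    const from = String(f.origAirportId);
    const to = String(f.destAirportId);
    airports.add(from);
    airports.add(to);
    if (!counts.has(from)) counts.set(from, new Map());
    const dests = counts.get(from);
    dests.set(to, (dests.get(to) || 0) + 1);
  }
  const nodes = [];
  for (const [from, dests] of counts) {
    let total = 0;
    for (const n of dests.values()) total += n;
    const prs = {};
    for (const [to, n] of dests) prs[to] = n / total;
    nodes.push({ _id: from, value: { pg: 1 / airports.size, prs } });
  }
  return nodes;
}

// Hypothetical sample data: three airports, four flights.
const sampleFlights = [
  { origAirportId: 12478, destAirportId: 12892 },
  { origAirportId: 12478, destAirportId: 12892 },
  { origAirportId: 12478, destAirportId: 10397 },
  { origAirportId: 12892, destAirportId: 12478 },
];
const nodes = buildNodes(sampleFlights);
console.log(JSON.stringify(nodes[0]));
```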
FrancescaK / flying.js
Created September 3, 2013 14:32
Entry in the Flights collection in the Flying Database
{
"_id" : ObjectId("51bf..."),
...
"origAirportId" : 12478,
"origStateId" : "NY",
...
"destAirportId" : 12892,
"destStateId" : "CA",
...
}
db.products.insert({last_viewed:["bike","cd","game","bike","book"]})
db.products.findOne()
{
"_id" : ObjectId("51ff97d233c4f2089347cab6"),
"last_viewed" : [
"bike",
"cd",
"game",
"bike",
"book"