MongoDB SF

http://www.mongodb.com/events/mongodb-sf-2014

Keynote

Mostly about 2.8 features.

  • document-level locking
  • pluggable storage engine
    • in Asya's demo, it looks like rs.status() does not show the storage engine used by each replica set member (lame)
    • there's a bug hunt with prizes for the release candidate of the storage engine stuff
  • some MMS automation bullshit (magic click-to-deploy stuff)
    • sounds like it makes the upgrade path simple, but it's not something we can make use of
      • however, we can easily emulate their approach (take down replica set members one at a time and upgrade them; rough sketch below)
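
A rough sketch of that rolling-upgrade sequence from the mongo shell (my own guess at the mechanics, not something they showed):

    // for each secondary in turn: stop mongod, swap in the new binary,
    // restart it, then wait for it to catch up before touching the next one
    rs.status();     // confirm the member is healthy and back in SECONDARY state

    // once all secondaries run the new version, hand off the primary:
    rs.stepDown(60); // primary steps down; an already-upgraded member takes over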

Internet of Things with MongoDB

  • should be a neat live demo
  • "10k inserts per second is where stuff gets interesting"
  • problems with IoT and systems with zillions of sensors:
    • lots and lots of writes
    • really big data
  • ... writing some visualization stuff live using Processing (seems pretty off-topic...)
    • nothing to do with mongodb yet
    • so far just wasting time showing us how processing works and talking about basic programming/GUI stuff
  • everyone in the audience has a big white piece of paper and he's setting up a camera on stage that will detect when we're holding them up
    • presenter has written more processing code to capture from the camera and display it on a different part of the canvas
      • also downsampling to low resolution and grayscale as well as cropping to make it easier to work with
  • now he's hooking up mongo to store readings from the camera
    • storing in a collection:
      • current time
      • coordinate of pixel (x,y)
      • "color" (really just an int 0-255 for brightness since he's using grayscale and masking it)
  • first brute force approach has performance problems (can't do inserts quickly enough)
    • going to refactor to use bulk inserts (see the sketch after this list)
    • after the refactor: much higher performance (about 50k inserts/second)
  • using aggregation framework to display analysis
    • aggregating to determine average lightness of each frame in order to graph how many people are holding up the white signs
      • not an accurate head count, just a way to compare over time whether one moment was whiter than another
    • the simple approach here has performance issues
  • going to use "pre-aggregation" to improve things (also sketched after this list, along with the singleton trick)
    • basically only storing what we care about for the analysis and using upsert to reduce the number of unique documents
  • next step: hot spot analysis (checking each pixel for its whiteness)
    • going to use a "singleton collection" (a collection that only ever has one document)
      • don't care about historical data, only the current state, so we can just overwrite this document as we go
      • this way queries only ever need to retrieve one thing
      • pre-aggregating values for each pixel at a unique "x.y" field path in the document
      • more processing code to display it
      • it never quite worked before he ran out of time
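
For reference, a minimal sketch of the bulk-insert refactor in the mongo shell (collection, field, and helper names are guesses from the description above):

    var width = 64, height = 48;                // downsampled frame size (made up)
    function brightnessAt(x, y) { return 0; }   // stand-in for the camera sampler

    // one document per pixel per frame: timestamp, (x,y) coordinate, 0-255 brightness
    var bulk = db.readings.initializeUnorderedBulkOp();
    var now = new Date();
    for (var x = 0; x < width; x++) {
      for (var y = 0; y < height; y++) {
        bulk.insert({ ts: now, x: x, y: y, color: brightnessAt(x, y) });
      }
    }
    bulk.execute(); // one round trip per batch instead of one per insert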
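
And sketches of the follow-up tricks with equally invented names: the naive per-frame average, the upsert-based pre-aggregation, and the singleton hot-spot document:

    // naive analysis: aggregate the raw readings into a per-frame average
    db.readings.aggregate([
      { $group: { _id: "$ts", avgColor: { $avg: "$color" } } },
      { $sort: { _id: 1 } }
    ]);

    // pre-aggregation: one document per frame, accumulated with an upsert,
    // so the average is just totalColor / pixels at read time
    var frameTimestamp = new Date(), brightness = 142;  // stand-ins
    db.frames.update(
      { _id: frameTimestamp },
      { $inc: { totalColor: brightness, pixels: 1 } },
      { upsert: true }
    );

    // singleton collection: one document overwritten every frame, with a
    // pre-aggregated value per "x.y" pixel path; queries only fetch one doc
    db.hotspots.update(
      {},
      { $set: { "12.34": 198, "12.35": 201 /* ...one field per pixel... */ } },
      { upsert: true }
    );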

Unify Your Selling Channels in One Product Catalog Service

  • "systems of engagement"
    • ways to answer questions and take action when a customer is in the store
    • rapid iteration in retail (e.g. "is this sale working? if not, what do i need to change?")
  • challenges:
    • data model changes frequently
      • new products, partners, product/customer attributes, etc all the time
    • desire to ask questions in real time
    • geo-location
  • use cases: modern, seamless retail
    • store rich product information (shitloads of attributes and relations to other products, etc)
    • "consolidated customer view"
      • e.g. same customer across in-person store, online store, catalog orders, phone, etc
      • personalize the (virtual) storefront for each customer
      • even external stuff, e.g. "what did this particular customer say about this particular product on facebook"
    • objective is a "global product service"
      • single canonical view of a product, all products in one central service
        • schema needs to be flexible
        • geographical distribution
        • high volume read/write spikes, e.g. 100k reads/second
        • need good indexes!
      • how to manage multiple copies of the same data
        • e.g. individual store wants a local catalog of its products that are somehow copies of some central catalog of all products
        • briefly mentioned geographically-distributed replica set members (but this only solves the read problem, not writes)
        • not sure if he ever actually mentioned a full solution for this
      • responding to events in real time
        • examples:
          • twitter promotion on black friday for a discount, decided and implemented within an hour because of time-sensitivity
          • for virtual storefronts, what's the current weather at the customer's location? (e.g. do i advertise umbrellas or flip flops?)
      • price may vary across many dimensions:
        • product, size, color, store, customer, etc
      • search
  • TL;DR of everything so far: "retail has a naturally complicated data model and high performance/availability/consistency demands, also lots of reads and writes"
    • k
    • stopped taking detailed notes at this point, didn't seem worth it...
  • another +1 for "tailor your schema to your queries"
    • and another +1 for pre-aggregation (his example was just a count, but still)
  • takeaway: MongoDB's flexible schemas are a much better fit for retail data than traditional strictly-schema'd RDBMSes (example document below)
    • but it still requires the same kind of planning and due diligence
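
To make the flexible-schema takeaway concrete, a hypothetical product document of the kind he was describing (every name below is invented):

    db.products.insert({
      _id: "sku-12345",
      name: "Umbrella",
      attributes: { color: "red", sizes: ["S", "M", "L"] },  // varies per product
      related: ["sku-67890"],
      pricing: [                        // price varies by store/channel/customer
        { store: "sf-01", price: 19.99 },
        { channel: "online", segment: "loyalty", price: 17.99 }
      ],
      location: { type: "Point", coordinates: [-122.4, 37.77] }
    });

    // geo queries ("what's near this customer?") want a geospatial index;
    // another instance of "tailor your schema (and indexes) to your queries"
    db.products.ensureIndex({ location: "2dsphere" });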

A Full-Stack, Realtime Database Driver: Meteor and the Next Generation of Web and Mobile Applications

  • Meteor is a JS framework/ecosystem for building realtime apps, made up of:

    • LiveQuery
      • realtime DB queries
    • DDP
      • subscribe to changes in DB
    • MiniMongo
      • run db queries from the client
      • cache relevant data on the client
    • Tracker
      • re-run functions when data changes
    • Blaze
      • keep the view up-to-date with data
  • JS runs on both client and server

    // shared code runs in both environments; the conditionals gate the rest
    doSomethingOnBothClientAndServer();
    if (Meteor.isClient) {
      doSomethingOnClientOnly();
    } else if (Meteor.isServer) {
      doSomethingOnServerOnly();
    }
    • code exists in both places, it just doesn't execute everywhere (because of the conditionals)
      • sounds like you want to be careful about where you put your secret sauce and how it is exposed
  • Blaze is a custom templating language

    • HTML + goop
    • it looks kinda like handlebars (tiny sketch below)
    • view automatically updates when data changes
      • e.g. you just run a mongo query from the client, a bunch of magic happens, and the view automatically responds (without redrawing everything)
        • UI elements are bound by observers to live data (kind of like cursors with a websocket in the middle)
      • actually it's clever, the UI is updated as soon as the client fires off the update request
        • but the server is still the ultimate source of truth, it'll send back what it thinks is the new state of the data and if needed the UI gets updated again to reflect the server-side state
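
    A toy sketch of that reactivity (my own example, not from the talk; Signs is a made-up collection):

    // given a template like <template name="signCount"><p>{{count}}</p></template>,
    // this helper re-runs whenever the query's results change, and Blaze
    // patches just the affected DOM node instead of redrawing the page
    Signs = new Mongo.Collection('signs');
    Template.signCount.helpers({
      count: function () {
        return Signs.find({ up: true }).count();
      }
    });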
  • uses websockets for continuous communication

  • seems like this would have been a good choice for my battle cobras hackathon project

  • meteor has its own package system

    • managing packages actually affects clients in real time too (no need to refresh your browser)
  • can explicitly decide what to publish to a client (the "autopublish" package just shoots out everything, great for demos, terrible for anything real); see the sketch below

    • the default behavior is to observe for changes and blast out data appropriate to the given query
    • the oplog is tailed to become aware of updates without having to poll
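
    A minimal sketch of explicit publishing (publication and collection names invented):

    Signs = new Mongo.Collection('signs');  // shared by client and server

    if (Meteor.isServer) {
      // publish only the documents (and fields) this client should see
      Meteor.publish('upSigns', function () {
        return Signs.find({ up: true }, { fields: { up: 1, ts: 1 } });
      });
    }

    if (Meteor.isClient) {
      Meteor.subscribe('upSigns');  // MiniMongo now mirrors just that subset
    }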
  • server knows about the current state of the client's data so that it doesn't send unnecessary junk

  • does not work with sharded clusters

  • "mongo1" discount code for eventedmind.com

  • my takeaway: a really neat approach, but still immature and it seems far too "insecure by default"

    • also not sure how well it will scale

MongoDB Analytics: Learn Aggregation by Example - Exploratory Analytics and Visualization Using Flight Data

  • for his demo:
    • one collection
    • every document is a flight
      • tons o' fields
    • will use data set to answer various questions
      • which airline has the most delays?
      • which airports are the worst in terms of cancelled flights?
      • etc
  • aggregation operations
    • group
    • sort
    • sum
    • avg
    • projection:
      • everything to do with fields (one at a time)
        • create computed fields on outputs based on other fields
        • rename fields
        • etc
      • e.g. i have a number of total flights and a number of cancelled flights, what's the cancel rate?
        • answer: you use $divide to compute cancelled/total and spit it into a new field (sketched after this list)
    • unwind
  • the order of operations in an aggregation pipeline does matter (stages run in sequence, so e.g. filtering early keeps later stages cheap)
  • overall this was more about "how to do analysis" and not very much about "how mongodb aggregation framework works"
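
A sketch of the cancel-rate question as a pipeline (field names guessed, not from his actual data set):

    db.flights.aggregate([
      { $group: {
          _id: "$carrier",                                   // one bucket per airline
          total:     { $sum: 1 },
          cancelled: { $sum: { $cond: ["$cancelled", 1, 0] } },
          avgDelay:  { $avg: "$arrDelay" }
      } },
      { $project: {
          avgDelay: 1,
          cancelRate: { $divide: ["$cancelled", "$total"] }  // computed field
      } },
      { $sort: { cancelRate: -1 } }                          // worst first
    ]);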

MATH is Hard: TTL Index Configuration and Considerations

http://docs.mongodb.org/manual/core/index-ttl/

  • TTL indexes define how many seconds a document lives for
    • a TTLMonitor process sweeps through and deletes documents whose TTL has expired
  • avoids having to do manual deletes of stale data
  • expire after vs expires at
    • expireAfterSeconds is a global policy
    • expiresAt is per-document expiration
    • can combine these two
  • created like a normal index, you just have to specify expireAfterSeconds (shell sketches at the end of this list)
    • you always need to specify expireAfterSeconds, even if you use expiresAt
      • can be expireAfterSeconds: 0
  • think about fragmentation and other costs of frequent deletes
    • probably want to keep TTL'd data separate from other data to avoid performance issues
  • TTL index limitations:
    • can't use _id
    • can't use nulls
    • no compound indexes
  • sounds like you manually set the timestamp for creation date/expires at?
    • if so, gotta keep your app code smart
  • TTLMonitor doesn't always delete stuff immediately at the expiration date
    • depends on workload, etc
    • only runs once every 60 seconds
  • other tips:
    • ISODate has millisecond resolution, while TTL is specified in whole seconds
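
Shell sketches of the two flavors (collection and field names mine):

    // "expire after": global policy; documents die 1 hour after their createdAt
    db.events.ensureIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 });
    db.events.insert({ createdAt: new Date(), msg: "hi" });  // app sets the date

    // "expires at": per-document deadline; expireAfterSeconds is still
    // required, it's just 0, so the stored date itself is the deadline
    db.sessions.ensureIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
    db.sessions.insert({ expiresAt: new Date("2014-12-25T00:00:00Z") });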