newfront · April 25, 2011 19:56
diff --git a/ezra_meetup_discussion_nosql.txt b/ezra_meetup_discussion_nosql.txt
 #Ruby Meetup Group
 Ezra 

 Limitations of SQL
 (horizontal scaling) - not covered by MySQL

 - Limitations
 - don't scale writes past a single master

 #lesssql
 	- solution: hybrid systems
 	*start with a small part of the app that is not on the critical path (session data, logs, etc.)
 	'Redis' - alternative database
 	
 	Tour of New Database Types:
 	Redis
 		- fast, in-memory key/value store
 		- additional data types -> lists, sets (hash tables)
 		- set intersections (find what users have in common, lookups)
 		- good for sessions, hit counters, log buffers
 		
 		Pros
 			- all operations happen in memory
 		
 		Cons
 			- data has to fit in memory
 			- data structure server
 		
 		Redistribute your load
 		Efficient Data Handling (IO based)
 		
 		Scales like memcache (single threaded)
 			- spread instances out over multiple server machines
 		
 		Uses: when you need the fastest possible data store
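A small in-memory stand-in makes the Redis uses above concrete: set intersection to find what two users have in common, a hit counter, and a log buffer capped to the newest entries. `MiniRedis` is hypothetical and is not the redis client library. Its method names only mirror the Redis commands SADD, SINTER, INCR, LPUSH and LTRIM, and its LTRIM handles non-negative indices only.

```python
from collections import defaultdict


class MiniRedis:
    """Hypothetical in-memory stand-in mimicking a few Redis commands."""

    def __init__(self):
        self.sets = defaultdict(set)      # key -> set of members
        self.counters = defaultdict(int)  # key -> integer counter
        self.lists = defaultdict(list)    # key -> list, newest item first

    def sadd(self, key, *members):
        before = len(self.sets[key])
        self.sets[key].update(members)
        return len(self.sets[key]) - before  # number of newly added members

    def sinter(self, *keys):
        # Members common to every named set.
        result = set(self.sets[keys[0]])
        for key in keys[1:]:
            result &= self.sets[key]
        return result

    def incr(self, key):
        self.counters[key] += 1
        return self.counters[key]

    def lpush(self, key, value):
        self.lists[key].insert(0, value)
        return len(self.lists[key])

    def ltrim(self, key, start, stop):
        # Keep elements start..stop inclusive (non-negative indices only in this sketch).
        self.lists[key] = self.lists[key][start:stop + 1]


r = MiniRedis()
r.sadd("user:1:follows", "ruby", "redis", "mongodb")
r.sadd("user:2:follows", "redis", "mongodb", "riak")
common = r.sinter("user:1:follows", "user:2:follows")  # interests both users share

for _ in range(3):
    hits = r.incr("page:home:hits")  # hit counter

for i in range(5):
    r.lpush("log:buffer", f"event {i}")
    r.ltrim("log:buffer", 0, 2)  # keep only the 3 newest entries
```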
 	
 	Tokyo Cabinet
 		- Large Data workhorse
 		- Fully synchronous, no chance of losing data
 		- Memory caching
 		
 		More of a key/value type store
 			- extensible: custom code can be built into the system
 		
 		Pros: 
 			- Tokyo Server (80 GB)
 			- Fixed Length Records
 			- Efficient, smallest on-disk footprint
 		
 		Cons:
 			- above 70GB, gets funky
 		
 		Replication
 			- master-master
 			- master-slave
 	
 		Uses: fastest option for storing large amounts of data; server RAM usage is tunable
 			- can be embedded in-process
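Fixed-length records are efficient because record n always starts at byte n * record_size, so a read or write is one seek with no index lookup. Below is a sketch of that idea with Python's `struct` module over an in-memory buffer. The 16-byte layout and the `FixedLengthStore` class are assumptions for illustration, not Tokyo Cabinet's API or file format.

```python
import io
import struct

# Hypothetical layout: 4-byte unsigned id + 12-byte name (null padded) = 16 bytes.
# struct truncates names longer than 12 bytes.
RECORD = struct.Struct("<I12s")


class FixedLengthStore:
    """Sketch of a fixed-length record file; record n lives at offset n * RECORD.size."""

    def __init__(self, fileobj):
        self.f = fileobj

    def put(self, n, user_id, name):
        self.f.seek(n * RECORD.size)  # direct seek, no index needed
        self.f.write(RECORD.pack(user_id, name.encode("utf-8")))

    def get(self, n):
        self.f.seek(n * RECORD.size)
        data = self.f.read(RECORD.size)
        if len(data) < RECORD.size:
            raise KeyError(n)  # past the end of the file
        user_id, raw = RECORD.unpack(data)
        return user_id, raw.rstrip(b"\x00").decode("utf-8")


store = FixedLengthStore(io.BytesIO())
store.put(0, 100, "ezra")
store.put(2, 102, "redis")  # writing past the end pads the gap with zero bytes
```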
 		
 	
 	MongoDB
 		- document database
 		(the MySQL of key/value stores) - easiest step up from MySQL
 			- tables become collections of documents
 			- capped collections (rolling buffer)
 			- good support for complex queries
 			- index on attributes
 			- not tied down to a schema
 		
 		- collections can be set as shardable (auto-rebalancing)
 		- JSON document database
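The document model above (collections of documents, no fixed schema, indexes on attributes) can be sketched with plain dicts. `Collection`, `create_index`, `insert` and `find` are hypothetical names that only loosely echo MongoDB's. This is not the driver API, and its queries support exact attribute matches only.

```python
from collections import defaultdict
from itertools import count


class Collection:
    """Hypothetical schemaless document collection (not the MongoDB driver API)."""

    def __init__(self):
        self.docs = {}       # _id -> document (a plain dict)
        self.ids = count(1)  # simple auto-incrementing ids
        self.indexes = {}    # attribute -> {value: set of _ids}

    def create_index(self, attr):
        index = defaultdict(set)
        for _id, doc in self.docs.items():
            if attr in doc:
                index[doc[attr]].add(_id)
        self.indexes[attr] = index

    def insert(self, doc):
        _id = next(self.ids)
        self.docs[_id] = dict(doc, _id=_id)
        for attr, index in self.indexes.items():  # keep indexes current
            if attr in doc:
                index[doc[attr]].add(_id)
        return _id

    def find(self, **query):
        # Narrow candidates with an index when one covers a queried attribute.
        candidates = self.docs.keys()
        for attr, value in query.items():
            if attr in self.indexes:
                candidates = self.indexes[attr].get(value, set())
                break
        return [
            self.docs[_id]
            for _id in sorted(candidates)
            if all(self.docs[_id].get(k) == v for k, v in query.items())
        ]


users = Collection()
users.create_index("city")
users.insert({"name": "ezra", "city": "SF", "langs": ["ruby"]})
users.insert({"name": "sam", "city": "SF"})  # different fields, no schema
users.insert({"name": "ann", "city": "NYC", "admin": True})
matches = users.find(city="SF", name="ezra")
```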
 		
 		Cons: no transactions
 		
 		Pros: recovery tools
 			- advanced query system
 			- GridFS: open and write files through a grid file system
 			- scales horizontally 
 		
 		MongoDB - fast synchronous writes; good for web apps, logging, statistics
 			- supports very complex queries
 			- flexible querying
 		
 	Riak
 		- Document Oriented DB
 			- HTTP/JSON query interface
 			- add and remove nodes
 			- Erlang map/reduce query interface
 			- tunable knobs, e.g. "write this to 3 servers" (rule sets)
 			http://riak.basho.com
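The "tunable knobs" are usually expressed as N (replicas per key), W (replicas that must acknowledge a write) and R (replicas consulted on a read). With R + W > N, every read overlaps at least one replica that took the latest successful write. Below is a minimal in-process sketch of that quorum rule. The classes, the placement function and the version counter are assumptions, and nothing here uses Riak's HTTP interface.

```python
class Replica:
    def __init__(self):
        self.up = True
        self.data = {}  # key -> (version, value)


class QuorumStore:
    """Hypothetical N/R/W quorum store over in-process replicas."""

    def __init__(self, replicas, n, r, w):
        if n > len(replicas):
            raise ValueError("n cannot exceed the number of replicas")
        self.replicas, self.n, self.r, self.w = replicas, n, r, w
        self.version = 0  # single coordinator, so a counter orders writes

    def _preference_list(self, key):
        # Deterministically pick n replicas for this key (stand-in for ring placement).
        start = sum(key.encode()) % len(self.replicas)
        return [self.replicas[(start + i) % len(self.replicas)] for i in range(self.n)]

    def put(self, key, value):
        self.version += 1
        acks = 0
        for rep in self._preference_list(key):
            if rep.up:
                rep.data[key] = (self.version, value)
                acks += 1
        # Note: a failed write is not rolled back on the replicas that accepted it.
        if acks < self.w:
            raise RuntimeError(f"write quorum not met: {acks} < {self.w}")
        return acks

    def get(self, key):
        responders = [rep for rep in self._preference_list(key) if rep.up][: self.r]
        if len(responders) < self.r:
            raise RuntimeError(f"read quorum not met: {len(responders)} < {self.r}")
        found = [rep.data[key] for rep in responders if key in rep.data]
        # The highest version among the consulted replicas wins.
        return max(found, key=lambda pair: pair[0])[1] if found else None


replicas = [Replica() for _ in range(3)]
store = QuorumStore(replicas, n=3, r=2, w=2)
store.put("user:1", "v1")     # all 3 replicas store v1
replicas[2].up = False
store.put("user:1", "v2")     # only 2 acks, still meets W=2
replicas[2].up = True         # replica 2 is back but still holds v1
replicas[0].up = False
latest = store.get("user:1")  # consults replicas 1 and 2; the newer version wins
```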
 		
 			Pros:
 				- schemaless
 				- wants to stay alive
 			
 			Cons: 
 				- interface via HTTP, JSON
 				- Ruby binding
 			
 			Uses for: 
 				- manage
 				- add nodes when you need them
 	
 	Cassandra
 		- Eventually consistent node distribution
 			- column families, etc.
 			- structured key/value store
 			- can easily get data back sorted
 			RULES: rack aware, data center aware, location aware
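Getting data back sorted follows from Cassandra's column-family model: within a row, columns are kept ordered by name, so a range (slice) of columns comes back in order. Below is a hypothetical in-memory sketch of such a row using `bisect`. It illustrates the idea only and is not Cassandra's storage engine or client API.

```python
import bisect


class SortedRow:
    """Hypothetical row whose columns stay sorted by name."""

    def __init__(self):
        self.names = []   # column names, kept sorted
        self.values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def slice(self, start, finish, limit=None):
        # Columns with start <= name <= finish, in sorted order.
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, finish)
        names = self.names[lo:hi]
        if limit is not None:
            names = names[:limit]
        return [(n, self.values[n]) for n in names]


# Column names are ISO timestamps, so sorting by name also sorts by time.
timeline = SortedRow()
timeline.insert("2011-04-25T19:56", "event-c")
timeline.insert("2011-04-20T09:00", "event-a")
timeline.insert("2011-04-22T12:30", "event-b")
recent = timeline.slice("2011-04-21", "2011-04-30")
```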
 			
 		- When you need to scale out huge amounts of data
 		
 		- Writes will always succeed 
 		
 		Pros: 
 			- Can add as many nodes as you need
 			- Twitter will jump on board
 			- scale out past a petabyte
 		
 		Cons: 
 	
 	Dynomite
 		- cliffmoon/dynomite
 		
 		- no high level types
 		- Based on Amazon's Dynamo paper
 		- key/blob
 		
 		Uses: 
 			- large numbers of static files that you want to serve
 		
 		Cons: 
 			- bringing new nodes into the cluster can easily overload the system
 			- data has to be rebalanced
 		*in active development
 	
 		Use when you want to scale easily
 		Use as image asset store
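Dynamo-style stores place keys on a consistent-hash ring, which bounds the rebalancing work when a node joins: a key moves only if the new node now owns its position on the ring. Below is a sketch of a ring with virtual nodes using `hashlib.md5`. The node names and the vnode count of 64 are assumptions, and this is not Dynomite's code.

```python
import bisect
import hashlib


def _hash(value):
    # Stable 128-bit position on the ring.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns several points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First vnode at or after the key's position, wrapping around the ring.
        idx = bisect.bisect(self.ring, (_hash(key),))
        return self.ring[idx % len(self.ring)][1]


keys = [f"image-{i}.png" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only a fraction of the keys move, and every moved key lands on the new node. The rest stay where they were, which keeps adding a node cheaper than a full reshuffle.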
 	
 	Redis, Tokyo, MongoDB (stable)
 	*being used in production
 	
 	*Cassandra (look out for a stable release)
 	
 	- Chef recipes on GitHub
 	
 	Pitfalls of #lesssql
 		- no referential integrity
 		- not as much tooling
 		- almost nonexistent disaster recovery tools
 		- not as much production ("used in anger") experience
 	
 	*Customers care (save the data!)

 	Cloud Computing
 		- horizontal cloud computing
 		- add more nodes when you need them (cloud data)
 	
 	*Hypertable (offline, large batch processing)
 		- map/reduce, offline cron-based processing
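Offline batch jobs take the map/reduce shape: a map step emits key/value pairs for each input record, values are grouped by key, and a reduce step folds each group. Below is a single-process sketch that counts successful hits per page from log lines. The log format is an assumption, and a real job would spread the map and reduce steps across machines.

```python
from collections import defaultdict


def map_phase(line):
    # Hypothetical log format: "<time> <path> <status>"
    _, path, status = line.split()
    if status == "200":
        yield path, 1


def reduce_phase(counts):
    return sum(counts)


def map_reduce(lines, mapper, reducer):
    groups = defaultdict(list)
    for line in lines:                 # map: emit (key, value) pairs
        for key, value in mapper(line):
            groups[key].append(value)  # shuffle: group values by key
    return {key: reducer(values) for key, values in groups.items()}  # reduce


log = [
    "19:56:01 /home 200",
    "19:56:02 /talks 200",
    "19:56:03 /home 200",
    "19:56:04 /missing 404",
]
hits = map_reduce(log, map_phase, reduce_phase)
```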
 	
 	*HyperCube
 		- object relational mapping
 	
 	*remapping 
 		- logic trees (easier to build out in new style dbs)
 	
 	Moneta (github)
 	
 	*Infobright engine (for MySQL)
 	
 	
 	------------------------------------------------------
 	joins can be done in the client
 		- scales by pulling data from multiple endpoints
 	------------------------------------------------------
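A client-side join can be as simple as a hash join in application code: fetch records from each endpoint, index one side by the join key, then probe that index with the other side. `fetch_users` and `fetch_orders` are hypothetical stand-ins for two separate data stores.

```python
def fetch_users():
    # Stand-in for endpoint 1 (e.g. a document store).
    return [{"id": 1, "name": "ezra"}, {"id": 2, "name": "ann"}]


def fetch_orders():
    # Stand-in for endpoint 2 (e.g. a key/value store).
    return [
        {"order": "a1", "user_id": 1, "total": 30},
        {"order": "b7", "user_id": 2, "total": 12},
        {"order": "c3", "user_id": 1, "total": 5},
        {"order": "d9", "user_id": 99, "total": 8},  # no matching user
    ]


def hash_join(left, right, left_key, right_key):
    # Build phase: index the left side by its join key.
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    # Probe phase: look up each right-side row; unmatched rows are dropped (inner join).
    joined = []
    for row in right:
        for match in index.get(row[right_key], []):
            joined.append({**match, **row})
    return joined


rows = hash_join(fetch_users(), fetch_orders(), "id", "user_id")
```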
 	
 	Day to Day Issues
 		- what happens when you hit your limits
 		- memcache in front of MySQL; redistribute other data into multiple / single systems
 		- solid state drives (put hotspots on SSDs)
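Memcache in front of MySQL is usually the cache-aside pattern. Reads check the cache first, fall back to the database on a miss and then populate the cache. Writes go to the database and invalidate the cached key. Below is a sketch using the standard-library `sqlite3` as a stand-in for MySQL and a dict with expiry times as a stand-in for memcached. The `users` table and the 60-second TTL are assumptions.

```python
import sqlite3
import time


class CacheAside:
    """Cache-aside reads/writes; a dict with expiry times stands in for memcached."""

    def __init__(self, db, ttl=60.0):
        self.db = db
        self.ttl = ttl
        self.cache = {}    # key -> (expires_at, value)
        self.db_reads = 0  # how often reads fell through to the database

    def get_user(self, user_id):
        key = f"user:{user_id}"
        hit = self.cache.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]  # cache hit
        self.db_reads += 1
        row = self.db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        value = row[0] if row else None
        self.cache[key] = (time.monotonic() + self.ttl, value)
        return value

    def rename_user(self, user_id, name):
        self.db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
        self.db.commit()
        self.cache.pop(f"user:{user_id}", None)  # invalidate so the next read sees the change


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users (id, name) VALUES (1, 'ezra')")
db.commit()

users = CacheAside(db)
first = users.get_user(1)   # miss: reads the database
second = users.get_user(1)  # hit: served from the cache
users.rename_user(1, "ezra z")
third = users.get_user(1)   # miss again after invalidation
```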
 	------------------------------------------------------
 	Fusion-io (solid state)
 		- controllers getting smarter
 	
 	Riak
 		- boot config, simple to configure
 	
 	SlideShare - slides to be posted
 	Google App Engine (Datastore) - always slow, but consistently the same slow
 	Benchmarking: (?) - no large studies
 	
 	Cassandra - nodes talk to each other
 		- eventually consistent
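Replicas that exchange data converge if they agree on a merge rule. Cassandra resolves conflicting column values by timestamp (last write wins). The sketch below models only that merge rule in memory, with explicit integer timestamps and a hypothetical `merge` function. It does not model Cassandra's gossip or repair machinery.

```python
def merge(a, b):
    """Last-write-wins merge of two replicas shaped as {column: (timestamp, value)}."""
    merged = dict(a)
    for column, (ts, value) in b.items():
        # Keep the larger (timestamp, value) pair; ties on timestamp are broken
        # by comparing values, so the result does not depend on merge order.
        if column not in merged or (ts, value) > merged[column]:
            merged[column] = (ts, value)
    return merged


# Two replicas accepted different writes while they could not reach each other.
replica_1 = {"name": (1, "ezra"), "city": (5, "SF")}
replica_2 = {"name": (3, "ezra z"), "city": (2, "Oakland"), "lang": (4, "ruby")}

# Exchanging data in either direction yields the same state.
r1_after = merge(replica_1, replica_2)
r2_after = merge(replica_2, replica_1)
```

Because the merge always keeps the larger pair, the order in which replicas exchange data does not matter.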
 	
 	Key/Value convergence on the move.
 	
 	MongoDB
 		(+) Mongo team helps via IRC
 		(+) Feature Requests
 		(+) Good first step, document store
 		
 	Redis -> Tokyo
 	(s1):(s2)
 	
 	*breakdown of object model
 		- saving or building an object now takes multiple queries (a crash midway leaves a dead state)
 		
 	- save code as rows in the db; use the db to run it and return the results