josephg · December 18, 2015 13:49 · digitalsanity · Oct 20, 2014
diff --git a/gistfile1.irclog b/gistfile1.irclog
 14:02  * josephg reads up
 14:03 < josephg> koppor, rawtaz: ShareJS does all the actual OT
 14:03 < josephg> racer is now a wrapper around it which does things like refs, reflists
 14:03 < josephg> ... it manages subscriptions for you (so if you change pages, you don't have to manually unsubscribe)
 14:03 < josephg> stuff like that.
 14:03 < josephg> ShareJS just does the document editing.
 14:04 < josephg> Redis is currently important for 3 things:
 14:05 < josephg> - We need to be able to atomically append to the op log. We're using redis's lua scripting to do atomic commits
 14:05 < josephg> - Redis is also used for pubsub between your backend servers
 14:05 < josephg> (well, between your servers)
 14:06 -!- liorix [[email protected]] has quit [Remote host closed the connection]
 14:06 < josephg> (remember the new version of derby is designed to scale across many backend processes - even the derby examples are currently running on 3 
                 load-balanced processes just to test it out)
 14:07 < josephg> And finally, redis is used to store the operation log. This is a bad idea, because it means all your ops have to fit in memory. I want to fix this 
                 sometime in the next few weeks.
 14:08 < koppor> josephg: Thank you for the information!
 14:08 < josephg> Mongo isn't blessed or special in the same way. You can use any database you like to store your data - take a look at share/livedb-mongo for an 
                 example of what the api it implements needs to look like
 14:09 < josephg> (we haven't published details on this yet because I might need to add more methods to support presence, cursors and the oplog)
 14:09 < k1i> so
 14:10 < k1i> i've been following this oplog issue quite closely
 14:10 < k1i> is there any reason the oplog can't be capped at a specific size, and clients trying to commit operations older than X version get discarded?
 14:10 -!- dascher [[email protected]] has quit [Remote host closed the connection]
 14:10 < josephg> Yeah we can do that
 14:10 < k1i> IMO that needs to be implemented, as, it's a simple solution - more complicated lifetime-oplog-storage techniques can be implemented
 14:11 < k1i> but for 99% of webapps, a capped oplog in Redis, to a specific amount of memory, will be enough
 14:11 < josephg> ... the only problem is dealing with the error correctly in the client.
 14:11 < k1i> most apps aren't going to be doing offline transformations over a long period of time - and if they are unique in that use case, they can add more 
             memory or do disk-based caching
 14:11 < k1i> Derby's problem.
 14:11 < josephg> :)
 14:12 < k1i> but Share needs to be able to have a limited oplog
 14:12 < k1i> also
 14:12 < josephg> Yeah - the other thing to do is actually cleaning up / removing old ops
 14:12 < k1i> it would be cool to be able to limit the oplog on specificcollections
 14:12 < k1i> redis-LRU, would probably be ideal
 14:12  * josephg looks up redis-lru
 14:12 < k1i> least-recently-used
 14:12 < k1i> it's built into redis as a garbage-collection mechanism
 14:13 < k1i> specific collections may need more lengthly oplogs, etc.
 14:13 < k1i> although
 14:13 < k1i> nevermind, not an issue
 14:13 < josephg> I'm currently storing oplogs in a redis list
 14:13 < josephg> we might have to put each op in a separate redis document to do that
 14:14 < josephg> ... which would make it slower when you try and get ops
 14:15 < josephg> - I guess we could write a lrange-equivalent command using lua scripting
 14:15 < k1i> yep
 14:15 < josephg> but I'm not sure how that'll interact with an externally (not in redis) stored oplog
 14:15 < k1i> redis's garbage collection is really quite good
 14:16 < k1i> especially across a cluster
 14:16 < josephg> probably, I won't do that straight away. Instead I'll first do the code to move the oplog out of redis
 14:16 < k1i> what do you mean?
 14:16 < k1i> out of redis, into where?
 14:17 < josephg> into - something else. mongo as a default, who knows
 14:17 < josephg> but the point is, into something that isn't locked in memory
 14:17 < k1i> well
 14:17 < josephg> (leveldb would be ideal - alas no network protocol)
 14:17 < k1i> mongo is going to give you a headache when it comes to write contention
 14:17 < k1i> redis is just really solid
 14:17 < josephg> and its really slow
 14:17 < k1i> yea
 14:17 < josephg> yeah I know, I love redis
 14:17 < k1i> I don't see why redis is an issue
 14:18 < josephg> its an issue because you get lots and lots of ops
 14:18 < k1i> I wrote a multi-server syncing cache/session store for Redis a while ago, it's a fine piece of technology
 14:18 < k1i> yeah, but, keeping a long oplog forever isn't an option
 14:18 < k1i> pruning needs to happen at the end of the day
 14:18 < k1i> unless you need to be able to support long-term playback
 14:18 < k1i> which 99% of apps can do without for the costs associated
 14:19 < josephg> right. I guess there's a couple of options here:
 14:19 < k1i> I can see a nice DOS-style attack being done via abusing old transformation versions
 14:19 < josephg> 1. We leave everything in redis, but remove old operations when we run out of ram
 14:20 < josephg> 2. Redis is used as the ultimate source of truth on the last op (so we still use it for contention control) but operations are shifted out into a 
                 secondary store once they've been applied to redis
 14:20 < josephg> ie, mongo or something
 14:20 < josephg> then we can prune (manually or using lru or something) stuff from redis with mostly impunity - its just a cache
 14:20 < josephg> + locking system for doing atomic increments
 14:21 < josephg> as for DOSing the server with old ops, the easy way to fix that is to just not allow any ops older than some age
 14:21 < k1i> -- aka deleting them
 14:21 < josephg> not necessarily.
 14:21 < josephg> we can just force clients to do all the OT work
 14:21 < k1i> if they aren't going to be allowed for use in transofrmation
 14:21 < k1i> ah
 14:22 < k1i> I like strategy 1
 14:22 < josephg> - although sending a bunch of ops over the wire is probably more expensive than transforming anyway.
 14:22 < k1i> because big corps are going to deploy massive redis server clusters
 14:22 < k1i> they do
 14:22 < k1i> (already)
 14:22 < k1i> it's capable of handling it, it's simple, and it already works
 14:22 < k1i> smaller users (who probably don't care about longterm playback anyway) can affordably implement OT for a limited timescale
 14:22 < k1i> also
 14:23 < k1i> you throw the error to Racer; racer could choose to implement some other kind of conflict resolution
 14:23 < k1i> manual, last-winner
 14:23 < josephg> no, there's nothing good racer can do there.
 14:23 < josephg> I catch a plane and do some work on the plane. I don't connect to the internet again in 2 days.
 14:24 < k1i> Racer can alert Derby, in which user apps can show the manual conflict?
 14:24 < josephg> there's been some changes to that document I was working on in the meantime
 14:24 < k1i> (resolution process)
 14:24 < josephg> ... well, I don't even have the diff of what other people have done.
 14:24 < josephg> I just have my own ops, my view of the document and the server's (changed) view of the document
 14:25 < josephg> I mean, we could punt to the application in that case
 14:25 < josephg> ... and make them figure out a diff, and do that whole dance
 14:25 < josephg> but its not fun. And most people won't bother.
 14:25 < k1i> personally?
 14:25 < k1i> I'll discard the user's changes
 14:25 < josephg> right - yeah most people will.
 14:25 < k1i> as in my use case, an extended absence from online is not a big deal (because it cant technically happen)
 14:26 < k1i> I want OT, but don't need extended replay
 14:26 < k1i> and if I do, I will throw more hardware at redis
 14:26  * josephg nods
 14:26 < josephg> for us, we're writing hiring software
 14:26 < josephg> and we want the oplog anyway for auditing
 14:27 < josephg> so if someone does something bad, we want to see exactly who did it and when
 14:27 < k1i> ah
 14:27 < k1i> I am writing transactional point of sale software
 14:27 < k1i> I was planning on creating a manual log
 14:27 < k1i> but, that's actually not a bad idea - abuse the log left by OT
 14:27 < josephg> right. Yeah, I guess we could do that instead
 14:27 < k1i> it seems like there is some overhead though in finding an operation
 14:27 < k1i> rather than creating a dedicated log on an action-by-action basis
 14:28 < josephg> yeah, maybe. You can play the operations back
 14:28 < josephg> actually, playback would be a fun thing to add to the godbox
 14:28 < josephg> should be pretty easy to do, too.
 14:29 < k1i> right now my main scare with Derby in general is the oplog growth issue (and validations, but thats another story)
 14:29 < josephg> Brian is adding schema validation at the moment for our app
 14:30 < josephg> sharejs exposes a validate function, so you can plug in your own schema validation / whatever logic in there
 14:30 < josephg> but yeah, the oplog growth issue is important
 14:30 < josephg> - and I want to solve that in the next few weeks in some form or other.
 14:30 < josephg> we also don't have any decent benchmarks about how the whole system performs
 14:31 < josephg> which is important for me - for example, if we move redis to have all the ops in their own key, how does that perform?
 14:31 < josephg> (although redis being redis, probably still waaaay better than any of the javascript)
 14:31 < k1i> also
 14:32 < k1i> sorry
 14:32 < k1i> this is very important
 14:32 < k1i> the fact Racer doesn't support Projections/ShareJS not supporting Mongo projections is hugely problematic for me
 14:32 < k1i> and I would expect most users
 14:32 < k1i> I shouldn't have to define a User's password field in a separate collection just to get it away from public eyes
 14:33 < josephg> yep.... I had this exact conversation on friday night with brian.
 14:33 < josephg> he's strongly of the opinion that we should support collections, and I don't want to add more parts to sharejs
 14:33 < k1i> again, it goes against conventional data modeling to not be able to do those kinds of operations
 14:34 < k1i> no matter the datastore
 14:34 < josephg> well, redis doesn't do projections
 14:34 < k1i> PGSQL (row), Mongo (document) - fields need hidden
 14:34 < josephg> but yeah - mongo and couch both do.
 14:34 < k1i> enterprises don't use redis for persistent datastore, either though, generally
 14:34 < josephg> true. nate and I have been talking about first adding filters
 14:34 < k1i> and I personally wouldn't bank an entire framework on an edge case persistent datastore (redis)
 14:35 < k1i> I saw that
 14:35 < k1i> and it looked interesting
 14:35 < josephg> yep - so thats probably what v1 will look like -
 14:35 < k1i> filtering specific 'fields' from being operated on
 14:35 < josephg> yep, and from being visible to a client.
 14:35 < josephg> so a client will have a specific view of a document. For example, a user can see their entire own profile
 14:35 < josephg> but only some fields of other user's profiles
 14:36 < josephg> we'll need to edit operations going to that client, but if we do it right, the client won't be able to tell that there even are more fields in the 
                 document
 14:36 < k1i> yep
 14:36 < k1i> that would be ideal
 14:36 < josephg> thats the 'perminant projection' system
 14:36 < k1i> this is something none of the realtime 'frameworks' that exist now have solved
 14:36 < josephg> interesting.
 14:37 < k1i> everyone can stop access to a specific document because a query can be built on it
 14:37 < k1i> but app-level security on individual fields is absolutely imperative
 14:37 < josephg> yep.
 14:37 < k1i> if a Derby or Meteor are going to win over rails in 'framework choice'
 14:37 < k1i> it's not even a passable option
 14:37 < josephg> ... the other thing that would be nice to have is a way for queries to only return part of a document
 14:37 < k1i> yes
 14:37 < k1i> that would increase efficiency
 14:38 < josephg> for example, if I'm viewing a list of documents, I probably only want a couple of fields
 14:38 < k1i> it can spawn weird edge cases
 14:38 < josephg> ... then if I click on one, I should see all the rest of the fields too
 14:38 < josephg> it sure can.
 14:38 < k1i> when certain fields are based on another
 14:38 < josephg> so yeah, thats going to take some more thought.
 14:38 < josephg> but we'll probably start with the filter thing - though for me its a lower priority than doing a bunch of benchmarks
 14:38 < josephg> and solving the oplog issue
 14:38 < k1i> well
 14:39 < k1i> yeah
 14:39 < k1i> some people can't even migrate to derby .5 due to a massive oplog
 14:39 < josephg> yeah exactly.
 14:39 < k1i> also, I am of the opinion that the oplog should be completely transient -
 14:39 < k1i> if I go in and 'redis-flush' everything away
 14:39 < k1i> that should be completely OK
 14:39 < k1i> and the app should be able to handle any issues associated with that
 14:39 < josephg> well, if we move the oplog out into something that mongo / whatever could provide
 14:39 < josephg> then you could always just store it in something that sometimes forgets ops
 14:39 < josephg> and we should make the system be able to deal iwth that too.
 14:40 < k1i> yea
 14:40 < k1i> I like redis
 14:40 < k1i> but
 14:40 < k1i> the memory thing is a bit tricky
 14:40  * josephg nods
 14:40 < josephg> koppor: are you still around?
 14:40 < josephg> ... koppor was asking about socket.io
 14:41 < k1i> yes
 14:41 < k1i> id like to ask you about that as well
 14:41 < k1i> what is the current issue with native websocket?
 14:41 < josephg> I dunno if its gotten better since, but I hate socket.io because of all the grief it caused me while doing sharejs
 14:41 < k1i> when was the last time you used it
 14:41 < josephg> its just unreliable, it doesn't guarantee message ordering
 14:41 < josephg> um, about 18 months ago
 14:41 < k1i> can you try engine.io
 14:42 < josephg> ... and it can tell you a client disconnected, then give you more ops for that client
 14:42 < josephg> I dunno man - I don't trust it.
 14:42 < k1i> https://github.com/LearnBoost/engine.io
 14:42 < k1i> engine.io is heavily actively developed
 14:42  * josephg shrugs
 14:42 < josephg> does it order operations?
 14:42 < josephg> ... anyway, the new architecture of sharejs means that you can use whatever you want.
 14:42 < k1i> native websockets have a huge, huge advantage
 14:43 < josephg> in performance, yeah
 14:43 < k1i> in that they don't require a sticky-sessioning LB to maintain efficiency on the server-side
 14:43 < k1i> much easier to scale
 14:43 < k1i> obviously you will want one for fallback clients, but, still
 14:44 < josephg> ... you don't?
 14:44 < k1i> for native websockets?
 14:44 < josephg> hm I guess not.
 14:44 < k1i> the TCP connection is maintained by whatever LB you are running
 14:44 < k1i> it's inherently 'sticky' as it's an open socket
 14:44 < k1i> the LB can then round-robin, least-load, etc. any other connection
 14:45 < josephg> right, but you aren't just doing request-response over the socket
 14:45 < k1i> that's probably my favorite feature about websockets
 14:45 < k1i> but the connection remains open throughout the duration of a clients' visit, though, right?
 14:45 < josephg> you also need to be able to send to the client when one of the subscribed documents changes
 14:45 < k1i> yes
 14:45 < josephg> ... and to do that you need a server to be 'responsible' for the client anyway
 14:45 < k1i> I am saying just at an LB-level
 14:46 < josephg> hm - I guess you could have any server able to send to the client
 14:46 < k1i> the LB has less-thought into maintaining a stateful websocket than stateless polling
 14:46 < k1i> no, the client still gets talked to by their associated server
 14:47 < k1i> if the client refreshes, they reconnect and setup a new copy of the redis-stored session on another backend server
 14:48 < josephg> ... so which server sends a client ops for its subscriptions?
 14:48 < k1i> the server that they are cnnected to via websocket
 14:49 < k1i> initially
 14:49 < josephg> oooooooh
 14:49 < k1i> the websocket has no reason to ever close
 14:49 < josephg> right, because the load balancer will send the websocket *somewhere* it doesn't matter where
 14:49 < k1i> so the client has no reason to ever get connected to a different server
 14:49 < k1i> yes
 14:49 < josephg> and that server is responsible for that client foever.
 14:49 < k1i> and it stays open
 14:49 < josephg> yeah
 14:49 < k1i> the LB never touches the websocket again after it's opened
 14:49 < k1i> they know how to pass socketed traffic
 14:49 < josephg> yep - its just that the load balancer doesn't hav eto know. It just pipes
 14:49 < k1i> now
 14:49 < k1i> sticky-sessioning is something you want for efficiency and fallback clients
 14:50 < josephg> yeah - lovely.
 14:50 < k1i> but, it makes LB a lot easier in high-scalability environments
 14:50 < k1i> to be able to roundrobin, etc.
 14:50 < k1i> also
 14:50 < k1i> LB's such as HAProxy will eventually have bindings written for them, for derby, etc.
 14:50 < k1i> to be able to contact them for client count
 14:51 < josephg> so in sharejs, because I got sick of people filing bugs about socket.io being broken, etc
 14:51 < josephg> I've moved to a system where the user is responsible for making the server-client connection
 14:52 < josephg> on the server, you pass sharejs a node 0.10 stream which it can use to talk to a client that just connected
 14:52 < k1i> yeah, that's probably good for node-like compatibility and abstraction
 14:52 < josephg> and on the client, you pass a websocket-like object which it'll use to talk to the server
 14:52 < k1i> I personally have total control over my clients
 14:52 < josephg> and then you can send sidechannel messages in the stream, etc.
 14:52 < josephg> yep
 14:52 < k1i> and will be forcing them all to be websocket-enabled browsers
 14:52 < josephg> ... yeah, so then you can use websockets
 14:53 < josephg> there's probably a couple issues you'll run into at the moment because I think I"m taking advantage of the fact that browserchannel lets you send 
                 messages while its connecting
 14:53 < josephg> - but let me know and I can fix them, or you can fix them.
 14:53 < josephg> but it should work.
 14:53 < josephg> thats the idea
 14:54 < josephg> there's racer-browserchannel kicking around somewhere taht has the 2 files or whatever which does the work
 14:54 < josephg> so yeah, go ahead and make a racer-websocket or whatever
 14:54 < josephg> and slot it in.
 14:55 < k1i> yeah
 14:55 < josephg> and koppor: likewise, use socket.io if you want. But if the server crashes because messages arrive out of order, its not my bug.
 14:55 < k1i> I think engine.io allows queued messages
 14:55 < josephg> if all your browsers support websocket, why bother?
 14:56 < josephg> websocket over https works great (better than websocket over http because proxies don't get in the way)
 14:56 < k1i> node-browserchannel doesn't use websockets?
 14:56 < josephg> nope. I wanted to add it, but I'd need to add websocket support to the closure library
 14:56 < k1i> yeah
 14:56 < josephg> it was on my nice-to-have list and less important than adding cursors to sharejs
	14:02 * josephg reads up
	14:03 < josephg> koppor, rawtaz: ShareJS does all the actual OT
	14:03 < josephg> racer is now a wrapper around it which does things like refs, reflists
	14:03 < josephg> ... it manages subscriptions for you (so if you change pages, you don't have to manually unsubscribe)
	14:03 < josephg> stuff like that.
	14:03 < josephg> ShareJS just does the document editing.
	14:04 < josephg> Redis is currently important for 3 things:
	14:05 < josephg> - We need to be able to atomically append to the op log. We're using redis's lua scripting to do atomic commits
	14:05 < josephg> - Redis is also used for pubsub between your backend servers
	14:05 < josephg> (well, between your servers)
	14:06 -!- liorix [[email protected]] has quit [Remote host closed the connection]
	14:06 < josephg> (remember the new version of derby is designed to scale across many backend processes - even the derby examples are currently running on 3
	load-balanced processes just to test it out)
	14:07 < josephg> And finally, redis is used to store the operation log. This is a bad idea, because it means all your ops have to fit in memory. I want to fix this
	sometime in the next few weeks.
	14:08 < koppor> josephg: Thank you for the information!
	14:08 < josephg> Mongo isn't blessed or special in the same way. You can use any database you like to store your data - take a look at share/livedb-mongo for an
	example of what the api it implements needs to look like
	14:09 < josephg> (we haven't published details on this yet because I might need to add more methods to support presence, cursors and the oplog)
	14:09 < k1i> so
	14:10 < k1i> i've been following this oplog issue quite closely
	14:10 < k1i> is there any reason the oplog can't be capped at a specific size, and clients trying to commit operations older than X version get discarded?
	14:10 -!- dascher [[email protected]] has quit [Remote host closed the connection]
	14:10 < josephg> Yeah we can do that
	14:10 < k1i> IMO that needs to be implemented, as, it's a simple solution - more complicated lifetime-oplog-storage techniques can be implemented
	14:11 < k1i> but for 99% of webapps, a capped oplog in Redis, to a specific amount of memory, will be enough
	14:11 < josephg> ... the only problem is dealing with the error correctly in the client.
	14:11 < k1i> most apps aren't going to be doing offline transformations over a long period of time - and if they are unique in that use case, they can add more
	memory or do disk-based caching
	14:11 < k1i> Derby's problem.
	14:11 < josephg> :)
	14:12 < k1i> but Share needs to be able to have a limited oplog
	14:12 < k1i> also
	14:12 < josephg> Yeah - the other thing to do is actually cleaning up / removing old ops
	14:12 < k1i> it would be cool to be able to limit the oplog on specificcollections
	14:12 < k1i> redis-LRU, would probably be ideal
	14:12 * josephg looks up redis-lru
	14:12 < k1i> least-recently-used
	14:12 < k1i> it's built into redis as a garbage-collection mechanism
	14:13 < k1i> specific collections may need more lengthly oplogs, etc.
	14:13 < k1i> although
	14:13 < k1i> nevermind, not an issue
	14:13 < josephg> I'm currently storing oplogs in a redis list
	14:13 < josephg> we might have to put each op in a separate redis document to do that
	14:14 < josephg> ... which would make it slower when you try and get ops
	14:15 < josephg> - I guess we could write a lrange-equivalent command using lua scripting
	14:15 < k1i> yep
	14:15 < josephg> but I'm not sure how that'll interact with an externally (not in redis) stored oplog
	14:15 < k1i> redis's garbage collection is really quite good
	14:16 < k1i> especially across a cluster
	14:16 < josephg> probably, I won't do that straight away. Instead I'll first do the code to move the oplog out of redis
	14:16 < k1i> what do you mean?
	14:16 < k1i> out of redis, into where?
	14:17 < josephg> into - something else. mongo as a default, who knows
	14:17 < josephg> but the point is, into something that isn't locked in memory
	14:17 < k1i> well
	14:17 < josephg> (leveldb would be ideal - alas no network protocol)
	14:17 < k1i> mongo is going to give you a headache when it comes to write contention
	14:17 < k1i> redis is just really solid
	14:17 < josephg> and its really slow
	14:17 < k1i> yea
	14:17 < josephg> yeah I know, I love redis
	14:17 < k1i> I don't see why redis is an issue
	14:18 < josephg> its an issue because you get lots and lots of ops
	14:18 < k1i> I wrote a multi-server syncing cache/session store for Redis a while ago, it's a fine piece of technology
	14:18 < k1i> yeah, but, keeping a long oplog forever isn't an option
	14:18 < k1i> pruning needs to happen at the end of the day
	14:18 < k1i> unless you need to be able to support long-term playback
	14:18 < k1i> which 99% of apps can do without for the costs associated
	14:19 < josephg> right. I guess there's a couple of options here:
	14:19 < k1i> I can see a nice DOS-style attack being done via abusing old transformation versions
	14:19 < josephg> 1. We leave everything in redis, but remove old operations when we run out of ram
	14:20 < josephg> 2. Redis is used as the ultimate source of truth on the last op (so we still use it for contention control) but operations are shifted out into a
	secondary store once they've been applied to redis
	14:20 < josephg> ie, mongo or something
	14:20 < josephg> then we can prune (manually or using lru or something) stuff from redis with mostly impunity - its just a cache
	14:20 < josephg> + locking system for doing atomic increments
	14:21 < josephg> as for DOSing the server with old ops, the easy way to fix that is to just not allow any ops older than some age
	14:21 < k1i> -- aka deleting them
	14:21 < josephg> not necessarily.
	14:21 < josephg> we can just force clients to do all the OT work
	14:21 < k1i> if they aren't going to be allowed for use in transofrmation
	14:21 < k1i> ah
	14:22 < k1i> I like strategy 1
	14:22 < josephg> - although sending a bunch of ops over the wire is probably more expensive than transforming anyway.
	14:22 < k1i> because big corps are going to deploy massive redis server clusters
	14:22 < k1i> they do
	14:22 < k1i> (already)
	14:22 < k1i> it's capable of handling it, it's simple, and it already works
	14:22 < k1i> smaller users (who probably don't care about longterm playback anyway) can affordably implement OT for a limited timescale
	14:22 < k1i> also
	14:23 < k1i> you throw the error to Racer; racer could choose to implement some other kind of conflict resolution
	14:23 < k1i> manual, last-winner
	14:23 < josephg> no, there's nothing good racer can do there.
	14:23 < josephg> I catch a plane and do some work on the plane. I don't connect to the internet again in 2 days.
	14:24 < k1i> Racer can alert Derby, in which user apps can show the manual conflict?
	14:24 < josephg> there's been some changes to that document I was working on in the meantime
	14:24 < k1i> (resolution process)
	14:24 < josephg> ... well, I don't even have the diff of what other people have done.
	14:24 < josephg> I just have my own ops, my view of the document and the server's (changed) view of the document
	14:25 < josephg> I mean, we could punt to the application in that case
	14:25 < josephg> ... and make them figure out a diff, and do that whole dance
	14:25 < josephg> but its not fun. And most people won't bother.
	14:25 < k1i> personally?
	14:25 < k1i> I'll discard the user's changes
	14:25 < josephg> right - yeah most people will.
	14:25 < k1i> as in my use case, an extended absence from online is not a big deal (because it cant technically happen)
	14:26 < k1i> I want OT, but don't need extended replay
	14:26 < k1i> and if I do, I will throw more hardware at redis
	14:26 * josephg nods
	14:26 < josephg> for us, we're writing hiring software
	14:26 < josephg> and we want the oplog anyway for auditing
	14:27 < josephg> so if someone does something bad, we want to see exactly who did it and when
	14:27 < k1i> ah
	14:27 < k1i> I am writing transactional point of sale software
	14:27 < k1i> I was planning on creating a manual log
	14:27 < k1i> but, that's actually not a bad idea - abuse the log left by OT
	14:27 < josephg> right. Yeah, I guess we could do that instead
	14:27 < k1i> it seems like there is some overhead though in finding an operation
	14:27 < k1i> rather than creating a dedicated log on an action-by-action basis
	14:28 < josephg> yeah, maybe. You can play the operations back
	14:28 < josephg> actually, playback would be a fun thing to add to the godbox
	14:28 < josephg> should be pretty easy to do, too.
	14:29 < k1i> right now my main scare with Derby in general is the oplog growth issue (and validations, but thats another story)
	14:29 < josephg> Brian is adding schema validation at the moment for our app
	14:30 < josephg> sharejs exposes a validate function, so you can plug in your own schema validation / whatever logic in there
	14:30 < josephg> but yeah, the oplog growth issue is important
	14:30 < josephg> - and I want to solve that in the next few weeks in some form or other.
	14:30 < josephg> we also don't have any decent benchmarks about how the whole system performs
	14:31 < josephg> which is important for me - for example, if we move redis to have all the ops in their own key, how does that perform?
	14:31 < josephg> (although redis being redis, probably still waaaay better than any of the javascript)
	14:31 < k1i> also
	14:32 < k1i> sorry
	14:32 < k1i> this is very important
	14:32 < k1i> the fact Racer doesn't support Projections/ShareJS not supporting Mongo projections is hugely problematic for me
	14:32 < k1i> and I would expect most users
	14:32 < k1i> I shouldn't have to define a User's password field in a separate collection just to get it away from public eyes
	14:33 < josephg> yep.... I had this exact conversation on friday night with brian.
	14:33 < josephg> he's strongly of the opinion that we should support collections, and I don't want to add more parts to sharejs
	14:33 < k1i> again, it goes against conventional data modeling to not be able to do those kinds of operations
	14:34 < k1i> no matter the datastore
	14:34 < josephg> well, redis doesn't do projections
	14:34 < k1i> PGSQL (row), Mongo (document) - fields need hidden
	14:34 < josephg> but yeah - mongo and couch both do.
	14:34 < k1i> enterprises don't use redis for persistent datastore, either though, generally
	14:34 < josephg> true. nate and I have been talking about first adding filters
	14:34 < k1i> and I personally wouldn't bank an entire framework on an edge case persistent datastore (redis)
	14:35 < k1i> I saw that
	14:35 < k1i> and it looked interesting
	14:35 < josephg> yep - so thats probably what v1 will look like -
	14:35 < k1i> filtering specific 'fields' from being operated on
	14:35 < josephg> yep, and from being visible to a client.
	14:35 < josephg> so a client will have a specific view of a document. For example, a user can see their entire own profile
	14:35 < josephg> but only some fields of other user's profiles
	14:36 < josephg> we'll need to edit operations going to that client, but if we do it right, the client won't be able to tell that there even are more fields in the
	document
	14:36 < k1i> yep
	14:36 < k1i> that would be ideal
	14:36 < josephg> thats the 'perminant projection' system
	14:36 < k1i> this is something none of the realtime 'frameworks' that exist now have solved
	14:36 < josephg> interesting.
	14:37 < k1i> everyone can stop access to a specific document because a query can be built on it
	14:37 < k1i> but app-level security on individual fields is absolutely imperative
	14:37 < josephg> yep.
	14:37 < k1i> if a Derby or Meteor are going to win over rails in 'framework choice'
	14:37 < k1i> it's not even a passable option
	14:37 < josephg> ... the other thing that would be nice to have is a way for queries to only return part of a document
	14:37 < k1i> yes
	14:37 < k1i> that would increase efficiency
	14:38 < josephg> for example, if I'm viewing a list of documents, I probably only want a couple of fields
	14:38 < k1i> it can spawn weird edge cases
	14:38 < josephg> ... then if I click on one, I should see all the rest of the fields too
	14:38 < josephg> it sure can.
	14:38 < k1i> when certain fields are based on another
	14:38 < josephg> so yeah, thats going to take some more thought.
	14:38 < josephg> but we'll probably start with the filter thing - though for me its a lower priority than doing a bunch of benchmarks
	14:38 < josephg> and solving the oplog issue
	14:38 < k1i> well
	14:39 < k1i> yeah
	14:39 < k1i> some people can't even migrate to derby .5 due to a massive oplog
	14:39 < josephg> yeah exactly.
	14:39 < k1i> also, I am of the opinion that the oplog should be completely transient -
	14:39 < k1i> if I go in and 'redis-flush' everything away
	14:39 < k1i> that should be completely OK
	14:39 < k1i> and the app should be able to handle any issues associated with that
	14:39 < josephg> well, if we move the oplog out into something that mongo / whatever could provide
	14:39 < josephg> then you could always just store it in something that sometimes forgets ops
	14:39 < josephg> and we should make the system be able to deal iwth that too.
	14:40 < k1i> yea
	14:40 < k1i> I like redis
	14:40 < k1i> but
	14:40 < k1i> the memory thing is a bit tricky
	14:40 * josephg nods
	14:40 < josephg> koppor: are you still around?
	14:40 < josephg> ... koppor was asking about socket.io
	14:41 < k1i> yes
	14:41 < k1i> id like to ask you about that as well
	14:41 < k1i> what is the current issue with native websocket?
	14:41 < josephg> I dunno if its gotten better since, but I hate socket.io because of all the grief it caused me while doing sharejs
	14:41 < k1i> when was the last time you used it
	14:41 < josephg> its just unreliable, it doesn't guarantee message ordering
	14:41 < josephg> um, about 18 months ago
	14:41 < k1i> can you try engine.io
	14:42 < josephg> ... and it can tell you a client disconnected, then give you more ops for that client
	14:42 < josephg> I dunno man - I don't trust it.
	14:42 < k1i> https://github.com/LearnBoost/engine.io
	14:42 < k1i> engine.io is heavily actively developed
	14:42 * josephg shrugs
	14:42 < josephg> does it order operations?
	14:42 < josephg> ... anyway, the new architecture of sharejs means that you can use whatever you want.
	14:42 < k1i> native websockets have a huge, huge advantage
	14:43 < josephg> in performance, yeah
	14:43 < k1i> in that they don't require a sticky-sessioning LB to maintain efficiency on the server-side
	14:43 < k1i> much easier to scale
	14:43 < k1i> obviously you will want one for fallback clients, but, still
	14:44 < josephg> ... you don't?
	14:44 < k1i> for native websockets?
	14:44 < josephg> hm I guess not.
	14:44 < k1i> the TCP connection is maintained by whatever LB you are running
	14:44 < k1i> it's inherently 'sticky' as it's an open socket
	14:44 < k1i> the LB can then round-robin, least-load, etc. any other connection
	14:45 < josephg> right, but you aren't just doing request-response over the socket
	14:45 < k1i> that's probably my favorite feature about websockets
	14:45 < k1i> but the connection remains open throughout the duration of a clients' visit, though, right?
	14:45 < josephg> you also need to be able to send to the client when one of the subscribed documents changes
	14:45 < k1i> yes
	14:45 < josephg> ... and to do that you need a server to be 'responsible' for the client anyway
	14:45 < k1i> I am saying just at an LB-level
	14:46 < josephg> hm - I guess you could have any server able to send to the client
	14:46 < k1i> the LB has less-thought into maintaining a stateful websocket than stateless polling
	14:46 < k1i> no, the client still gets talked to by their associated server
	14:47 < k1i> if the client refreshes, they reconnect and setup a new copy of the redis-stored session on another backend server
	14:48 < josephg> ... so which server sends a client ops for its subscriptions?
	14:48 < k1i> the server that they are cnnected to via websocket
	14:49 < k1i> initially
	14:49 < josephg> oooooooh
	14:49 < k1i> the websocket has no reason to ever close
	14:49 < josephg> right, because the load balancer will send the websocket somewhere it doesn't matter where
	14:49 < k1i> so the client has no reason to ever get connected to a different server
	14:49 < k1i> yes
	14:49 < josephg> and that server is responsible for that client foever.
	14:49 < k1i> and it stays open
	14:49 < josephg> yeah
	14:49 < k1i> the LB never touches the websocket again after it's opened
	14:49 < k1i> they know how to pass socketed traffic
	14:49 < josephg> yep - its just that the load balancer doesn't hav eto know. It just pipes
	14:49 < k1i> now
	14:49 < k1i> sticky-sessioning is something you want for efficiency and fallback clients
	14:50 < josephg> yeah - lovely.
	14:50 < k1i> but, it makes LB a lot easier in high-scalability environments
	14:50 < k1i> to be able to roundrobin, etc.
	14:50 < k1i> also
	14:50 < k1i> LB's such as HAProxy will eventually have bindings written for them, for derby, etc.
	14:50 < k1i> to be able to contact them for client count
	14:51 < josephg> so in sharejs, because I got sick of people filing bugs about socket.io being broken, etc
	14:51 < josephg> I've moved to a system where the user is responsible for making the server-client connection
	14:52 < josephg> on the server, you pass sharejs a node 0.10 stream which it can use to talk to a client that just connected
	14:52 < k1i> yeah, that's probably good for node-like compatibility and abstraction
	14:52 < josephg> and on the client, you pass a websocket-like object which it'll use to talk to the server
	14:52 < k1i> I personally have total control over my clients
	14:52 < josephg> and then you can send sidechannel messages in the stream, etc.
	14:52 < josephg> yep
	14:52 < k1i> and will be forcing them all to be websocket-enabled browsers
	14:52 < josephg> ... yeah, so then you can use websockets
	14:53 < josephg> there's probably a couple issues you'll run into at the moment because I think I"m taking advantage of the fact that browserchannel lets you send
	messages while its connecting
	14:53 < josephg> - but let me know and I can fix them, or you can fix them.
	14:53 < josephg> but it should work.
	14:53 < josephg> thats the idea
	14:54 < josephg> there's racer-browserchannel kicking around somewhere taht has the 2 files or whatever which does the work
	14:54 < josephg> so yeah, go ahead and make a racer-websocket or whatever
	14:54 < josephg> and slot it in.
	14:55 < k1i> yeah
	14:55 < josephg> and koppor: likewise, use socket.io if you want. But if the server crashes because messages arrive out of order, its not my bug.
	14:55 < k1i> I think engine.io allows queued messages
	14:55 < josephg> if all your browsers support websocket, why bother?
	14:56 < josephg> websocket over https works great (better than websocket over http because proxies don't get in the way)
	14:56 < k1i> node-browserchannel doesn't use websockets?
	14:56 < josephg> nope. I wanted to add it, but I'd need to add websocket support to the closure library
	14:56 < k1i> yeah
	14:56 < josephg> it was on my nice-to-have list and less important than adding cursors to sharejs