14:02 * josephg reads up
14:03 < josephg> koppor, rawtaz: ShareJS does all the actual OT
14:03 < josephg> racer is now a wrapper around it which does things like refs, reflists
14:03 < josephg> ... it manages subscriptions for you (so if you change pages, you don't have to manually unsubscribe)
14:03 < josephg> stuff like that.
14:03 < josephg> ShareJS just does the document editing.
14:04 < josephg> Redis is currently important for 3 things:
14:05 < josephg> - We need to be able to atomically append to the op log. We're using redis's lua scripting to do atomic commits
14:05 < josephg> - Redis is also used for pubsub between your backend servers
14:05 < josephg> (well, between your servers)
14:06 -!- liorix [[email protected]] has quit [Remote host closed the connection]
14:06 < josephg> (remember the new version of derby is designed to scale across many backend processes - even the derby examples are currently running on 3 load-balanced processes just to test it out)
14:07 < josephg> And finally, redis is used to store the operation log. This is a bad idea, because it means all your ops have to fit in memory. I want to fix this sometime in the next few weeks.
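A minimal sketch of the atomic-append idea described above (this is not ShareJS's actual Lua script, and the key layout and ioredis client are assumptions): the script compares the expected version against the current list length and only then appends, so two servers racing to commit version N can't both win.

```js
const Redis = require('ioredis'); // assumption: ioredis as the redis client

const redis = new Redis();

// Append `op` as version `expectedVersion` of the document, but only if the
// list is still exactly that long. EVAL runs the whole script atomically.
const APPEND_OP = `
  local len = redis.call('LLEN', KEYS[1])
  if tonumber(ARGV[1]) ~= len then
    return redis.error_reply('version mismatch')
  end
  redis.call('RPUSH', KEYS[1], ARGV[2])
  return len + 1
`;

async function appendOp(opListKey, expectedVersion, op) {
  // Resolves with the new log length; rejects with 'version mismatch' if
  // another server already committed this version.
  return redis.eval(APPEND_OP, 1, opListKey, expectedVersion, JSON.stringify(op));
}
```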
14:08 < koppor> josephg: Thank you for the information!
14:08 < josephg> Mongo isn't blessed or special in the same way. You can use any database you like to store your data - take a look at share/livedb-mongo for an example of what the api it implements needs to look like
14:09 < josephg> (we haven't published details on this yet because I might need to add more methods to support presence, cursors and the oplog)
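A purely illustrative sketch of what such a snapshot/oplog adapter might look like. As the chat notes, the real API hadn't been published yet, so these method names are guesses rather than livedb-mongo's actual interface; the mongodb driver calls are just one possible backing store.

```js
// Hypothetical adapter shape - NOT livedb-mongo's published interface.
class ExampleStore {
  constructor(db) {
    this.db = db; // a connected mongodb Db handle
  }

  // Latest snapshot of a document, or null if it doesn't exist yet.
  async getSnapshot(collection, docId) {
    return this.db.collection(collection).findOne({ _id: docId });
  }

  // Overwrite (or create) the stored snapshot.
  async writeSnapshot(collection, docId, snapshot) {
    await this.db.collection(collection)
      .replaceOne({ _id: docId }, snapshot, { upsert: true });
  }

  // Append one operation to the document's op log.
  async writeOp(collection, docId, op) {
    await this.db.collection('ops').insertOne({ collection, doc: docId, v: op.v, op });
  }

  // Operations in the half-open version range [from, to).
  async getOps(collection, docId, from, to) {
    return this.db.collection('ops')
      .find({ collection, doc: docId, v: { $gte: from, $lt: to } })
      .sort({ v: 1 })
      .toArray();
  }
}
```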
14:09 < k1i> so
14:10 < k1i> I've been following this oplog issue quite closely
14:10 < k1i> is there any reason the oplog can't be capped at a specific size, and clients trying to commit operations older than X version get discarded?
14:10 -!- dascher [[email protected]] has quit [Remote host closed the connection]
14:10 < josephg> Yeah we can do that
14:10 < k1i> IMO that needs to be implemented, as it's a simple solution - more complicated lifetime-oplog-storage techniques can be implemented later
14:11 < k1i> but for 99% of webapps, a capped oplog in Redis, to a specific amount of memory, will be enough
14:11 < josephg> ... the only problem is dealing with the error correctly in the client.
14:11 < k1i> most apps aren't going to be doing offline transformations over a long period of time - and if they are unique in that use case, they can add more memory or do disk-based caching
14:11 < k1i> Derby's problem.
14:11 < josephg> :)
14:12 < k1i> but Share needs to be able to have a limited oplog
14:12 < k1i> also
14:12 < josephg> Yeah - the other thing to do is actually cleaning up / removing old ops
14:12 < k1i> it would be cool to be able to limit the oplog on specific collections
14:12 < k1i> redis-LRU would probably be ideal
14:12 * josephg looks up redis-lru
14:12 < k1i> least-recently-used
14:12 < k1i> it's built into redis as a garbage-collection mechanism
14:13 < k1i> specific collections may need more lengthy oplogs, etc.
14:13 < k1i> although
14:13 < k1i> nevermind, not an issue
14:13 < josephg> I'm currently storing oplogs in a redis list
14:13 < josephg> we might have to put each op in a separate redis document to do that
14:14 < josephg> ... which would make it slower when you try and get ops
14:15 < josephg> - I guess we could write an lrange-equivalent command using lua scripting
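One way the "each op in its own key" idea could look (a sketch with an invented `ops:<doc>:<version>` key scheme): individually keyed ops are exactly what redis's built-in `allkeys-lru` eviction can reclaim, and MGET already covers the read-a-range case in a single round trip; a custom Lua script would only be needed if the range fetch had to be combined with other logic atomically.

```js
const Redis = require('ioredis'); // assumption: ioredis as the redis client

const redis = new Redis();

// Ops stored one per key, e.g. ops:users/fred:12. Individually keyed ops are
// what `maxmemory-policy allkeys-lru` can evict under memory pressure.
function opKey(docKey, version) {
  return `ops:${docKey}:${version}`;
}

// Fetch ops in the half-open version range [from, to) in one round trip.
async function getOps(docKey, from, to) {
  if (to <= from) return [];
  const keys = [];
  for (let v = from; v < to; v++) keys.push(opKey(docKey, v));
  const raw = await redis.mget(...keys);
  // Evicted entries come back as null; the caller would fall back to the
  // persistent store (or reject the client) for those versions.
  return raw.map((op) => (op ? JSON.parse(op) : null));
}
```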
14:15 < k1i> yep
14:15 < josephg> but I'm not sure how that'll interact with an externally (not in redis) stored oplog
14:15 < k1i> redis's garbage collection is really quite good
14:16 < k1i> especially across a cluster
14:16 < josephg> probably, I won't do that straight away. Instead I'll first do the code to move the oplog out of redis
14:16 < k1i> what do you mean?
14:16 < k1i> out of redis, into where?
14:17 < josephg> into - something else. mongo as a default, who knows
14:17 < josephg> but the point is, into something that isn't locked in memory
14:17 < k1i> well
14:17 < josephg> (leveldb would be ideal - alas no network protocol)
14:17 < k1i> mongo is going to give you a headache when it comes to write contention
14:17 < k1i> redis is just really solid
14:17 < josephg> and it's really slow
14:17 < k1i> yea
14:17 < josephg> yeah I know, I love redis
14:17 < k1i> I don't see why redis is an issue
14:18 < josephg> it's an issue because you get lots and lots of ops
14:18 < k1i> I wrote a multi-server syncing cache/session store for Redis a while ago, it's a fine piece of technology
14:18 < k1i> yeah, but keeping a long oplog forever isn't an option
14:18 < k1i> pruning needs to happen at the end of the day
14:18 < k1i> unless you need to be able to support long-term playback
14:18 < k1i> which 99% of apps can do without, given the costs associated
14:19 < josephg> right. I guess there's a couple of options here:
14:19 < k1i> I can see a nice DOS-style attack being done via abusing old transformation versions
14:19 < josephg> 1. We leave everything in redis, but remove old operations when we run out of ram
14:20 < josephg> 2. Redis is used as the ultimate source of truth on the last op (so we still use it for contention control) but operations are shifted out into a secondary store once they've been applied to redis
14:20 < josephg> ie, mongo or something
14:20 < josephg> then we can prune (manually or using lru or something) stuff from redis with (mostly) impunity - it's just a cache
14:20 < josephg> + locking system for doing atomic increments
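A rough sketch of option 2, with invented key names and a plain mongodb collection standing in for "mongo or something": redis still arbitrates which write wins version N, but every accepted op is mirrored into a persistent store, so the redis copy becomes a prunable cache. `APPEND_OP` refers to the Lua script sketched earlier.

```js
const Redis = require('ioredis'); // assumption: ioredis as the redis client

const redis = new Redis();

// `db` is a connected mongodb Db handle; APPEND_OP is the Lua script from the
// earlier sketch. Redis stays the arbiter of who wrote version N, but every
// accepted op is mirrored into mongo, so the redis copy is only a cache.
async function commitOp(db, docKey, expectedVersion, op) {
  const newLength = await redis.eval(
    APPEND_OP, 1, 'ops:' + docKey, expectedVersion, JSON.stringify(op));

  // Persist the accepted op (version = newLength - 1) outside redis.
  await db.collection('ops').insertOne({ doc: docKey, v: newLength - 1, op });

  // Now redis can be pruned with impunity, e.g. keep only the last 1000 ops.
  await redis.ltrim('ops:' + docKey, -1000, -1);
  return newLength;
}
```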
14:21 < josephg> as for DOSing the server with old ops, the easy way to fix that is to just not allow any ops older than some age
14:21 < k1i> -- aka deleting them
14:21 < josephg> not necessarily.
14:21 < josephg> we can just force clients to do all the OT work
14:21 < k1i> if they aren't going to be allowed for use in transformation
14:21 < k1i> ah
14:22 < k1i> I like strategy 1
14:22 < josephg> - although sending a bunch of ops over the wire is probably more expensive than transforming anyway.
14:22 < k1i> because big corps are going to deploy massive redis server clusters
14:22 < k1i> they do
14:22 < k1i> (already)
14:22 < k1i> it's capable of handling it, it's simple, and it already works
14:22 < k1i> smaller users (who probably don't care about long-term playback anyway) can affordably implement OT for a limited timescale
14:22 < k1i> also
14:23 < k1i> you throw the error to Racer; racer could choose to implement some other kind of conflict resolution
14:23 < k1i> manual, last-write-wins
14:23 < josephg> no, there's nothing good racer can do there.
14:23 < josephg> I catch a plane and do some work on the plane. I don't connect to the internet again for 2 days.
14:24 < k1i> Racer can alert Derby, at which point user apps can show the manual conflict?
14:24 < josephg> there's been some changes to that document I was working on in the meantime
14:24 < k1i> (resolution process)
14:24 < josephg> ... well, I don't even have the diff of what other people have done.
14:24 < josephg> I just have my own ops, my view of the document and the server's (changed) view of the document
14:25 < josephg> I mean, we could punt to the application in that case
14:25 < josephg> ... and make them figure out a diff, and do that whole dance
14:25 < josephg> but it's not fun. And most people won't bother.
14:25 < k1i> personally?
14:25 < k1i> I'll discard the user's changes
14:25 < josephg> right - yeah most people will.
14:25 < k1i> as in my use case, an extended absence from online is not a big deal (because it can't technically happen)
14:26 < k1i> I want OT, but don't need extended replay
14:26 < k1i> and if I do, I will throw more hardware at redis
14:26 * josephg nods
14:26 < josephg> for us, we're writing hiring software
14:26 < josephg> and we want the oplog anyway for auditing
14:27 < josephg> so if someone does something bad, we want to see exactly who did it and when
14:27 < k1i> ah
14:27 < k1i> I am writing transactional point of sale software
14:27 < k1i> I was planning on creating a manual log
14:27 < k1i> but, that's actually not a bad idea - abuse the log left by OT
14:27 < josephg> right. Yeah, I guess we could do that instead
14:27 < k1i> it seems like there is some overhead though in finding an operation
14:27 < k1i> rather than creating a dedicated log on an action-by-action basis
14:28 < josephg> yeah, maybe. You can play the operations back
14:28 < josephg> actually, playback would be a fun thing to add to the godbox
14:28 < josephg> should be pretty easy to do, too.
14:29 < k1i> right now my main concern with Derby in general is the oplog growth issue (and validations, but that's another story)
14:29 < josephg> Brian is adding schema validation at the moment for our app
14:30 < josephg> sharejs exposes a validate function, so you can plug in your own schema validation / whatever logic in there
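The chat doesn't show the validate hook's signature, so the wiring below is hypothetical; only the shape of the idea is illustrated - the document a submitted op produces is run through app-level checks before the op is committed.

```js
// Shape of the idea only: reject an op if the document it produces fails
// app-level schema checks. Returning a falsy value allows the op through.
function validateUserDoc(data) {
  if (typeof data.name !== 'string') return new Error('name must be a string');
  if (data.age != null && typeof data.age !== 'number') {
    return new Error('age must be a number');
  }
  return null;
}

// Hypothetical wiring into a sharejs server instance (hook name invented):
// share.validate = (collection, docId, snapshotData, callback) =>
//   callback(validateUserDoc(snapshotData));
```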
14:30 < josephg> but yeah, the oplog growth issue is important
14:30 < josephg> - and I want to solve that in the next few weeks in some form or other.
14:30 < josephg> we also don't have any decent benchmarks of how the whole system performs
14:31 < josephg> which is important for me - for example, if we move redis to have all the ops in their own key, how does that perform?
14:31 < josephg> (although redis being redis, probably still waaaay better than any of the javascript)
14:31 < k1i> also
14:32 < k1i> sorry
14:32 < k1i> this is very important
14:32 < k1i> the fact that Racer doesn't support projections (and ShareJS doesn't support Mongo projections) is hugely problematic for me
14:32 < k1i> and, I would expect, for most users
14:32 < k1i> I shouldn't have to define a User's password field in a separate collection just to keep it away from public eyes
14:33 < josephg> yep.... I had this exact conversation on Friday night with Brian.
14:33 < josephg> he's strongly of the opinion that we should support collections, and I don't want to add more parts to sharejs
14:33 < k1i> again, it goes against conventional data modeling to not be able to do those kinds of operations
14:34 < k1i> no matter the datastore
14:34 < josephg> well, redis doesn't do projections
14:34 < k1i> PGSQL (row), Mongo (document) - fields need to be hidden
14:34 < josephg> but yeah - mongo and couch both do.
14:34 < k1i> enterprises generally don't use redis as a persistent datastore either, though
14:34 < josephg> true. Nate and I have been talking about first adding filters
14:34 < k1i> and I personally wouldn't bank an entire framework on an edge-case persistent datastore (redis)
14:35 < k1i> I saw that
14:35 < k1i> and it looked interesting
14:35 < josephg> yep - so that's probably what v1 will look like -
14:35 < k1i> filtering specific 'fields' from being operated on
14:35 < josephg> yep, and from being visible to a client.
14:35 < josephg> so a client will have a specific view of a document. For example, a user can see their own profile in full
14:35 < josephg> but only some fields of other users' profiles
14:36 < josephg> we'll need to edit operations going to that client, but if we do it right, the client won't be able to tell that there even are more fields in the document
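A sketch of that filtering idea with invented helper names: each client gets a per-document whitelist of visible fields, and both snapshots and ops are stripped against it before they go over the wire (the op filtering assumes json0-style ops, where each component carries a path `p`).

```js
// Invented helpers illustrating per-client field filtering.
function visibleFields(viewerId, doc) {
  // Owners see everything (null = no filtering); everyone else only sees
  // the public fields.
  return viewerId === doc.ownerId ? null : ['name', 'avatarUrl'];
}

function filterSnapshot(doc, fields) {
  if (!fields) return doc;
  const out = {};
  for (const f of fields) if (f in doc) out[f] = doc[f];
  return out;
}

function filterOp(op, fields) {
  // Drop every op component that touches a hidden field, so the client can't
  // even tell the field exists. Assumes json0-style ops: an array of
  // components, each with a path `p`.
  if (!fields) return op;
  return op.filter((component) => fields.includes(component.p[0]));
}
```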
14:36 < k1i> yep
14:36 < k1i> that would be ideal
14:36 < josephg> that's the 'permanent projection' system
14:36 < k1i> this is something none of the realtime 'frameworks' that exist now have solved
14:36 < josephg> interesting.
14:37 < k1i> everyone can block access to a specific document because a query can be built on it
14:37 < k1i> but app-level security on individual fields is absolutely imperative
14:37 < josephg> yep.
14:37 < k1i> if Derby or Meteor are going to win over Rails in 'framework choice'
14:37 < k1i> it's not even a passable option
14:37 < josephg> ... the other thing that would be nice to have is a way for queries to only return part of a document
14:37 < k1i> yes
14:37 < k1i> that would increase efficiency
14:38 < josephg> for example, if I'm viewing a list of documents, I probably only want a couple of fields
14:38 < k1i> it can spawn weird edge cases
14:38 < josephg> ... then if I click on one, I should see all the rest of the fields too
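For the "list view only needs a couple of fields" case, a plain Mongo projection already expresses the query side of it (current mongodb Node driver syntax; the field names are just examples):

```js
// `db` is a connected mongodb Db handle.
async function listUsers(db) {
  // The list view only needs a couple of fields.
  return db.collection('users')
    .find({}, { projection: { name: 1, avatarUrl: 1 } })
    .toArray();
}

async function getUser(db, userId) {
  // Clicking through fetches the whole document.
  return db.collection('users').findOne({ _id: userId });
}
```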
14:38 < josephg> it sure can.
14:38 < k1i> when certain fields are based on others
14:38 < josephg> so yeah, that's going to take some more thought.
14:38 < josephg> but we'll probably start with the filter thing - though for me it's a lower priority than doing a bunch of benchmarks
14:38 < josephg> and solving the oplog issue
14:38 < k1i> well
14:39 < k1i> yeah
14:39 < k1i> some people can't even migrate to derby 0.5 due to a massive oplog
14:39 < josephg> yeah exactly.
14:39 < k1i> also, I am of the opinion that the oplog should be completely transient -
14:39 < k1i> if I go in and 'redis-flush' everything away
14:39 < k1i> that should be completely OK
14:39 < k1i> and the app should be able to handle any issues associated with that
14:39 < josephg> well, if we move the oplog out into something that mongo / whatever could provide
14:39 < josephg> then you could always just store it in something that sometimes forgets ops
14:39 < josephg> and we should make the system be able to deal with that too.
14:40 < k1i> yea
14:40 < k1i> I like redis
14:40 < k1i> but
14:40 < k1i> the memory thing is a bit tricky
14:40 * josephg nods
14:40 < josephg> koppor: are you still around?
14:40 < josephg> ... koppor was asking about socket.io
14:41 < k1i> yes
14:41 < k1i> I'd like to ask you about that as well
14:41 < k1i> what is the current issue with native websockets?
14:41 < josephg> I dunno if it's gotten better since, but I hate socket.io because of all the grief it caused me while doing sharejs
14:41 < k1i> when was the last time you used it?
14:41 < josephg> it's just unreliable, it doesn't guarantee message ordering
14:41 < josephg> um, about 18 months ago
14:41 < k1i> can you try engine.io?
14:42 < josephg> ... and it can tell you a client disconnected, then give you more ops for that client
14:42 < josephg> I dunno man - I don't trust it.
14:42 < k1i> https://github.com/LearnBoost/engine.io
14:42 < k1i> engine.io is very actively developed
14:42 * josephg shrugs
14:42 < josephg> does it order operations?
14:42 < josephg> ... anyway, the new architecture of sharejs means that you can use whatever you want.
14:42 < k1i> native websockets have a huge, huge advantage
14:43 < josephg> in performance, yeah
14:43 < k1i> in that they don't require a sticky-sessioning LB to maintain efficiency on the server side
14:43 < k1i> much easier to scale
14:43 < k1i> obviously you will want one for fallback clients, but still
14:44 < josephg> ... you don't?
14:44 < k1i> for native websockets?
14:44 < josephg> hm I guess not.
14:44 < k1i> the TCP connection is maintained by whatever LB you are running
14:44 < k1i> it's inherently 'sticky' as it's an open socket
14:44 < k1i> the LB can then round-robin, least-load, etc. any other connection
14:45 < josephg> right, but you aren't just doing request-response over the socket
14:45 < k1i> that's probably my favorite feature of websockets
14:45 < k1i> but the connection remains open throughout the duration of a client's visit, though, right?
14:45 < josephg> you also need to be able to send to the client when one of the subscribed documents changes
14:45 < k1i> yes
14:45 < josephg> ... and to do that you need a server to be 'responsible' for the client anyway
14:45 < k1i> I am saying just at the LB level
14:46 < josephg> hm - I guess you could have any server able to send to the client
14:46 < k1i> the LB needs to put less thought into maintaining a stateful websocket than into stateless polling
14:46 < k1i> no, the client still gets talked to by their associated server
14:47 < k1i> if the client refreshes, they reconnect and set up a new copy of the redis-stored session on another backend server
14:48 < josephg> ... so which server sends a client ops for its subscriptions?
14:48 < k1i> the server that they are connected to via websocket
14:49 < k1i> initially
14:49 < josephg> oooooooh
14:49 < k1i> the websocket has no reason to ever close
14:49 < josephg> right, because the load balancer will send the websocket *somewhere* - it doesn't matter where
14:49 < k1i> so the client has no reason to ever get connected to a different server
14:49 < k1i> yes
14:49 < josephg> and that server is responsible for that client forever.
14:49 < k1i> and it stays open
14:49 < josephg> yeah
14:49 < k1i> the LB never touches the websocket again after it's opened
14:49 < k1i> they know how to pass socketed traffic
14:49 < josephg> yep - it's just that the load balancer doesn't have to know. It just pipes
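A sketch of why it doesn't matter much which backend the LB picks: whichever server holds the websocket subscribes to the document's channel, and ops committed on any backend are fanned out over redis pub/sub. The channel naming here is invented, not ShareJS's actual layout.

```js
const Redis = require('ioredis'); // assumption: ioredis as the redis client

const pub = new Redis();
const sub = new Redis(); // a connection in subscriber mode can't issue other commands

const listeners = new Map(); // channel -> Set of websocket connections on this server

sub.on('message', (channel, message) => {
  // Forward the op to every local client subscribed to this document.
  for (const ws of listeners.get(channel) || []) ws.send(message);
});

function subscribeClient(ws, docKey) {
  const channel = 'doc:' + docKey;
  if (!listeners.has(channel)) {
    listeners.set(channel, new Set());
    sub.subscribe(channel);
  }
  listeners.get(channel).add(ws);
}

// Called by whichever backend commits an op; every backend's subscriber sees it.
function broadcastOp(docKey, op) {
  pub.publish('doc:' + docKey, JSON.stringify(op));
}
```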
14:49 < k1i> now
14:49 < k1i> sticky-sessioning is something you want for efficiency and fallback clients
14:50 < josephg> yeah - lovely.
14:50 < k1i> but, it makes LB a lot easier in high-scalability environments
14:50 < k1i> to be able to round-robin, etc.
14:50 < k1i> also
14:50 < k1i> LBs such as HAProxy will eventually have bindings written for them, for derby, etc.
14:50 < k1i> to be able to contact them for client counts
14:51 < josephg> so in sharejs, because I got sick of people filing bugs about socket.io being broken, etc
14:51 < josephg> I've moved to a system where the user is responsible for making the server-client connection
14:52 < josephg> on the server, you pass sharejs a node 0.10 stream which it can use to talk to a client that just connected
14:52 < k1i> yeah, that's probably good for node-like compatibility and abstraction
14:52 < josephg> and on the client, you pass a websocket-like object which it'll use to talk to the server
14:52 < k1i> I personally have total control over my clients
14:52 < josephg> and then you can send sidechannel messages in the stream, etc.
14:52 < josephg> yep
14:52 < k1i> and will be forcing them all to be websocket-enabled browsers
14:52 < josephg> ... yeah, so then you can use websockets
14:53 < josephg> there's probably a couple of issues you'll run into at the moment because I think I'm taking advantage of the fact that browserchannel lets you send messages while it's connecting
14:53 < josephg> - but let me know and I can fix them, or you can fix them.
14:53 < josephg> but it should work.
14:53 < josephg> that's the idea
14:54 < josephg> there's racer-browserchannel kicking around somewhere that has the 2 files or whatever which do the work
14:54 < josephg> so yeah, go ahead and make a racer-websocket or whatever
14:54 < josephg> and slot it in.
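A hedged sketch of the "pass sharejs a node stream" wiring using the ws library; `share.listen(stream)` is an assumption about the server API's shape (the racer-browserchannel adapter mentioned above is the authoritative example of the real wiring).

```js
const WebSocket = require('ws');
const { Duplex } = require('stream');

const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (socket) => {
  // Adapt the websocket into an object-mode duplex stream.
  const stream = new Duplex({
    objectMode: true,
    read() {}, // data is pushed in from socket messages below
    write(chunk, encoding, callback) {
      socket.send(JSON.stringify(chunk), callback); // server -> browser
    },
  });
  socket.on('message', (data) => stream.push(JSON.parse(data.toString())));
  socket.on('close', () => stream.push(null));

  share.listen(stream); // assumption: `share` is the sharejs server instance
});
```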
14:55 < k1i> yeah
14:55 < josephg> and koppor: likewise, use socket.io if you want. But if the server crashes because messages arrive out of order, it's not my bug.
14:55 < k1i> I think engine.io allows queued messages
14:55 < josephg> if all your browsers support websocket, why bother?
14:56 < josephg> websocket over https works great (better than websocket over http because proxies don't get in the way)
14:56 < k1i> node-browserchannel doesn't use websockets?
14:56 < josephg> nope. I wanted to add it, but I'd need to add websocket support to the closure library
14:56 < k1i> yeah
14:56 < josephg> it was on my nice-to-have list and less important than adding cursors to sharejs
I know this is old... but could you clean up old oplog entries by storing them in an 'archive' store (you'd still want to maintain the full audit trail), and then store hashes of the changes or states, aggregated to full days, in Redis? The client would then hash its local data according to the same aggregation rules, compare its full-day hashes with the full-day hashes in Redis to determine where the timeline broke, request the transactions for those days to catch up to current, and then read the live log from Redis?
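One way to sketch the day-hash comparison described in this comment (all names invented): both sides hash their ops bucketed by day, compare from oldest to newest, and the first mismatching day is where the client starts replaying from the archive before switching to the live redis log.

```js
const crypto = require('crypto');

// Hash one day's worth of archived ops.
function hashDay(ops) {
  const hash = crypto.createHash('sha256');
  for (const op of ops) hash.update(JSON.stringify(op));
  return hash.digest('hex');
}

// Both arguments are arrays of { day: 'YYYY-MM-DD', hash } sorted by day.
// Returns the first day whose hash differs, i.e. where the client must start
// replaying archived transactions before reading the live log from Redis.
function findDivergence(localDays, serverDays) {
  for (let i = 0; i < serverDays.length; i++) {
    const local = localDays[i];
    if (!local || local.hash !== serverDays[i].hash) return serverDays[i].day;
  }
  return null; // already consistent with the archived history
}
```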