Created November 3, 2010 02:05
sekrets of the roflscale sauce
08:58 Defi_: can anyone tell me if there are any obvious disadvantages to patching a blocking library to use fibers at the socket level?
08:58 Defi_: its far too much effort to have to rewrite large chunks of every library just to make it async and fiber-aware
08:59 raggi: if it uses non-stack stored state you could end up with concurrency issues (read: race conditions)
08:59 raggi: fibers as implemented in MRI have limited size stacks (4kb)
09:00 Defi_: hmm alright :/
09:01 raggi: if the library is trying to be thread safe, it can make a real mess too
09:01 raggi: Defi_: just use threads.
09:02 Defi_: raggi: not gonna happen...
09:02 Defi_: even if this was a personal project, i wouldnt use threads
09:03 Defi_: guess ill just have to continue rewriting parts of a bunch of libs as i go
09:03 raggi: oh, wait, i remember who you are
09:03 raggi: you're the omniscient guy who needs roflscale aren't you, and you think threads are going to stop you from servicing your roflmillions of users
09:03 Defi_: and i remember that you hate fibers for whatever reasons
09:04 raggi: i don't hate fibers
09:04 raggi: i just think a lot of people are misusing them for silly things
09:05 Defi_: no.. but my boss does want a highly scalable async platform backend, and from experience, i'd rather wrap the async code in fibers, than have tons of callback spaghetti
09:05 raggi: fibers are spaghetti
09:06 raggi: you're still doing regular jumps around code
09:06 Defi_: im sure plenty people misuse them for plenty reasons, but no more than many other things
09:06 raggi: it's just that you have it stored in stacks instead of objects
09:06 Defi_: so what?
09:06 raggi: which is harder to debug
09:06 Defi_: so you'd rather use tons of callbacks, than fibers?
09:07 raggi: i'd be pragmatic about what the critical path is
09:07 raggi: and optimise only the critical path
09:08 Defi_: heh
09:09 raggi: Defi_: what "scale" are you really talking about, because last time you just started blurting the word cloud at me like it meant something special
09:10 Defi_: raggi: im talking reasonably large scale, obviously not right from the beginning, but as functionality and the client base grows, it needs to scale up majorly, with interfaces to pretty much every large social/media related sites apis on the web
09:11 raggi: talk numbers
09:12 Defi_: raggi: i cant talk numbers, the platform is still in early development stages, but it needs to be able to make many hundreds of requests in realtime
09:12 raggi: you don't know what the numbers are, and yet you're saying "hundreds of requests in realtime"
09:12 raggi: FYI, hundreds of requests "in realtime" can work just fine using sync libs
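
raggi's point about sync libs can be sketched with a small thread pool. This is only an illustration, not code from the conversation: `blocking_fetch` is a hypothetical stand-in for any blocking library call (here it just sleeps), and the worker count of 20 is an arbitrary assumption.

```ruby
# Hypothetical stand-in for a blocking call into a sync library
# (for example a synchronous HTTP client). It only sleeps here.
def blocking_fetch(id)
  sleep 0.01
  "response-#{id}"
end

# Run the blocking calls on a fixed number of worker threads.
# While one thread waits on I/O (or sleep), MRI lets the others run.
def fetch_all(ids, workers: 20)
  jobs = Queue.new
  ids.each { |id| jobs << id }
  results = Queue.new

  threads = Array.new(workers) do
    Thread.new do
      loop do
        id = begin
          jobs.pop(true)      # non-blocking pop, raises ThreadError when empty
        rescue ThreadError
          break               # no work left, stop this worker
        end
        results << [id, blocking_fetch(id)]
      end
    end
  end

  threads.each(&:join)
  Array.new(results.size) { results.pop }.to_h
end

responses = fetch_all((1..200).to_a)
puts responses.size
```

The library code stays untouched; the only shared structures are the two `Queue` objects, which are thread-safe.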
09:13 Defi_: its going to have to either run on its own netblock of ips or use many proxies
09:13 Defi_: ok, lets say thousands of requests in realtime per online client
09:13 Defi_: that should give you a rough idea
09:14 Defi_: its too early in development to be able to be too specific
09:14 raggi: that just means you are doing no real projections, which means it's purely technological masturbation
09:15 raggi: as for thousands of requests per client "in realtime" that's more than likely a completely silly sentiment
09:15 raggi: for the plain and simple reason, that a particular user is unlikely to even want to read all the results of thousands of requests each time they visit
09:15 raggi: and if you're aggregating, then you probably arent' going to be triggering based on "each online client"
09:16 Defi_: again, im not the boss, i just write the code required to make shit happen
09:16 Defi_: but the data fetched over hundreds of thousands of requests will be processed to averages and summed statistics
09:17 raggi: you should tell your boss to pay for some consulting from someone that's built these kinds of systems before
09:18 Defi_: anyway, discussing this doesnt really help anything
09:18 raggi: no, because you're already set in stone that you need roflscale, which is both architecturally wrong and invalid for the business
09:19 raggi: you also seem to be completely certain that threads are completely inappropriate for your use cases
09:19 Defi_: heh, you really dont have any idea of the specifics of the system, so you cannot judge what sort of scaling is required
09:19 raggi: and you clearly don't understand the non-difference between threads and fibers in this kind of context
09:20 Defi_: i understand that threads have more overhead than fibers and any shared data would require locking and synching
09:20 raggi: Defi_: actually, i can, because i've worked on systems that are in these categories
09:20 Defi_: i also understand all the over disadvantaged of threads
09:20 raggi: Defi_: you still need locking in fibers
09:20 Defi_: nope
09:20 raggi: Defi_: for shared state
09:20 raggi: yes you do
09:20 raggi: lol
09:20 Defi_: fibers do not run concurrently
09:20 xxxxxx: lawl
09:20 raggi: yes they do
09:20 raggi: they don't run in parallel
09:20 raggi: but that's different
09:20 raggi: and they're cooperatively scheduled
09:21 Defi_: by concurrently, i mean in parallel
09:21 Defi_: i know this...
09:21 raggi: hey xxxxxx :)
09:21 xxxxxx: lawls
09:21 xxxxxx: hi
09:21 xxxxxx: this is fun
09:21 raggi: yes
09:21 raggi: roflscale
09:21 Defi_: you do not need to lock an array, if 2 fibers use it
09:21 raggi: Defi_: i think you need shards bro
09:21 Silex: imho concurrently should mean in parralel by default
09:21 Defi_: because they will never access it at the same time
09:21 Defi_: since its cooperatively scheduled
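
The disagreement above comes down to what happens when a fiber yields in the middle of a multi-step update. The following is a minimal sketch, not code from the conversation: the `balance` hash and the `withdraw` helper are hypothetical, and `Fiber.yield` stands in for a fiber-aware socket call that suspends the fiber. Two fibers never run in parallel, yet the check-then-act sequence still races.

```ruby
# Shared state used by two fibers. Only one fiber runs at a time,
# and control changes hands only at Fiber.yield.
balance = { amount: 100 }
log = []

# Hypothetical helper: builds a fiber that checks the balance, yields
# (as a fiber-aware socket read would), then writes.
withdraw = lambda do |name, amount|
  Fiber.new do
    if balance[:amount] >= amount      # check
      Fiber.yield                      # "I/O": another fiber may run here
      balance[:amount] -= amount       # act, based on a now-stale check
      log << "#{name} withdrew #{amount}"
    else
      log << "#{name} refused"
    end
  end
end

a = withdraw.call("a", 80)
b = withdraw.call("b", 80)

a.resume  # a passes the check, then yields
b.resume  # b passes the same check, then yields
a.resume  # a writes: 100 - 80 = 20
b.resume  # b writes: 20 - 80 = -60

puts balance[:amount]  # -60
```

A single array push between yields is safe, which is the case Defi_ describes. Any invariant that spans a yield point is not, and a blocking library patched at the socket level gains yield points wherever it touches the socket.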
09:21 Defi_: i know its gonna need to be sharded raggi
09:22 Defi_: which is one of the reasons i've gone with MongoDB
09:22 xxxxxx: shard your fibers
09:22 raggi: Defi_: http://gist.github.com/560087
09:22 raggi: shardnull
09:22 raggi: it's faster than mongodb
09:22 raggi: and more reliable
09:22 raggi: you can be 100% certain of what it will do with your data
09:23 Defi_: it is?
09:23 raggi: it's also atomically consistent even under high concurrency, and 100% available
09:23 raggi: iow it completely defeats the CAP theorom
09:23 locks: raggi: are you certain? mongodb is web scale..
09:23 raggi: locks: shardnull is UNIVERSE SCALE
09:24 Defi_: mongo scales very nicely, is it really worth using shardnull?
09:24 raggi: yes
09:24 Defi_: is there a nice ORM for it like MongoMapper?
09:24 locks: definitely
09:24 xxxxxx: wtf defi_ are you actually trying to re-enact that webscale video ?
09:24 Defi_: because i've already rewritten some of the models for this platform twice
09:24 raggi: yes, it can store any objects directly
09:24 Defi_: first with mongoid, then mongomapper
09:24 raggi: so you don't need an ORM, you can just send it pure marshalled ruby
09:25 raggi: and it consistently atomically stores your objects in parallel in a highly available distributed manner
09:25 locks: and it doesn't lock ever
09:25 raggi: all it does is apply some very "clever" mutations to the data on the way into the store
09:26 Defi_: hmm
09:26 raggi: also, it's only 28 lines of ruby that i wrote using FFI to link libc
09:26 xxxxxx: run it by your boss
09:26 raggi: so it's obviously FAST
09:26 Defi_: and how can the data be queried by the database raggi?
09:26 xxxxxx: its also pretty damn memory efficient
09:26 raggi: `less /dev/null`
09:26 Defi_: how can it process through millions of records
09:26 raggi: standard posix operations
09:26 Defi_: are you sure it'll perform as well as mongodb?
09:27 raggi: Defi_: it'll process millions of writes per second with ease
09:27 locks: raggi: does it work on windows though?
09:27 raggi: it performs better than mongodb
09:27 raggi: locks: my shardnull proxies do
09:27 Defi_: raggi: how does that help with summing up millions of records into whatever data
09:27 raggi: locks: that's actually shardNUL:
09:27 raggi: ;-)
09:27 locks: ohhh
09:27 locks: you're pretty clever man
09:28 raggi: i have the sekrets of the webscale sauce
09:28 Defi_: pulling millions of rows of data into ruby and processing surely isnt nearly as efficient as mongodb raggi?
09:28 raggi: Defi_: you can aggregate teh size of the records just by reading stats from /proc
09:28 raggi: ZOMGLOL
09:28 raggi: i'm dieing here
09:29 Defi_: eh.. im failing to see how you would process the data more efficient than with a mongodb query
09:29 raggi: Defi_: no, you can use /proc, so the kernels already done all the work for you! (which is written in C and ASM)
09:29 raggi: i've gotta win some kind of troll-the-troll award, surely?
09:30 xxxxxx: lol