raggi · November 3, 2010 02:05
diff --git a/roflscale.txt b/roflscale.txt
 08:58 Defi_: can anyone tell me if there are any obvious disadvantages to patching a blocking library to use fibers at the socket level?
 08:58 Defi_: its far too much effort to have to rewrite large chunks of every library just to make it async and fiber-aware
 08:59 raggi: if it uses non-stack stored state you could end up with concurrency issues (read: race conditions)
 08:59 raggi: fibers as implemented in MRI have limited size stacks (4kb)
 09:00 Defi_: hmm alright :/
 09:01 raggi: if the library is trying to be thread safe, it can make a real mess too
 09:01 raggi: Defi_: just use threads.
 09:02 Defi_: raggi: not gonna happen...
 09:02 Defi_: even if this was a personal project, i wouldnt use threads
 09:03 Defi_: guess ill just have to continue rewriting parts of a bunch of libs as i go
 09:03 raggi: oh, wait, i remember who you are
 09:03 raggi: you're the omniscient guy who needs roflscale aren't you, and you think threads are going to stop you from servicing your roflmillions of users
 09:03 Defi_: and i remember that you hate fibers for whatever reasons
 09:04 raggi: i don't hate fibers
 09:04 raggi: i just think a lot of people are misusing them for silly things
 09:05 Defi_: no.. but my boss does want a highly scalable async platform backend, and from experience, i'd rather wrap the async code in fibers, than have tons of callback spaghetti
 09:05 raggi: fibers are spaghetti
 09:06 raggi: you're still doing regular jumps around code
 09:06 Defi_: im sure plenty people misuse them for plenty reasons, but no more than many other things
 09:06 raggi: it's just that you have it stored in stacks instead of objects
 09:06 Defi_: so what?
 09:06 raggi: which is harder to debug
 09:06 Defi_: so you'd rather use tons of callbacks, than fibers?
 09:07 raggi: i'd be pragmatic about what the critical path is
 09:07 raggi: and optimise only the critical path
 09:08 Defi_: heh
 09:09 raggi: Defi_: what "scale" are you really talking about, because last time you just started blurting the word cloud at me like it meant something special
 09:10 Defi_: raggi: im talking reasonably large scale, obviously not right from the beginning, but as functionality and the client base grows, it needs to scale up majorly, with interfaces to pretty much every large social/media related sites apis on the web
 09:11 raggi: talk numbers
 09:12 Defi_: raggi: i cant talk numbers, the platform is still in early development stages, but it needs to be able to make many hundreds of requests in realtime
 09:12 raggi: you don't know what the numbers are, and yet you're saying "hundreds of requests in realtime"
 09:12 raggi: FYI, hundreds of requests "in realtime" can work just fine using sync libs
 09:13 Defi_: its going to have to either run on its own netblock of ips or use many proxies
 09:13 Defi_: ok, lets say thousands of requests in realtime per online client
 09:13 Defi_: that should give you a rough idea
 09:14 Defi_: its too early in development to be able to be too specific
 09:14 raggi: that just means you are doing no real projections, which means it's purely technological masturbation
 09:15 raggi: as for thousands of requests per client "in realtime" that's more than likely a completely silly sentiment
 09:15 raggi: for the plain and simple reason, that a particular user is unlikely to even want to read all the results of thousands of requests each time they visit
 09:15 raggi: and if you're aggregating, then you probably aren't going to be triggering based on "each online client"
 09:16 Defi_: again, im not the boss, i just write the code required to make shit happen
 09:16 Defi_: but the data fetched over hundreds of thousands of requests will be processed to averages and summed statistics
 09:17 raggi: you should tell your boss to pay for some consulting from someone that's built these kinds of systems before
 09:18 Defi_: anyway, discussing this doesnt really help anything
 09:18 raggi: no, because you're already set in stone that you need roflscale, which is both architecturally wrong and invalid for the business
 09:19 raggi: you also seem to be completely certain that threads are completely inappropriate for your use cases
 09:19 Defi_: heh, you really dont have any idea of the specifics of the system, so you cannot judge what sort of scaling is required
 09:19 raggi: and you clearly don't understand the non-difference between threads and fibers in this kind of context
 09:20 Defi_: i understand that threads have more overhead than fibers and any shared data would require locking and synching
 09:20 raggi: Defi_: actually, i can, because i've worked on systems that are in these categories
 09:20 Defi_: i also understand all the other disadvantages of threads
 09:20 raggi: Defi_: you still need locking in fibers
 09:20 Defi_: nope
 09:20 raggi: Defi_: for shared state
 09:20 raggi: yes you do
 09:20 raggi: lol
 09:20 Defi_: fibers do not run concurrently
 09:20 xxxxxx: lawl
 09:20 raggi: yes they do
 09:20 raggi: they don't run in parallel
 09:20 raggi: but that's different
 09:20 raggi: and they're cooperatively scheduled
 09:21 Defi_: by concurrently, i mean in parallel
 09:21 Defi_: i know this...
 09:21 raggi: hey xxxxxx :)
 09:21 xxxxxx: lawls
 09:21 xxxxxx: hi
 09:21 xxxxxx: this is fun
 09:21 raggi: yes
 09:21 raggi: roflscale
 09:21 Defi_: you do not need to lock an array, if 2 fibers use it
 09:21 raggi: Defi_: i think you need shards bro
 09:21 Silex: imho concurrently should mean in parallel by default
 09:21 Defi_: because they will never access it at the same time
 09:21 Defi_: since its cooperatively scheduled
 09:21 Defi_: i know its gonna need to be sharded raggi
 09:22 Defi_: which is one of the reasons i've gone with MongoDB
 09:22 xxxxxx: shard your fibers
 09:22 raggi: Defi_: http://gist.github.com/560087
 09:22 raggi: shardnull
 09:22 raggi: it's faster than mongodb
 09:22 raggi: and more reliable
 09:22 raggi: you can be 100% certain of what it will do with your data
 09:23 Defi_: it is?
 09:23 raggi: it's also atomically consistent even under high concurrency, and 100% available
 09:23 raggi: iow it completely defeats the CAP theorem
 09:23 locks: raggi: are you certain? mongodb is web scale..
 09:23 raggi: locks: shardnull is UNIVERSE SCALE
 09:24 Defi_: mongo scales very nicely, is it really worth using shardnull?
 09:24 raggi: yes
 09:24 Defi_: is there a nice ORM for it like MongoMapper?
 09:24 locks: definitely
 09:24 xxxxxx: wtf defi_ are you actually trying to re-enact that webscale video ?
 09:24 Defi_: because i've already rewritten some of the models for this platform twice
 09:24 raggi: yes, it can store any objects directly
 09:24 Defi_: first with mongoid, then mongomapper
 09:24 raggi: so you don't need an ORM, you can just send it pure marshalled ruby
 09:25 raggi: and it consistently atomically stores your objects in parallel in a highly available distributed manner
 09:25 locks: and it doesn't lock ever
 09:25 raggi: all it does is apply some very "clever" mutations to the data on the way into the store
 09:26 Defi_: hmm
 09:26 raggi: also, it's only 28 lines of ruby that i wrote using FFI to link libc
 09:26 xxxxxx: run it by your boss
 09:26 raggi: so it's obviously FAST
 09:26 Defi_: and how can the data be queried by the database raggi?
 09:26 xxxxxx: its also pretty damn memory efficient
 09:26 raggi: `less /dev/null`
 09:26 Defi_: how can it process through millions of records
 09:26 raggi: standard posix operations
 09:26 Defi_: are you sure it'll perform as well as mongodb?
 09:27 raggi: Defi_: it'll process millions of writes per second with ease
 09:27 locks: raggi: does it work on windows though?
 09:27 raggi: it performs better than mongodb
 09:27 raggi: locks: my shardnull proxies do
 09:27 Defi_: raggi: how does that help with summing up millions of records into whatever data
 09:27 raggi: locks: that's actually shardNUL:
 09:27 raggi: ;-)
 09:27 locks: ohhh
 09:27 locks: you're pretty clever man
 09:28 raggi: i have the sekrets of the webscale sauce
 09:28 Defi_: pulling millions of rows of data into ruby and processing surely isnt nearly as efficient as mongodb raggi?
 09:28 raggi: Defi_: you can aggregate teh size of the records just by reading stats from /proc
 09:28 raggi: ZOMGLOL
 09:28 raggi: i'm dying here
 09:29 Defi_: eh.. im failing to see how you would process the data more efficient than with a mongodb query
 09:29 raggi: Defi_: no, you can use /proc, so the kernels already done all the work for you! (which is written in C and ASM)
 09:29 raggi: i've gotta win some kind of troll-the-troll award, surely?
 09:30 xxxxxx: lol
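
Editorial aside on the locking exchange at 09:20-09:21: raggi's point that fibers still need care around shared state can be illustrated with a minimal sketch. It is not code from the log. The `balance` variable and `withdraw` helper are hypothetical, and `Fiber.yield` stands in for the point where a fiber-aware socket call would hand control back to a scheduler. Two fibers never run at the same instant, but if a fiber yields between reading and writing shared state, another fiber can run in the gap and one update gets lost.

```ruby
# Shared state touched by two fibers (hypothetical example).
balance = 100

# Build a fiber that reads the balance, yields as if waiting on a socket,
# then writes back a value computed from the now possibly stale read.
withdraw = lambda do |amount|
  Fiber.new do
    current = balance           # read shared state
    Fiber.yield                 # cooperative switch point (stands in for I/O)
    balance = current - amount  # write based on the earlier read
  end
end

a = withdraw.call(30)
b = withdraw.call(50)

a.resume  # a reads 100, then yields
b.resume  # b also reads 100, then yields
a.resume  # a writes 70
b.resume  # b writes 50, overwriting a's update

puts balance  # 50, although two withdrawals from 100 should leave 20
```

Defi_'s claim holds only when no switch point falls inside the read-modify-write sequence. Patching a blocking library at the socket level moves switch points into code that was written assuming none exist, which is the "non-stack stored state" risk raised at 08:59.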