@awreece
Created June 4, 2013 18:27
[08:43:33] <awreece> what is a good benchmark for memcached?
[08:44:27] <awreece> https://github.com/antirez/mc-benchmark appears to be several years old, and last I checked there was some disagreement about it
[08:44:55] <awreece> in particular, I want to experiment with a new storage engine
[11:02:50] <dormando> awreece: github.com/dormando/mc-crusher
[11:05:28] <awreece> cool. I'll try this
[11:05:57] <awreece> does it make any attempt to simulate realistic application workloads, or is the general wisdom "every workload is different" ?
[11:06:43] <awreece> (I notice "it only works well with small values")
[11:08:08] <dormando> most workloads are different. mc-crusher lets you set up config files for different types
[11:08:23] <dormando> it's mostly designed to stress the storage engine locks/etc. find the breaking points.
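For reference, mc-crusher configs are one connection template per line, as comma-separated key=value options. The option names below follow the examples bundled in the repo's conf/ directory, but treat them as approximate and verify against the repo; a mixed read-heavy/write stress might look something like:

```
# many readers, a few writers against the same key space
# (option names per mc-crusher's bundled conf/ examples; approximate)
send=ascii_get,recv=blind_read,conns=50,key_prefix=foo,key_count=200000
send=ascii_set,recv=blind_read,conns=5,key_prefix=foo,key_count=200000,value_size=100
```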
[11:08:48] <awreece> ok
[11:09:00] <awreece> for reference, I'm looking into integrating the work from this paper https://www.cs.cmu.edu/~dga/papers/memc3-nsdi2013.pdf
[11:09:25] <awreece> hopefully as an alternate engine
[11:09:26] <dho> ha
[11:10:24] <dormando> awreece: I think I remember that paper... IIRC some part of it says "writes aren't really that important"
[11:10:33] <dormando> so it wasn't super interesting to me. it's a clever idea though
[11:10:33] <awreece> I don't know if the community has discussed it earlier or not; my impression from talking to the author was that the code wasn't actually prepared to contribute back
[11:10:42] <dormando> yeah it never is
[11:10:46] <awreece> IIRC it said "resizes don't happen"
[11:10:48] <dormando> people write papers like that all the time
[11:10:54] <dormando> resizes was another thing
[11:11:09] <dormando> but it also said theoretical write speed is low
[11:11:10] <awreece> my goal is to actually prototype it
[11:11:16] <dormando> unless I'm confusing that with one of the other 20 papers
[11:11:17] <dho> dormando: yes, that's the read-heavy workload paper
[11:11:17] <awreece> really?
[11:11:34] <dormando> yeah it's sorta buried in there
[11:11:39] <dho> dormando: the one I was showing you a few months ago
[11:12:17] <dormando> some of them don't work with slab rebalance too
[11:12:32] <awreece> I've talked to the paper author and have his blessing to work on it
[11:12:40] <dormando> cool. good luck
[11:12:57] <awreece> when I get farther, should I also email the mailing list?
[11:12:57] <dho> it falls over for write-heavy workloads
[11:13:01] <dormando> yeah
[11:13:05] <awreece> ok
[11:13:26] <dho> I would say it would be very nice if those algorithms could be used dynamically.
[11:13:36] <dho> or at least swapped out.
[11:13:37] <awreece> dho: perhaps I didn't read the paper that carefully, but I never got the impression it was slower on writes than existing solutions
[11:13:49] <dho> awreece: it's definitely in there, sec
[11:13:52] <awreece> dho: well, I was planning to make a memcached engine
[11:14:11] <awreece> so it should theoretically be easy to swap out
[11:14:59] <dho> "When more write traffic is generated, performance of cuckoo hash tables declines because Inserts are serialized and more Lookups happen concurrently"
[11:15:09] <dho> additionally, it serializes all writes
[11:15:18] <dormando> so does the current one :P
[11:15:43] <awreece> "it allows only one writer at a time—a tradeoff we accept as our target workloads are read-heavy"
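A toy sketch of the design being quoted, in the spirit of the MemC3 paper: every key has two candidate buckets, lookups probe both, and inserts are serialized behind a single writer lock while they displace ("kick") entries along a chain. All names and parameters here are illustrative, not MemC3's actual code (which is C, with lock-free versioned lookups):

```python
import threading
import zlib

class CuckooTable:
    """Toy 2-choice cuckoo hash table with a single writer lock."""

    def __init__(self, nbuckets=8, slots=4, max_kicks=64):
        self.nbuckets = nbuckets
        self.slots = slots              # slots per bucket (4-way, as in MemC3)
        self.max_kicks = max_kicks      # bound on the displacement chain
        self.buckets = [[] for _ in range(nbuckets)]
        self.write_lock = threading.Lock()  # only one writer at a time

    def _bucket(self, key, seed):
        # deterministic hash so a key's two bucket choices are stable
        return zlib.crc32(repr((seed, key)).encode()) % self.nbuckets

    def get(self, key):
        # Readers probe both candidate buckets and never take the write
        # lock (MemC3 guards this with version counters instead).
        for b in (self._bucket(key, 1), self._bucket(key, 2)):
            for k, v in self.buckets[b]:
                if k == key:
                    return v
        return None

    def set(self, key, value):
        with self.write_lock:           # inserts are serialized here
            candidates = (self._bucket(key, 1), self._bucket(key, 2))
            for b in candidates:        # update in place if already present
                for i, (k, _) in enumerate(self.buckets[b]):
                    if k == key:
                        self.buckets[b][i] = (key, value)
                        return True
            for b in candidates:        # easy case: a candidate has room
                if len(self.buckets[b]) < self.slots:
                    self.buckets[b].append((key, value))
                    return True
            # Both full: kick a victim along its alternate buckets. This
            # chain is why writes scale poorly -- the lock stays held for
            # the whole (potentially long) displacement walk.
            b = candidates[0]
            for _ in range(self.max_kicks):
                self.buckets[b].append((key, value))
                key, value = self.buckets[b].pop(0)   # displace oldest entry
                if self._bucket(key, 1) == b:
                    b = self._bucket(key, 2)          # victim's other bucket
                else:
                    b = self._bucket(key, 1)
                if len(self.buckets[b]) < self.slots:
                    self.buckets[b].append((key, value))
                    return True
            return False                # chain too long: table needs a resize
```

The point of the sketch is the shape of the tradeoff, not fidelity: gets touch no lock at all, while every set funnels through one lock whose hold time grows with the kick chain.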
[11:15:56] <awreece> yeah
[11:16:07] <awreece> my impression of the current code was that it serialized writers
[11:16:16] <dormando> it is, but the lock is really short
[11:16:21] <dho> anyhoo. just some thoughts. i don't actually use memcached, i just sit in here to torment dormando
[11:16:21] <dormando> so something complex + serialized will lose to it
[11:16:22] <awreece> yeah, this could be held for a while
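A sketch of the baseline dormando is defending: the stock engine also serializes writes, but behind a lock held only for one short bucket update (roughly the role of memcached's global cache lock around its chained hash). Something "complex + serialized", holding its writer lock for a long kick chain, can lose to this even though both serialize. Names here are illustrative, not memcached's actual internals:

```python
import threading

class ChainedTable:
    """Toy chained hash table with one short-hold global lock."""

    def __init__(self, nbuckets=1024):
        self.buckets = [[] for _ in range(nbuckets)]
        self.cache_lock = threading.Lock()

    def set(self, key, value):
        b = hash(key) % len(self.buckets)
        with self.cache_lock:           # critical section: one chain walk
            chain = self.buckets[b]
            for i, (k, _) in enumerate(chain):
                if k == key:
                    chain[i] = (key, value)
                    return
            chain.append((key, value))

    def get(self, key):
        b = hash(key) % len(self.buckets)
        with self.cache_lock:
            for k, v in self.buckets[b]:
                if k == key:
                    return v
            return None
```

With enough buckets the chains stay short, so the lock is held for only a few comparisons per write; that short hold time is what a longer serialized section has to beat.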
[11:16:29] <dho> and watch him yell at people
[11:16:31] <dormando> dho: shush
[11:16:31] <awreece> also, hi dho
[11:16:37] <dho> also hi
[11:16:38] <awreece> I mostly see you in #go-nuts
[11:16:49] <dho> yarp
[11:16:53] <dho> i work with dormando
[11:16:58] <dho> so obligatory hard time
[11:16:59] <dormando> awreece: feel free to engineer whatevs. just be aware that it's hard for me to "swap in the default" engine despite dozens of people screaming at me to do so :P
[11:17:29] <awreece> do we have real benchmarks to be able to say stuff like "our target workloads are read heavy' or whatnot?
[11:17:41] <dormando> I'm not sure what you mean
[11:17:42] <dho> you can configure mccrusher to effectively make those claims
[11:17:57] <dormando> well.. yeah.
[11:18:00] <dho> if that's what you're asking
[11:18:02] <awreece> well, I don't use memcached in production, but I'm sure there exist people that do :P
[11:18:07] <awreece> the people who would complain
[11:18:09] <dormando> with mc-crusher you can just run some readers and some writers
[11:18:10] <awreece> do they have numbers?
[11:18:15] <dormando> I don't know
[11:18:18] <awreece> traces?
[11:18:22] <awreece> ok, thats fair
[11:18:29] <dho> there are people who have write-heavy workloads
[11:18:36] <dormando> some people do massive imports when memcached starts
[11:18:39] <dormando> or daily imports and stuff
[11:18:40] <awreece> my first question is absolutely "what do workloads look like?"
[11:18:45] <awreece> hrm
[11:18:48] <dormando> workloads look like everything and all things under the sun
[11:18:51] <dho> there are people who have 10% hit rates
[11:19:01] <dho> but it's still a win, because it's still better than 0%
[11:19:03] <awreece> ... why are they using memcached
[11:19:05] <dormando> so I only ever improve the performance of all edge cases
[11:19:06] <awreece> ha, ok
[11:19:08] <dormando> I never reduce it
[11:19:19] <dormando> ie; I can't increase memory usage
[11:19:23] <dormando> and I can't slow down writes in favor of reads
[11:19:29] <dormando> I only just chip away at making them scale better
[11:19:38] <awreece> mreh
[11:19:47] <dormando> that's what alternative storage engines were supposed to solve
[11:19:55] <dormando> but we've had issues getting people to test that branch
[11:20:11] <awreece> it took me a while to find the engine branch
[11:20:22] <dho> that's a good approach ;)
[11:20:23] <dormando> yeah. the stupid engine branch is behind 1.4 for perf patches
[11:20:30] <dormando> so bench 1.4
[11:20:34] <dormando> then write your engine into 1.5 or something
[11:20:52] <dormando> if you're looking to be sure that "write speed below this level is fine" you'll never find that
[11:21:00] <awreece> could you repeat that one more time?
[11:21:02] <dormando> there're tons of companies who use the thing who'll never post anything anywhere ever
[11:21:14] <awreece> you want me to submit code against 1.5?
[11:21:14] <dormando> awreece: the engine branch is behind the stable tree
[11:21:17] <dormando> it is slower
[11:21:19] <awreece> yeah, I saw that
[11:21:21] <awreece> oh, really
[11:21:30] <dormando> yes
[11:21:32] <awreece> ... why?
[11:21:44] <dormando> because every time I beg for help testing the engine branch nobody does it
[11:21:50] <dormando> so I had to keep developing on -stable
[11:21:57] <dormando> because this project is cursed and I hate it
[11:22:01] <awreece> :(
[11:22:23] <dormando> the engine tree is fine, I just never got enough attention to it to line it up with memcached itself
[11:22:32] <dormando> so the engine tree appears in mysql, couchbase, etc
[11:22:41] <dormando> with storage engines which have different requirements and expectations
[11:23:05] <dormando> anyway I gotta run
[11:23:06] <dormando> have fun
[11:23:09] <awreece> thanks