21:01 <allen_> just tested, and found out row data sizes of 2000 bytes and 4000 bytes show no difference. interesting...
21:02 <benblack> differences in what?
21:02 <benblack> (and what's a row? ;) )
21:03 <allen_> difference in data size. row as in db, i meant
21:04 <allen_> fyi, my case is disk io intensive.
21:07 <benblack> are you testing a relational database?
21:07 <allen_> benblack: i think u r mocking me, i m testing riak with bitcask
21:07 <benblack> i would not expect a difference between 2k and 4k. the OS read ahead behavior makes it likely it is reading about the same in both cases.
21:08 <benblack> allen_: i am wondering why you are calling them rows. helps not to use relational terminology in this.
21:10 <allen_> benblack: thanks for the correction. if I change the data size to 4100, it will be very different, i guess.
21:10 <benblack> again, depends on what the OS is doing underneath.
21:11 <benblack> are you just reading the same document over and over?
21:11 <allen_> y, i already tested with 5000 bytes, it was not good
21:11 <benblack> generally, really, how are you testing?
21:11 <allen_> tsung
21:12 <benblack> through http?
21:12 <allen_> yes
21:12 <benblack> if your goal is performance, you know the protobufs interface is _much_ faster?
21:12 <allen_> y, i know. a few ms is not a big deal for me.
21:12 <benblack> it's not a few ms
21:13 <allen_> then how much?
21:13 <benblack> tsung is a tool, but doesn't tell me how you are testing
21:13 <benblack> how many documents? what access pattern?
21:14 <benblack> is your working set larger than memory in the cluster?
21:14 <benblack> what r/w/n_vals?
21:15 <allen_> 30M documents, 4:1 r:w, yes larger than memory. r:1, w:1, n:2 I tested
21:15 <benblack> your working set or your dataset is larger than memory?
21:16 <allen_> don't know the meaning of working set.
21:16 <benblack> the set of things most accessed
21:16 <benblack> is your access pattern completely random across all 30M documents?
21:17 <allen_> yes, i use uniform access
21:17 <benblack> is that your actual access pattern?
21:17 <allen_> yes
21:17 <benblack> have you tested this with other databases?
21:17 <benblack> with the exact same hardware
21:17 <allen_> nope
21:17 <benblack> ok
21:17 <benblack> here's the situation
21:17 <benblack> it doesn't matter what db you use
21:18 <benblack> you are describing the worst case scenario
21:18 <benblack> you either need to increase the total RAM in your cluster to allow your entire dataset to be in cache or you need SSDs
21:18 <benblack> or you just accept the latency of going to disk for the constant misses
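A rough back-of-the-envelope check of why uniform access over this dataset is the worst case, using the figures from the chat (30M objects, ~4 KB each, n_val of 2, 5 nodes); approximate numbers, not measurements:

    30,000,000 objects x 4 KB   ~ 120 GB of unique data
    x 2 replicas (n_val = 2)    ~ 240 GB stored across the cluster
    / 5 nodes                   ~ 48 GB per node to hold everything in page cache

With uniform access across all 30M keys and less RAM than that per node, most reads miss the cache and pay disk latency.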
21:19 <allen_> kool, thanks I will recommend that
21:20 <benblack> it's much more common that access patterns across datasets are heavily biased to a subset of the data
21:20 <benblack> so you can have much less RAM than the total dataset and only rarely need to hit disk
21:21 <allen_> k, question: how much faster is pb than http access?
21:21 <benblack> as you obviously know, some apps just have random/uniform access across their entire dataset
21:21 <benblack> you'd need to measure for your app, but you could see throughput more than double (assuming your throughput isn't dominated by disk latency)
21:21 <benblack> something to test
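For reference, a minimal read sketch of the two interfaces being compared, using the riak-erlang-client (riakc) over protocol buffers and assuming Riak's default ports (8087 for protobufs, 8098 for HTTP); the host, bucket, and key here are placeholders:

    %% protocol buffers, via the riak-erlang-client (riakc)
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    {ok, Obj} = riakc_pb_socket:get(Pid, <<"bucket">>, <<"key">>).
    %% the same read over the HTTP interface would be
    %%   GET http://127.0.0.1:8098/riak/bucket/key
    %% the saving is mostly per-request HTTP encoding/parsing overhead,
    %% hence "measure it for your app" rather than assuming a fixed few ms.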
21:22 <benblack> have you tried using basho_bench?
21:22 <allen_> yes
21:22 <allen_> basho_bench is serializing requests in a worker, and doing its best.
21:23 <benblack> have you increased the number of works?
21:23 <benblack> workers
21:24 <allen_> I did for the worst case, it did not give me a single error.
21:24 <benblack> asking something different: you said it is serializing requests in a single worker. you increased the number of workers and all requests went through only 1?
21:25 <allen_> i meant serializing requests in a worker
21:25 <benblack> right, so you increased workers
21:25 <benblack> how many workers did you use?
21:25 <allen_> increased workers up to 100
21:26 <benblack> what mode?
21:26 <benblack> and what hardware on server vs client
21:26 <allen_> max mode.
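For context, a basho_bench configuration along the lines of what allen_ describes (max mode, up to 100 concurrent workers, uniform keys across 30M objects, ~4 KB values, 4:1 read:write over the protobufs driver) might look roughly like the sketch below; the IPs are placeholders and the exact option names should be checked against the example configs shipped with basho_bench:

    {mode, max}.
    {duration, 10}.
    {concurrent, 100}.
    {driver, basho_bench_driver_riakc_pb}.
    {riakc_pb_ips, [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}, {10,0,0,4}, {10,0,0,5}]}.
    {key_generator, {uniform_int, 30000000}}.
    {value_generator, {fixed_bin, 4096}}.
    {operations, [{get, 4}, {update, 1}]}.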
21:27 <allen_> is hardware relevant for basho_bench?
21:27 <benblack> the relative performance of the client and server is
21:32 <allen_> to answer it, the same hardware on server and client. solaris
21:37 <benblack> with how many nodes in the cluster?
21:37 <allen_> 5
21:37 <benblack> how many clients?
21:38 <allen_> only one client, basho_bench test, I think i misunderstood.
21:38 <benblack> how many client machines?
21:40 <allen_> benblack: client machines? i m doing load testing, sending requests from the load tester to the riak servers.
21:41 <benblack> allen_: i understand, my question is how many "load tester" machines you are using
21:41 <allen_> oh.. one machine
21:42 <benblack> allen_: can i suggest there is a serious flaw in your methodology?
21:42 <allen_> sure
21:42 <benblack> you have a 5 node cluster
21:42 <benblack> and you are testing from 1 machine
21:43 <benblack> it is entirely possible you are running out of capacity (cpu or network bandwidth) on that test machine
21:43 <benblack> so the performance limit you are seeing is not riak at all
21:43 <benblack> are you distributing the request load across all 5 cluster nodes or sending all requests to a single node?
21:44 <allen_> it's in the same DC, and sending requests to all 5 nodes, round-robin
21:44 <benblack> allen_: what throughput are you using with that setup?
21:45 <DeadZen> a single load testing server should have like 3 network cards ;)
21:45 <benblack> s/using/seeing/ with that setup, allen_
21:46 <allen_> benblack: 14ms/sec
21:46 <benblack> since, for example, riak requires entire objects be written at once
21:47 <benblack> allen_: sorry, what?
21:47 <benblack> 14ms/sec? i don't understand
21:47 <allen_> sorry 14ms avg
21:47 <benblack> allen_: avg not so useful...what is the request rate?
21:48 <allen_> 1700tps
21:50 <benblack> with what size objects?
21:50 <allen_> 4K
21:50 <benblack> and with 2k?
21:50 <allen_> yes
21:50 <benblack> what is the CPU load on the test client during this?
21:51 <allen_> since it is vm, it varies, min 1.8, max 5.5 cpuload
21:51 <benblack> not load
21:51 <benblack> %
21:51 <benblack> but what you are telling me is you are most likely maxing out your client
21:52 <allen_> I don't have data, but it was very low.
21:52 <benblack> it is capable of 1700 reqs/sec with your testing.
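As a rough sanity check on where a single test VM might be limited, using the figures above (approximate, not measured):

    1700 req/s x 4 KB  ~ 6.8 MB/s  ~ 55 Mbit/s of payload

That is well under a gigabit link even before HTTP overhead, so if the lone client is the ceiling, it is more likely CPU or per-connection overhead than raw bandwidth.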
21:52 <benblack> is this on your own infrastructure or on EC2 or something?
21:53 <allen_> it's on Joyent cloud.
21:53 <benblack> oy vey
21:53 <allen_> ?
21:53 <benblack> here is my recommendation: run multiple test clients at once on multiple machines
21:54 <benblack> (oy vey -> if you are so concerned about performance, use physical machines)
21:54 <benblack> i don't know what exactly your 5 cluster nodes are
21:54 <allen_> physical machines? u mean dedicated servers?
21:54 <benblack> you said you had strong performance requirements
21:54 <benblack> so do i
21:55 <benblack> that's why i use dedicated servers.
21:55 <allen_> y I wish I could, I just followed the Basho blog.
21:55 <benblack> again, i don't know what the cluster nodes are, but what you are describing sounds a lot like a client bottleneck, not a server side issue.
21:56 <allen_> client bottleneck, hmm .
21:58 <benblack> start multiple clients and run your tests from them at the same time.
21:58 <benblack> assuming you aren't bottlenecking on something else, i am guessing the total throughput will be higher than 1700 reqs/sec.
22:00 <allen_> more than 5 client machines? costly..
22:00 <benblack> try 2.
22:00 <benblack> if things go faster, you are probably seeing a client bottleneck.
22:00 <allen_> kool
22:01 <allen_> http://blog.basho.com/category/joyent/
22:01 <allen_> that's how I have servers on Joyent
22:01 <benblack> i'm sure it's fine.
22:01 <benblack> you just need to benchmark better.
22:02 <benblack> (and tell arg to just open a socket)
22:03 <allen_> thanks benblack, I will use multiple clients and see the result
22:03 <allen_> gotta sleep