21:01 <allen_> just tested, and found out row data sizes of 2000 bytes and 4000 bytes show no difference. interesting...
21:02 <benblack> differences in what?
21:02 <benblack> (and what's a row? ;) )
21:03 <allen_> difference in data size. row as in db, i meant
21:04 <allen_> fyi, my case is disk io intensive.
21:07 <benblack> are you testing a relational database?
21:07 <allen_> benblack: i think u r mocking me, i m testing riak with bitcask
21:07 <benblack> i would not expect a difference between 2k and 4k. the OS read ahead behavior makes it likely it is reading about the same in both cases.
21:08 <benblack> allen_: i am wondering why you are calling them rows. helps not to use relational terminology in this.
21:10 <allen_> benblack: thanks for the correction. if I change the data size to 4100, it will be very different, i guess.
21:10 <benblack> again, depends on what the OS is doing underneath.
21:11 <benblack> are you just reading the same document over and over?
21:11 <allen_> y, i already tested with 5000 bytes, it was not good
21:11 <benblack> generally, really, how are you testing?
21:11 <allen_> tsung
21:12 <benblack> through http?
21:12 <allen_> yes
21:12 <benblack> if your goal is performance, you know the protobufs interface is _much_ faster?
21:12 <allen_> y, i know. a few ms is not a big deal for me.
21:12 <benblack> it's not a few ms
21:13 <allen_> then how much?
21:13 <benblack> tsung is a tool, but doesn't tell me how you are testing
21:13 <benblack> how many documents? what access pattern?
21:14 <benblack> is your working set larger than memory in the cluster?
21:14 <benblack> what r/w/n_vals?
21:15 <allen_> 30M documents, 4:1 r:w, yes larger than memory. r:1, w:1, n:2 I tested
21:15 <benblack> your working set or your dataset is larger than memory?
21:16 <allen_> don't know the meaning of working set.
21:16 <benblack> the set of things most accessed
21:16 <benblack> is your access pattern completely random across all 30M documents?
21:17 <allen_> yes, i use uniform access
21:17 <benblack> is that your actual access pattern?
21:17 <allen_> yes
21:17 <benblack> have you tested this with other databases?
21:17 <benblack> with the exact same hardware
21:17 <allen_> nope
21:17 <benblack> ok
21:17 <benblack> here's the situation
21:17 <benblack> it doesn't matter what db you use
21:18 <benblack> you are describing the worst case scenario
21:18 <benblack> you either need to increase the total RAM in your cluster to allow your entire dataset to be in cache or you need SSDs
21:18 <benblack> or you just accept the latency of going to disk for the constant misses
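A rough back-of-the-envelope check of why uniform access over this dataset is the worst case, using the figures from the chat (30M objects, ~4 KB each, n_val of 2, 5 nodes); approximate numbers, not measurements:

    30,000,000 objects x 4 KB   ~ 120 GB of unique data
    x 2 replicas (n_val = 2)    ~ 240 GB stored across the cluster
    / 5 nodes                   ~ 48 GB per node to hold everything in page cache

With uniform access across all 30M keys and less RAM than that per node, most reads miss the cache and pay disk latency.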
21:19 <allen_> kool, thanks I will recommend that
21:20 <benblack> it's much more common that access patterns across datasets are heavily biased to a subset of the data
21:20 <benblack> so you can have much less RAM than the total dataset and only rarely need to hit disk
21:21 <allen_> k, question: how much faster is pb than http access?
21:21 <benblack> as you obviously know, some apps just have random/uniform access across their entire dataset
21:21 <benblack> you'd need to measure for your app, but you could see throughput more than double (assuming your throughput isn't dominated by disk latency)
21:21 <benblack> something to test
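For reference, a minimal read sketch of the two interfaces being compared, using the riak-erlang-client (riakc) over protocol buffers and assuming Riak's default ports (8087 for protobufs, 8098 for HTTP); the host, bucket, and key here are placeholders:

    %% protocol buffers, via the riak-erlang-client (riakc)
    {ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087),
    {ok, Obj} = riakc_pb_socket:get(Pid, <<"bucket">>, <<"key">>).
    %% the same read over the HTTP interface would be
    %%   GET http://127.0.0.1:8098/riak/bucket/key
    %% the saving is mostly per-request HTTP encoding/parsing overhead,
    %% hence "measure it for your app" rather than assuming a fixed few ms.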
21:22 <benblack> have you tried using basho_bench?
21:22 <allen_> yes
21:22 <allen_> basho_bench is serializing requests in a worker, and doing its best.
21:23 <benblack> have you increased the number of works?
21:23 <benblack> workers
21:24 <allen_> I did for the worst case, it did not give me a single error.
21:24 <benblack> asking something different: you said it is serializing requests in a single worker. you increased the number of workers and all requests went through only 1?
21:25 <allen_> i meant serializing requests in a worker
21:25 <benblack> right, so you increased workers
21:25 <benblack> how many workers did you use?
21:25 <allen_> increased workers up to 100
21:26 <benblack> what mode?
21:26 <benblack> and what hardware on server vs client
21:26 <allen_> max mode.
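For context, a basho_bench configuration along the lines of what allen_ describes (max mode, up to 100 concurrent workers, uniform keys across 30M objects, ~4 KB values, 4:1 read:write over the protobufs driver) might look roughly like the sketch below; the IPs are placeholders and the exact option names should be checked against the example configs shipped with basho_bench:

    {mode, max}.
    {duration, 10}.
    {concurrent, 100}.
    {driver, basho_bench_driver_riakc_pb}.
    {riakc_pb_ips, [{10,0,0,1}, {10,0,0,2}, {10,0,0,3}, {10,0,0,4}, {10,0,0,5}]}.
    {key_generator, {uniform_int, 30000000}}.
    {value_generator, {fixed_bin, 4096}}.
    {operations, [{get, 4}, {update, 1}]}.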
21:27 <allen_> is hardware relevant for basho_bench?
21:27 <benblack> the relative performance of the client and server is
21:32 <allen_> to answer it, the same hardware on server and client. solaris
21:37 <benblack> with how many nodes in the cluster?
21:37 <allen_> 5
21:37 <benblack> how many clients?
21:38 <allen_> only one client, basho_bench test, I think i misunderstood.
21:38 <benblack> how many client machines?
21:40 <allen_> benblack: client machines? i m doing load testing, sending requests from the load tester to the riak servers.
21:41 <benblack> allen_: i understand, my question is how many "load tester" machines you are using
21:41 <allen_> oh.. one machine
21:42 <benblack> allen_: can i suggest there is a serious flaw in your methodology?
21:42 <allen_> sure
21:42 <benblack> you have a 5 node cluster
21:42 <benblack> and you are testing from 1 machine
21:43 <benblack> it is entirely possible you are running out of capacity (cpu or network bandwidth) on that test machine
21:43 <benblack> so the performance limit you are seeing is not riak at all
21:43 <benblack> are you distributing the request load across all 5 cluster nodes or sending all requests to a single node?
21:44 <allen_> it's in the same DC, and sending requests to all 5 nodes, round-robin
21:44 <benblack> allen_: what throughput are you using with that setup?
21:45 <DeadZen> a single load testing server should have like 3 network cards ;)
21:45 <benblack> s/using/seeing/ with that setup, allen_
21:46 <allen_> benblack: 14ms/sec
21:46 <benblack> since, for example, riak requires entire objects be written at once
21:47 <benblack> allen_: sorry, what?
21:47 <benblack> 14ms/sec? i don't understand
21:47 <allen_> sorry 14ms avg
21:47 <benblack> allen_: avg not so useful...what is the request rate?
21:48 <allen_> 1700tps
21:50 <benblack> with what size objects?
21:50 <allen_> 4K
21:50 <benblack> and with 2k?
21:50 <allen_> yes
21:50 <benblack> what is the CPU load on the test client during this?
21:51 <allen_> since it is vm, it varies, min 1.8, max 5.5 cpuload
21:51 <benblack> not load
21:51 <benblack> %
21:51 <benblack> but what you are telling me is you are most likely maxing out your client
21:52 <allen_> I don't have data, but it was very low.
21:52 <benblack> it is capable of 1700 reqs/sec with your testing.
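As a rough sanity check on where a single test VM might be limited, using the figures above (approximate, not measured):

    1700 req/s x 4 KB  ~ 6.8 MB/s  ~ 55 Mbit/s of payload

That is well under a gigabit link even before HTTP overhead, so if the lone client is the ceiling, it is more likely CPU or per-connection overhead than raw bandwidth.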
21:52 <benblack> is this on your own infrastructure or on EC2 or something?
21:53 <allen_> it's on Joyent cloud.
21:53 <benblack> oy vey
21:53 <allen_> ?
21:53 <benblack> here is my recommendation: run multiple test clients at once on multiple machines
21:54 <benblack> (oy vey -> if you are so concerned about performance, use physical machines)
21:54 <benblack> i don't know what exactly your 5 cluster nodes are
21:54 <allen_> physical machines? u mean dedicated servers?
21:54 <benblack> you said you had strong performance requirements
21:54 <benblack> so do i
21:55 <benblack> that's why i use dedicated servers.
21:55 <allen_> y I wish I could, I just followed the Basho blog.
21:55 <benblack> again, i don't know what the cluster nodes are, but what you are describing sounds a lot like a client bottleneck, not a server side issue.
21:56 <allen_> client bottleneck, hmm .
21:58 <benblack> start multiple clients and run your tests from them at the same time.
21:58 <benblack> assuming you aren't bottlenecking on something else, i am guessing the total throughput will be higher than 1700 reqs/sec.
22:00 <allen_> more than 5 client machines? costly..
22:00 <benblack> try 2.
22:00 <benblack> if things go faster, you are probably seeing a client bottleneck.
22:00 <allen_> kool
22:01 <allen_> http://blog.basho.com/category/joyent/
22:01 <allen_> that's how I have servers on Joyent
22:01 <benblack> i'm sure it's fine.
22:01 <benblack> you just need to benchmark better.
22:02 <benblack> (and tell arg to just open a socket)
22:03 <allen_> thanks benblack, I will use multiple clients and see the result
22:03 <allen_> gotta sleep