Greg's plan for this afternoon:
handoff 1
A+??
riak_core_node_watcher:service_down(riak_kv).
/usr/lib64/riak/erts-5.8.4/bin/escript msgq.escript /etc/riak/vm.args
riak_core_node_watcher:service_up(riak_kv, whereis(riak_kv_sup)).
For setting the 'handoff_concurrency' value:
I = fun(Num) -> rpc:multicall(lists:sublist([node()|nodes()], Num), application, set_env, [riak_core, handoff_concurrency, 2]), rpc:multicall(lists:sublist([node()|nodes()], Num), riak_core_vnode_manager, force_handoffs, []) end. | |
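Example invocation of the fun above (the node count of 3 is arbitrary): set handoff_concurrency on the first three nodes of [node()|nodes()] and force their handoffs.
I(3).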
% AllFallbacks = fun(Ring) -> | |
% [{N,Fallbacks(N, Ring)} || N <- riak_core_ring:all_members(Ring), lists:member(N, [node()|nodes()])] | |
% end. | |
% | |
% rpc:multicall([element(1, N) || N <- lists:sublist(AllFallbacks(element(2, riak_core_ring_manager:get_raw_ring())), 3)], application, set_env, [riak_core, handoff_concurrency, 1]). | |
ssh -p 2222 -l basho prod-04.XXXer.com | |
[12/7/11 2:36:21 PM] Greg Burd: msgq.escript | |
[12/7/11 2:36:37 PM] Greg Burd: /usr/lib64/riak/erts-5.8.4/bin/escript msgq.escript /etc/riak/vm.args | |
[12/7/11 2:37:17 PM] Greg Burd: basho@prod-08:~$ sudo cluster "grep ERL_MAX_PORTS /etc/riak/vm.args" | |
[12/7/11 2:37:31 PM] Greg Burd: [sudo] password for basho: | |
Wed Dec 7 12:37:10 PST 2011 | |
+++ Processing: ........................................................................................................ | |
#--- prod-04: | |
-env ERL_MAX_PORTS 32768 | |
#--- prod-05: | |
-env ERL_MAX_PORTS 32768 | |
[...] | |
------------------ | |
hd(lists:reverse(lists:sort([{ process_info(Pid,message_queue_len), Pid} || Pid <- processes()]))). | |
OwnedPartitions = fun(Owners, Node) ->
                      [P || {P, Owner} <- Owners, Owner =:= Node]
                  end.
VnodePids = fun(Node) -> [Pid || {_,Pid,_,_} <- supervisor:which_children({riak_core_vnode_sup, Node}), is_pid(Pid)] end.
RunningVnodes = fun(Node) ->
                    Pids = VnodePids(Node),
                    [rpc:call(Node, riak_core_vnode, get_mod_index, [Pid], 30000) || Pid <- Pids]
                end.
ActivePartitions = fun(Node) ->
                       lists:foldl(fun({_,P}, Ps) ->
                                       ordsets:add_element(P, Ps)
                                   end, [], RunningVnodes(Node))
                   end.
Fallbacks = fun(Node, Ring) ->
                Owners = riak_core_ring:all_owners(Ring),
                Owned = ordsets:from_list(OwnedPartitions(Owners, Node)),
                Active = ordsets:from_list(ActivePartitions(Node)),
                Stopped = ordsets:subtract(Owned, Active),   %% owned but not running (computed, not returned)
                ordsets:subtract(Active, Owned)              %% running but not owned = fallback partitions
            end.
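A minimal usage sketch for the funs above (paste them into the same console first; get_raw_ring is used the same way as in the commented-out AllFallbacks snippet): list the partitions for which the local node is currently running fallback vnodes.
{ok, Ring} = riak_core_ring_manager:get_raw_ring().
Fallbacks(node(), Ring).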
-------------------------- | |
[12/7/11 2:42:40 PM] Scott Fritchie: +A 257 sounds like a fun number to experiment with. | |
[12/7/11 2:43:09 PM] Greg Burd: {wants_claim_fun, {riak_core_new_claim, new_wants_claim}}, | |
{choose_claim_fun, {riak_core_new_claim, new_choose_claim}}, | |
-------------------------- | |
http://dl.dropbox.com/u/11999159/Basho/2011-12-06-log1.txt | |
rp(lists:sort([{Mem, Pid, Queue, Dict} | |
|| Pid <- [Pid || {_,Pid,_,_} <- supervisor:which_children(riak_core_vnode_sup)], | |
{memory, Mem} <- [catch process_info(Pid, memory)], | |
{message_queue_len, Queue} <- [catch process_info(Pid, message_queue_len)], | |
{dictionary, Dict} <- [catch process_info(Pid, dictionary)], | |
is_integer(Mem)])). | |
lists:reverse(lists:sort([{Queue, Pid} || Pid <- [Pid || {_,Pid,_,_} <- supervisor:which_children(riak_core_vnode_sup)], {memory, Mem} <- [catch process_info(Pid, memory)], {message_queue_len, Queue} <- [catch process_info(Pid, message_queue_len)], {dictionary, Dict} <- [catch process_info(Pid, dictionary)], is_integer(Mem)])). | |
-------------------------- | |
basho@prod-04:~$ awk '/# Server: prod-[0123456789]/ { last = 2 } /^10/ && last == 0 { print } { last--}' /etc/hosts | |
10.28.60.208 prod-04-internal.XXXer.com raid-0-b | |
10.28.60.210 prod-05-internal.XXXer.com raid-0-b | |
10.28.60.212 prod-06-internal.XXXer.com RAID0-B | |
10.28.60.214 prod-07-internal.XXXer.com RAID0-B | |
10.28.60.216 prod-08-internal.XXXer.com RAID0-B | |
10.28.60.218 prod-09-internal.XXXer.com RAID0-B | |
10.28.60.220 prod-10-internal.XXXer.com Raid0-B | |
10.28.60.222 prod-11-internal.XXXer.com Raid0-B | |
10.28.60.224 prod-12-internal.XXXer.com RAID0-B | |
10.28.60.226 prod-13-internal.XXXer.com RAID5-B | |
10.28.60.228 prod-14-internal.XXXer.com RAID0-B | |
10.28.60.230 prod-15-internal.XXXer.com RAID0-B | |
10.28.60.202 prod-16-internal.XXXer.com --Not-In-Service-- | |
10.28.60.232 prod-17-internal.XXXer.com raid-5-b | |
10.28.60.234 prod-18-internal.XXXer.com raid-5-b | |
10.28.60.238 prod-19-internal.XXXer.com RAID0-B | |
10.28.60.240 prod-20-internal.XXXer.com RAID0-B | |
10.28.60.242 prod-21-internal.XXXer.com RAID0-B | |
10.28.60.244 prod-22-internal.XXXer.com raid0-b | |
10.28.60.246 prod-23-internal.XXXer.com RAID0-B | |
10.28.60.248 prod-24-internal.XXXer.com RAID0-B | |
10.28.60.250 prod-25-internal.XXXer.com raid0-b | |
10.28.60.254 prod-26-internal.XXXer.com RAID0-B | |
10.28.60.252 prod-27-internal.XXXer.com RAID0-A | |
10.28.233.195 prod-28-internal.XXXer.com raid0-b | |
10.28.233.197 prod-29-internal.XXXer.com raid0-b | |
10.28.233.199 prod-30-internal.XXXer.com RAID0-B | |
10.28.233.201 prod-31-internal.XXXer.com RAID0-B | |
10.28.233.203 prod-32-internal.XXXer.com raid0-b | |
10.28.233.205 prod-33-internal.XXXer.com RAID0-B | |
10.28.233.207 prod-34-internal.XXXer.com Raid0-B | |
10.28.233.209 prod-35-internal.XXXer.com RAID0-B | |
10.28.60.236 prod-99-internal.XXXer.com | |
------------ | |
/usr/lib/riak/erts-5.8.4/bin/erl -setcookie riak_XXXer_prod -hidden -name [email protected] -remsh [email protected] | |
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:24:24] [rq:24] [async-threads:0] [kernel-poll:false] | |
Eshell V5.8.4 (abort with ^G) | |
([email protected])1> {percept_profile:start("/tmp/slf.percept.prod-06.nopatch"), timer:sleep(5*1000), percept_profile:stop()}. | |
Starting profiling. | |
{{ok,#Port<0.76127726>},ok,ok} | |
([email protected])2> os:cmd("bzip2 -9v /tmp/slf.percept.prod-06.nopatch"). | |
" /tmp/slf.percept.prod-06.nopatch: 11.485:1, 0.697 bits/byte, 91.29% saved, 69744612 in, 6072897 out.\n" | |
---------- | |
* Convoy hunting | |
In slf.percept.prod-17.a.patch, at times: | |
14.5 - 15.1 seconds | |
delta [ms] time [ms] pid activity module:function/arity #runnables | |
0.0260 14505.0890 <0.398.0> inactive gen_server2:process_next_msg/9 3 | |
0.1250 14505.2140 <0.11337.3385> inactive prim_inet:recv0/3 2 | |
263.2790 14768.4930 <0.82.0> active gen_server:loop/6 3 | |
0.0070 14768.5000 <0.91.0> active gen_server:loop/6 4 | |
299.7850 15068.2850 <0.398.0> active gen_server2:process_next_msg/9 5 | |
0.0570 15068.3420 <0.398.0> inactive gen_server2:process_next_msg/9 4 | |
0.3950 15068.7370 <0.11995.3385> active prim_inet:recv0/3 5 | |
0.0330 15068.7700 <0.82.0> inactive gen_server:loop/6 4 | |
0.0800 15068.8500 <0.13398.3385> active prim_inet:accept0/2 5 | |
0.0200 15068.8700 <0.13399.3385> active prim_inet:accept0/2 6 | |
0.0140 15068.8840 <0.13403.3385> active prim_inet:accept0/2 7 | |
..... | |
20.3 - 20.4 seconds | |
delta [ms] time [ms] pid activity module:function/arity #runnables | |
0.0290 20303.8780 <0.6542.3> active gen_fsm:loop/7 5 | |
0.0080 20303.8860 <0.371.0> inactive gen_server:loop/6 4 | |
41.4230 20345.3090 <0.13147.3384> inactive prim_file:drv_get_response/1 3 | |
0.0530 20345.3620 <0.19601.3385> active gen_fsm:loop/7 4 | |
0.0450 20345.4070 <0.19549.3385> active mochiweb_http:request/3 5 | |
0.0050 20345.4120 <0.19553.3385> active mochiweb_http:request/3 6 | |
20.48 - 20.9 seconds | |
delta [ms] time [ms] pid activity module:function/arity #runnables | |
0.0050 20493.0300 <0.19699.3385> inactive mochiweb_http:request/3 6 | |
0.0070 20493.0370 <0.19811.3385> inactive erlang:apply/2 5 | |
386.2050 20879.2420 <0.20033.7> inactive erlang:crc32/2 4 | |
0.0070 20879.2490 <0.19444.3385> active erlang:process_info/2 5 | |
0.0110 20879.2600 <0.398.0> active gen_server2:process_next_msg/9 6 | |
0.0060 20879.2660 <0.82.0> active gen_server:loop/6 7 | |
0.0050 20879.2710 <0.20033.7> active erlang:crc32/2 8 | |
0.0030 20879.2740 <0.23401.3375> active prim_inet:close_pend_loop/2 9 | |
0.0060 20879.2800 <0.8920.4> active gen_fsm:loop/7 10 | |
25.90 - 26.7 seconds | |
delta [ms] time [ms] pid activity module:function/arity #runnables | |
0.0240 25991.9890 <0.29987.9> inactive gen_fsm:loop/7 3 | |
0.0080 25991.9970 <0.398.0> inactive gen_server2:process_next_msg/9 2 | |
493.1710 26485.1680 <0.91.0> active gen_server:loop/6 3 | |
0.0230 26485.1910 <0.28651.3375> active mochiweb_http:request/3 4 | |
0.0290 26485.2200 <0.92.0> active gen_event:fetch_msg/5 5 | |
0.0080 26485.2280 <0.91.0> inactive gen_server:loop/6 4 | |
In slf.percept.prod-17.b.patch, holy cow, there's bad behavior between
1 and about 8 seconds!!
Each "active" message included below is the first event after the big pause.
Also, pid <0.398.0> is riak_kv_stat.
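A quick sketch of how to confirm that sort of pid-to-name mapping from the remote console (the registered_name check works for any locally registered process):
whereis(riak_kv_stat).                           %% e.g. <0.398.0>
process_info(pid(0, 398, 0), registered_name).   %% {registered_name, riak_kv_stat}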
1.29 - 2.1 seconds (775 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
2.29 - 2.97 sec (667 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
3.00 - 3.87 sec (859 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
4.63 - 4.77 sec (131 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
4.79 - 5.67 sec (861 msec) | |
<0.6542.3> inactive gen_server2:cast_msg/1 | |
WEEEEEIIIRRRRDDD????? | |
5.69 - 6.6 sec (861 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
7.14 - 7.46 sec (310 msec) | |
<0.398.0> active gen_server2:process_next_msg/9 | |
7.67 - 8.36 sec (673 msec) | |
<0.9404.3489> active prim_file:drv_get_response/1 | |
In slf.percept.prod-17.c.patch, holy cow, even worse: a single pause | |
of 3+ seconds!! | |
NOTE: during this time there was a bitcask merge. !@#$! I didn't save | |
the Riak log message. But it happened near the end of the 31 sec | |
sample window. | |
delta [ms] time [ms] pid activity module:function/arity #runnables | |
0.0080 25637.3680 <0.398.0> active gen_server2:process_next_msg/9 4 | |
0.0050 25637.3730 <0.8920.4> inactive gen_fsm:loop/7 3 | |
0.0100 25637.3830 <0.2472.11> inactive gen_fsm:loop/7 2 | |
0.0070 25637.3900 <0.398.0> inactive gen_server2:process_next_msg/9 1 | |
3187.6570 28825.0470 <0.24.0> active gen_server:loop/6 2 | |
0.0130 28825.0600 <0.30277.3558> inactive gen:do_call/4 1 | |
0.0210 28825.0810 <0.608.0> active gen_server:loop/6 2 | |
0.0070 28825.0880 <0.82.0> active gen_server:loop/6 3 | |
0.0090 28825.0970 <0.666.0> active mi_scheduler:worker_loop/1 4 | |
0.0110 28825.1080 <0.6542.3> active gen_fsm:loop/7 5 | |
0.0080 28825.1160 <0.20033.7> active gen_fsm:loop/7 6 | |
0.0040 28825.1200 <0.665.0> active gen_server:loop/6 7 | |
0.0100 28825.1300 <0.666.0> inactive mi_scheduler:worker_loop/1 6 | |
0.0080 28825.1380 <0.23178.8> active gen_fsm:loop/7 7 | |
0.0040 28825.1420 <0.20.0> active gen_server:loop/6 8 | |
0.0100 28825.1520 <0.608.0> inactive gen:do_call/4 7 | |
0.0230 28825.1750 <0.608.0> active gen:do_call/4 8 | |
0.0040 28825.1790 <0.20.0> inactive gen_server:loop/6 7 | |
0.0070 28825.1860 <0.91.0> active gen_server:loop/6 8 | |
0.0060 28825.1920 <0.665.0> inactive gen_server:loop/6 7 | |
0.0140 28825.2060 <0.23225.3559> active prim_inet:accept0/2 8 | |
0.0060 28825.2120 <0.23228.3559> active prim_inet:accept0/2 9 | |
0.0040 28825.2160 <0.23231.3559> active prim_inet:accept0/2 10 | |
0.0050 28825.2210 <0.23233.3559> active prim_inet:accept0/2 11 | |
0.0040 28825.2250 <0.82.0> inactive gen_server:loop/6 10 | |
0.0020 28825.2270 <0.452.0> active gen_server:loop/6 11 | |
0.0030 28825.2300 <0.24.0> inactive prim_file:drv_get_response/1 10 | |
0.0040 28825.2340 <0.23235.3559> active prim_inet:accept0/2 11 | |
0.0060 28825.2400 <0.23239.3559> active prim_inet:accept0/2 12 | |
0.0060 28825.2460 <0.23244.3559> active prim_inet:accept0/2 13 | |
0.0190 28825.2650 <0.23247.3559> active prim_inet:accept0/2 14 | |
0.0080 28825.2730 <0.23249.3559> active prim_inet:accept0/2 15 | |
0.0070 28825.2800 <0.23253.3559> active prim_inet:accept0/2 16 | |
0.0110 28825.2910 <0.6542.3> inactive gen_fsm:loop/7 15 | |
0.0140 28825.3050 <0.23254.3559> active prim_inet:accept0/2 16 | |
0.0110 28825.3160 <0.23260.3559> active prim_inet:accept0/2 17 | |
0.0090 28825.3250 <0.23259.3559> active prim_inet:accept0/2 18 | |
0.0040 28825.3290 <0.23262.3559> active prim_inet:accept0/2 19 | |
0.0080 28825.3370 <0.371.0> active gen_server:loop/6 20 | |
--------------------------------------- | |
basho@prod-17:~$ cat /home/basho/DO.sh | |
#!/bin/sh | |
TIME=$1 | |
export PATH=/usr/lib/riak/erts-5.8.4/bin:$PATH | |
sh -c "while [ 1 ]; do echo RESTARTING ; date; /tmp/etop -setcookie riak_XXXer_prod -node [email protected] -interval 1 -sort msg_q 2>&1 | egrep -v --line-buffered 'application_master:m| 0 gen_fsm:loop/7' | egrep --line-buffered 'Reds|proc_lib|riak_kv' ; done" & | |
ETOPPID=$! | |
sh -c "while [ 1 ]; do date ; sleep 1 ; done" & | |
DATEPID=$! | |
sh -c "tail -f /var/log/riak/console.log | egrep --line-buffered '@mi_|/ssd|monitor'" & | |
TAILPID=$! | |
sh -c ' while [ 1 ]; do ps 29259 | grep pts | grep -v Ssl; sleep 0.05; done' & | |
PSPID=$! | |
iostat -kx 1 $TIME | |
kill $ETOPPID | |
kill $DATEPID | |
kill $TAILPID | |
kill $PSPID | |
sleep 1 | |
kill `ps axww | grep etop | grep -v grep | awk '{print $1}'` | |
--------------------------------------- | |
basho@prod-17:~$ for i in /proc/sys/vm/*dirty*; do echo $i ; cat $i ; echo "" ; done | |
/proc/sys/vm/dirty_background_bytes | |
0 | |
/proc/sys/vm/dirty_background_ratio | |
10 | |
/proc/sys/vm/dirty_bytes | |
0 | |
/proc/sys/vm/dirty_expire_centisecs | |
3000 | |
/proc/sys/vm/dirty_ratio | |
20 | |
/proc/sys/vm/dirty_writeback_centisecs | |
500 | |
--------------------------------------- | |
Dammit, I *knew* that I was seeing prod-17's disks act much, *much* | |
slower than prod-15's. ... There are 3 boxes that have RAID 5 | |
settings, see the 'XXXer-raid.txt' file. | |
prod-15: | |
Logical device number 1 | |
Logical device name : RAID0-B | |
RAID level : 0 | |
Status of logical device : Optimal | |
Size : 762382 MB | |
Stripe-unit size : 256 KB | |
Read-cache mode : Enabled | |
MaxIQ preferred cache setting : Enabled | |
MaxIQ cache setting : Disabled | |
Write-cache mode : Enabled (write-back) | |
Write-cache setting : Enabled (write-back) when protected by battery/ZMM | |
Partitioned : Yes | |
Protected by Hot-Spare : No | |
Bootable : No | |
Failed stripes : No | |
Power settings : Disabled | |
prod-17: | |
Logical device number 1 | |
Logical device name : raid-5-b | |
RAID level : 5 | |
Status of logical device : Optimal | |
Size : 667090 MB | |
Stripe-unit size : 256 KB | |
Read-cache mode : Enabled | |
MaxIQ preferred cache setting : Disabled | |
MaxIQ cache setting : Disabled | |
Write-cache mode : Enabled (write-back) | |
Write-cache setting : Enabled (write-back) when protected by battery/ZMM | |
Partitioned : Yes | |
Protected by Hot-Spare : No | |
Bootable : No | |
Failed stripes : No | |
Power settings : Disabled | |
----- | |
tail -f /var/log/riak/console.log | egrep --line-buffered -v 'exit,normal|riak_search_backend:collect_info_response:55|riak_solr_searcher_wm:malformed_request:82|webmachine error: path|too_many_results,|to_mochijson2| monitor |Problem invoking pre-commit hook|riak_indexed_doc,index,|@mi_|riak_search_kv_hook,precommit|badmatch,.error,closed|error,function_clause.*base64' & | |
----- | |
slf.percept.prod-29.bigbacklog2.patch.bz2 | |
16ms pause: 5.2053 - 5.2277 sec | |
16ms pause: 1.5708 - 1.5932 sec | |
9ms pause : 11.7316 - 11.7463 sec | |
: 16.2913 - 16.2957 sec | |
23ms pause: 24.0534 - 24.0796 sec | |
----- | |
Answer (?) to the weird Erlang proc scheduling problem: busy_dist_port.
riak_sysmon was buggy and misconfigured! ARRGH!
Solution: add +zdbbl 16384 to all nodes, then do a rolling restart.
During the rolling restart, run 32 different basho_bench instances,
each aimed at a single Riak cluster member. I used two shell one-liner
hacks to monitor the basho_bench data:
* Look for non-zero error rates for GET
* Look for bad max latency values for GET
while [ 1 ]; do date ; for i in tests.prod-*/current/get*csv; do echo `echo $i | sed 's/-internal.*//'` `tail -1 $i`; done | awk '$12 > 0 { print }' | sort -nr -k 11 | head -5 ; sleep 15; done | |
while [ 1 ]; do date ; for i in tests.prod-*/current/get*csv; do echo `echo $i | sed 's/-internal.*//'` `tail -1 $i`; done | sort -nr -k 10 | head -5 ; sleep 15; done | |
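For reference, a minimal console sketch (not part of the fix above) for confirming the +zdbbl setting and for watching busy_dist_port events directly, without going through riak_sysmon:
%% Distribution buffer busy limit, in bytes; with "+zdbbl 16384" (kilobytes)
%% this should report 16384 * 1024.
erlang:system_info(dist_buf_busy_limit).
%% Route busy_dist_port events to this shell process and print any that
%% arrive in the next 10 seconds.  NOTE: this replaces whatever system
%% monitor is already installed (e.g. riak_sysmon's), so use with care.
erlang:system_monitor(self(), [busy_dist_port]).
Watch = fun(Loop) ->
            receive
                {monitor, Pid, busy_dist_port, Port} ->
                    io:format("busy_dist_port: ~p blocked on ~p~n", [Pid, Port]),
                    Loop(Loop)
            after 10000 ->
                    ok
            end
        end.
Watch(Watch).
erlang:system_monitor(undefined).    %% turn the ad-hoc monitor back off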
----- | |
* Return of the vnode mailbox backups from hell. | |
Example: | |
Mon Dec 19 21:18:29 PST 2011 | |
'[email protected]': [{riak_kv_vnode,327571},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,17893},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,6},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,2},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,2},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,1},{riak_pipe_vnode,0},{riak_search_vnode,0}] | |
'[email protected]': [{riak_kv_vnode,0},{riak_search_vnode,0}] | |
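The per-node numbers above came out of a polling script that isn't reproduced in these notes. A rough single-node sketch of the same idea, reusing get_mod_index from earlier (grouping by max-per-module is my guess at what the script reported; note that get_mod_index is a synchronous call into the vnode, so it is itself slow against a badly backlogged vnode):
VnodeBacklog = fun() ->
    Pairs = [{Mod, Q} ||
                {_, Pid, _, _} <- supervisor:which_children(riak_core_vnode_sup),
                is_pid(Pid),
                {message_queue_len, Q} <- [process_info(Pid, message_queue_len)],
                {Mod, _Idx} <- [catch riak_core_vnode:get_mod_index(Pid)],
                Mod =/= 'EXIT'],
    dict:to_list(lists:foldl(fun({Mod, Q}, D) ->
                                 dict:update(Mod, fun(Old) -> max(Old, Q) end, Q, D)
                             end, dict:new(), Pairs))
end.
VnodeBacklog().   %% e.g. [{riak_kv_vnode,...},{riak_pipe_vnode,0},...]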
So, we appear to be having packet drop problems. If we measure using | |
"ping", we get one set of data: no packets dropped on the external | |
interfaces. | |
root@prod-04:~# sh -c 'FILE=/tmp/slf.capt`date +%s` ; echo Putting tcpdump output in $FILE ; tcpdump -i eth0 -c 200000 -w $FILE ; tcptrace -l $FILE | grep "rexmt data pkts" | head -5 ; echo Pseudo-count of TCP flows with retransmitted packets: ; tcptrace -l $FILE | grep "rexmt data pkts" | grep -v "rexmt data pkts: 0 rexmt data pkts: 0" | wc -l' | |
Putting tcpdump output in /tmp/slf.capt1324358762 | |
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes | |
200000 packets captured | |
200012 packets received by filter | |
0 packets dropped by kernel | |
rexmt data pkts: 0 rexmt data pkts: 0 | |
rexmt data pkts: 0 rexmt data pkts: 0 | |
rexmt data pkts: 0 rexmt data pkts: 0 | |
rexmt data pkts: 0 rexmt data pkts: 14 | |
rexmt data pkts: 0 rexmt data pkts: 16 | |
Pseudo-count of TCP flows with retransmitted packets: | |
82 | |
... and ... | |
basho@prod-04:~$ bash -c 'for t in "" -internal; do for i in prod-{0{4,5,6,7,8,9},1{0,1,2,3,4,5,7,8,9},2{0,1,2,3,4,5,6,7,8,9},3{0,1,2,3,4,5,6,7,8,9},40,99}$t.XXXer.com; do sudo ping -i 0.02 -c 50 $i 2>&1; done done' | awk '/PING/ {host = $2} /transmitted/ { print host, $0 }' | grep -v '50 received' | |
prod-05-internal.XXXer.com 50 packets transmitted, 0 received, 100% packet loss, time 1369ms | |
prod-06-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 995ms | |
prod-07-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1005ms | |
prod-08-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1157ms | |
prod-10-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1114ms | |
prod-11-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1030ms | |
prod-12-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1009ms | |
prod-13-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1006ms | |
prod-14-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 993ms | |
prod-17-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1018ms | |
prod-20-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1001ms | |
prod-21-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1004ms | |
prod-24-internal.XXXer.com 50 packets transmitted, 48 received, 4% packet loss, time 1019ms | |
prod-26-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1021ms | |
prod-28-internal.XXXer.com 50 packets transmitted, 49 received, 2% packet loss, time 1025ms | |
prod-30-internal.XXXer.com 50 packets transmitted, 47 received, 6% packet loss, time 1082ms | |
prod-32-internal.XXXer.com 50 packets transmitted, 47 received, 6% packet loss, time 1012ms | |
---- | |
WRONG WRONG WRONG here for doc purposes only! | |
root@prod-04:~# for i in `seq 1 5`; do sh -c 'ifconfig eth0 ; sleep 3 ; ifconfig eth0' | grep bytes | sed -e 's/.. bytes://g' | awk 'NR == 1 { rx = $1; tx = $4} NR == 2 { printf "rx %.1f Mbit/sec tx %1.f Mbit/sec WRONG WRONG WRONG\n", ($1 - rx) / (1024*1024) / 3, ($4 - tx) / (1024*1024) / 3 }' ; done
rx 71.5 Mbit/sec tx 104 Mbit/sec
rx 75.5 Mbit/sec tx 113 Mbit/sec | |
rx 64.5 Mbit/sec tx 44 Mbit/sec | |
rx 56.2 Mbit/sec tx 94 Mbit/sec | |
rx 79.4 Mbit/sec tx 108 Mbit/sec | |
2011-12-20 update: WWWHHHAAAAAA! That f'ing awk script is *wrong*. | |
It's not converting bytes to bits!!!!! | |
Here's a correct one: | |
root@prod-04:~# date ; for i in `seq 1 5`; do sh -c 'ifconfig eth0 ; sleep 3 ; ifconfig eth0' | grep bytes | sed -e 's/.. bytes://g' | awk 'NR == 1 { rx = $1; tx = $4} NR == 2 { printf "rx %.1f Mbit/sec tx %1.f Mbit/sec\n", ($1 - rx) * 8 / (1024*1024) / 3, ($4 - tx) * 8 / (1024*1024) / 3 }' ; done | |
Tue Dec 20 15:18:49 PST 2011 | |
rx 423.0 Mbit/sec tx 488 Mbit/sec | |
rx 491.6 Mbit/sec tx 566 Mbit/sec | |
rx 483.2 Mbit/sec tx 519 Mbit/sec | |
rx 437.1 Mbit/sec tx 495 Mbit/sec | |
rx 456.5 Mbit/sec tx 489 Mbit/sec | |
--- | |
root@prod-04:~# grep -i qu /tmp/slf.foo | |
105815089 requests sent out | |
12900 reassemblies required | |
source quenches: 106 | |
echo requests: 58153 | |
echo request: 95724 | |
416815 packets pruned from receive queue because of socket buffer overrun | |
2222 packets pruned from receive queue | |
1 packets dropped from out-of-order queue because of socket buffer overrun | |
Quick ack mode was activated 9682843 times | |
413634 times the listen queue of a socket overflowed | |
600138 packets directly queued to recvmsg prequeue. | |
1780022 bytes directly received in process context from prequeue | |
167 packets header predicted and directly queued to user | |
4475848 packets collapsed in receive queue due to low socket buffer | |
watch -d "sh -c 'netstat -s | grep -i queue'" | |
watch -d "sh -c 'netstat -s | egrep -i retr'" | |
The "netstat -s" method confirms that there are several hundred | |
packets being retransmitted by TCP each second, on average. | |
-- | |
Wed 12-21 early morning: | |
Set MTU on all eth0 interfaces from 1500 -> 9000. | |
Then disconnect and reconnect all Erlang nodes. | |
for i in `awk '{print $1}' /tmp/slf.some-hosts `; do /usr/lib/riak/erts-5.8.4/bin/erl -name [email protected] -hidden -setcookie riak_XXXer_prod -remsh riaksearch@$i ; done | |
length(nodes()).  %% WARNING: Search for "WOW, that caused some unintentional chaos monkey" string below!!!
lists:usort([{net_kernel:disconnect(Nd), timer:sleep(20), net_kernel:connect(Nd), net_adm:ping(Nd)} || Nd <- nodes()]). | |
length(nodes()). | |
--- | |
Wed 12-21: | |
Change CPU freq governor from 'ondemand' -> 'performance' | |
See set*expect script for how-to | |
Experimentation on prod-23 and prod-21 | |
13 ethtool --show-pause eth0 | |
14 ethtool --pause autoneg off rx off tx off | |
15 ethtool --pause eth0 autoneg off rx off tx off | |
16 sudo ethtool --pause eth0 autoneg off rx off tx off | |
17 ethtool --show-pause eth0 | |
19 ethtool --show-ring eth0 | |
20 xsudo ethtool --set-ring eth0 rx 4096 tx 4096 rx-mini 256 rx-jumbo 256 | |
21 sudo ethtool --set-ring eth0 rx 4096 tx 4096 rx-mini 256 rx-jumbo 256 | |
22 sudo ethtool --set-ring eth0 rx 4096 tx 4096 | |
24 ethtool --show-ring eth0 | |
Nothing really seems to help.
Except that Matt Ranney & his crew changed how the timeline history is
managed, shrinking the single key's max size for appending history
info (originally 60KB?) by half? That made a roughly 100 Mbit/sec
average decrease in traffic on all machines.
--- | |
Thu 12-22
Matt shrank the timeline history max size again. Nice win.
Also, after: | |
"Matt R. Man, it feels really good to be finally making some | |
progress. All it took was spending an entire day yelling at | |
SoftLayer." | |
... we learned that SL's explanation is that despite average | |
utilization @ 300Mbit/sec, there are microburst periods where other | |
cluster members coincidentally burst traffic to a single receiver | |
above line rate and beyond the switch's very limited capacity to | |
buffer. So, during those very small time periods, yes, packets are | |
dropped. And yes, the 1-sec average utilization rate seems quite low | |
and reasonable. | |
That theory seems to be making some sense. See the escript file in
the data dir for polling the /sys/path/to/eth0/rx_bytes counter.
Either there's a timing error (because the kernel's timer resolution is
100Hz and therefore Erlang's smallest effective timer firing is 100Hz,
aka 10 milliseconds), or it's true that these micro-bursts are happening
on final egress out of the switching fabric. Bummer.
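The escript itself lives in the data dir, not in this gist; here is a minimal sketch of the same idea (the sysfs path and argument handling are my guesses, and any interval under ~10ms runs into the timer-resolution caveat above):
#!/usr/bin/env escript
%% Usage: escript rxpoll.escript IntervalMs Count
%% Poll eth0's rx_bytes counter and print per-interval Mbit/sec,
%% looking for intervals far above the long-term average (microbursts).
main([IntervalStr, CountStr]) ->
    Interval = list_to_integer(IntervalStr),
    Count = list_to_integer(CountStr),
    loop(read_rx(), Interval, Count).

loop(_Prev, _Interval, 0) ->
    ok;
loop(Prev, Interval, N) ->
    timer:sleep(Interval),
    Cur = read_rx(),
    Mbits = (Cur - Prev) * 8 / (1024 * 1024) / (Interval / 1000),
    io:format("~.1f Mbit/sec~n", [Mbits]),
    loop(Cur, Interval, N - 1).

read_rx() ->
    {ok, Bin} = file:read_file("/sys/class/net/eth0/statistics/rx_bytes"),
    list_to_integer(string:strip(binary_to_list(Bin), right, $\n)).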
--- | |
ets:match() elimination | |
----------------------- | |
Data files in /Users/fritchie/b/notes/2011-12-07.XXXer.data/ets-match-stuff/* | |
prod-2000 has pthreads locking (not default custom locks) and no | |
-fomit-frame-pointer (for decent stack traces). | |
[root@prod-2000 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.before-slf | |
usr, sys, lck, slp | |
7692.6 825.5 57352 5297.3 | |
[root@prod-2000 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.after-slf-draft3 | |
usr, sys, lck, slp | |
5073.4 553.2 58133.1 4885 | |
Makes a pretty big difference for USR time. | |
prod-2009 is stock riak 1.0.2 + eleveldb | |
[root@prod-2009 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.before-slf | |
usr, sys, lck, slp | |
1346.8 650.6 59578 5631.3 | |
[root@prod-2009 ~]# awk '/PID/ { c = 0} /beam.smp/ && c < 5 { usr += $3; sys += $4; lck += $8; slp += $9} END { print "usr, sys, lck, slp" ; print usr, sys, lck, slp; }' /root/prstat.after-slf-draft3 | |
usr, sys, lck, slp | |
1269.4 584.8 56486 5512.3 | |
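As a toy illustration of why eliminating ets:match() from a hot path can move USR time the way prod-2000's numbers do (a made-up micro-benchmark, not the actual patch, whose details aren't in these notes): a match has to interpret the pattern on every call, while a plain lookup on the key does not.
%% Toy comparison only -- not the Riak 1.0.2 patch.
Tab = ets:new(t, [set, public]).
lists:foreach(fun(N) -> ets:insert(Tab, {N, N * N}) end, lists:seq(1, 100000)).
Bench = fun(F) -> element(1, timer:tc(erlang, apply, [F, []])) end.
MatchUsec  = Bench(fun() -> [ets:match(Tab, {K, '$1'}) || K <- lists:seq(1, 10000)] end).
LookupUsec = Bench(fun() -> [ets:lookup(Tab, K) || K <- lists:seq(1, 10000)] end).
io:format("ets:match ~p usec vs ets:lookup ~p usec~n", [MatchUsec, LookupUsec]).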
-------- | |
Crazy experimentation with plockstat.
I've managed to use a hacked version of plockstat for up to 250
milliseconds at a time. (Locks are really, really hot and generate
many thousands of DTrace events in that 1/4 second.) The differences
between 1/4-second samples are striking: the most highly contended locks
fall into several different areas. More sampling (more than 6 or 7) is
required, plus some automated script hacks, to try to figure out what
the truly hot areas are.
[root@prod-2009 ~]# for i in a b c d e f; do dtrace -x dynvarsize=4m -s /root/plockstat.d -p 28476 > /tmp/foo250.$i ; sleep 2 ; done | |
--- | |
my.joyentcloud.com | |
[email protected] | |
XXXerJoyentPasswordHere | |
From roughly Dec 23 00:00 through Dec 27 17:00 ... the polling
interval was sometimes 30 seconds, sometimes 5 seconds, and sometimes
polling wasn't running at all.
awk '/ PST 2011$/ {c = 0 } /riak_kv_vnode/ { c++ } /riak_kv_vnode/ && c == 1 {print}' ~/XXXer.txt | sed 's/[{},]/ /g' | awk '{ print $4}' | sed -e 's/.$//' -e 's/.$//' -e 's/.$//' | sort -nr | uniq -c | |
instances vnode mailbox size (x 1000 msgs)
--------- --------------------------------
1 258-259 | |
1 70-70 | |
1 54-55 | |
1 28-29 | |
5 5-6 | |
17 4-5 | |
80 3-4 | |
424 2-3 | |
1233 1-2 | |
26882 0-1 | |
--- | |
So, using the rto_min setting has some kind of units conversion | |
problem. If you actually specify "200ms", you end up getting 120 | |
*seconds* of timeout. "1ms" seems to do about the same thing. | |
Using "1ns" or "10ns" or perhaps "100ns" seems fine: there's never an | |
"iperf" run using less bandwidth than 250 Mbit/sec. | |
root@prod-04:~# for size in 1ns 10ns 1ns 100ns 1ns 1us 1ns 10us 1ns; do echo "HEY, changing to $size"; ip route change to 10.28.60.192/26 dev eth0 rto_min $size; ip route show to 10.28.60.192/26 ; for i in `seq 1 50`; do iperf -c prod-18-internal.XXXer.com -n 15M 2>&1 | grep sec ; sleep 0.1; done; done | |
HEY, changing to 1ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 514 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 592 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 665 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 657 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 493 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 393 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 616 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 559 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 434 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 443 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 703 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 515 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 676 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 511 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 631 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 550 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 487 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 655 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 563 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 589 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 603 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 660 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec | |
HEY, changing to 10ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 557 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 598 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 642 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 582 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 657 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 634 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 489 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 556 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 538 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 520 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 605 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 561 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 402 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 619 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 605 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 588 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 534 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 617 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 537 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 672 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 658 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 635 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 631 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 604 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 656 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 735 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 423 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 443 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 544 Mbits/sec | |
HEY, changing to 1ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 457 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 508 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 760 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 591 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 453 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 398 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 776 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 621 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 546 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 500 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 569 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 628 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 461 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 749 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 551 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 442 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 474 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 650 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 642 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 547 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 616 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 678 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 495 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 524 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 531 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 565 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 611 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 630 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec | |
HEY, changing to 100ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 470 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 640 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 670 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 632 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 673 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 503 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 619 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 592 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 610 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 509 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 356 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 659 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 732 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 648 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 587 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 507 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 498 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 486 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 697 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 767 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 476 Mbits/sec | |
[ 3] 0.0- 0.1 sec 15.0 MBytes 863 Mbits/sec | |
[ 3] 0.0- 0.1 sec 15.0 MBytes 849 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 752 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 783 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 755 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 749 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 793 Mbits/sec | |
[ 3] 0.0- 0.1 sec 15.0 MBytes 872 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 771 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 743 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 788 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 725 Mbits/sec | |
HEY, changing to 1ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 801 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 775 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 795 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 750 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 779 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 809 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 532 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 380 Mbits/sec | |
[ 3] 0.0- 0.5 sec 15.0 MBytes 273 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 366 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 349 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 288 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 501 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 648 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 719 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 505 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 298 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 397 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 598 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 699 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 726 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 661 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 558 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 588 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 584 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 400 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 349 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 521 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 668 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 615 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 704 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 590 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 481 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 544 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 586 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 656 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 462 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 341 Mbits/sec | |
HEY, changing to 1us | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 370 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 536 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 633 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 516 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 531 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 542 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 711 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 575 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 637 Mbits/sec | |
[ 3] 0.0- 1.2 sec 15.0 MBytes 103 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 713 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 629 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 705 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 650 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec | |
[ 3] 0.0- 1.3 sec 15.0 MBytes 95.5 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 677 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 662 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 611 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 596 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 625 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 568 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 641 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 607 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 600 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 671 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 524 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 480 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 603 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 536 Mbits/sec | |
[ 3] 0.0- 1.4 sec 15.0 MBytes 92.2 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 523 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 522 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 487 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 419 Mbits/sec | |
HEY, changing to 1ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 573 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 618 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 543 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 640 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 552 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 375 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 488 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 486 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 645 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 624 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 566 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 542 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 581 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 685 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 647 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 722 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 452 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 553 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 507 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 497 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 666 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 441 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 555 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 755 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 609 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 525 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 651 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 719 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 636 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 641 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 671 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 595 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 622 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 626 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 377 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 366 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 378 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 434 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 697 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 636 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 694 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 702 Mbits/sec | |
HEY, changing to 10us | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.01ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 685 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 614 Mbits/sec | |
[ 3] 0.0-10.3 sec 15.0 MBytes 12.2 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 666 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 472 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 654 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 457 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 644 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 494 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 569 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 672 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 613 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 673 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 528 Mbits/sec | |
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 589 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 585 Mbits/sec | |
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 677 Mbits/sec | |
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 694 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 505 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 502 Mbits/sec | |
[ 3] 0.0-10.2 sec 15.0 MBytes 12.3 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 704 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 444 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 444 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 526 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 583 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 727 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 793 Mbits/sec | |
[ 3] 0.0- 0.1 sec 15.0 MBytes 971 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 639 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 483 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 516 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 620 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 560 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 574 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 570 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 552 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 635 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 576 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 655 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 564 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 730 Mbits/sec | |
HEY, changing to 1ns | |
10.28.60.192/26 dev eth0 scope link rto_min lock 0.00ms | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 834 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 759 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 579 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 565 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 458 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 388 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 497 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 454 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 509 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 415 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 644 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 512 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 608 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 627 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 518 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 686 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 485 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 539 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 646 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 680 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 638 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 547 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 634 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 601 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 504 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 413 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 522 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 796 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 796 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 385 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 412 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 388 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 594 Mbits/sec | |
[ 3] 0.0- 0.3 sec 15.0 MBytes 423 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 688 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 572 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 519 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 652 Mbits/sec | |
[ 3] 0.0- 0.4 sec 15.0 MBytes 317 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 606 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 538 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 550 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 529 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 545 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 664 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 687 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 737 Mbits/sec | |
[ 3] 0.0- 0.2 sec 15.0 MBytes 582 Mbits/sec | |
---- | |
Changed the RTO_min value for all SL cluster members on | |
Wed Dec 28 18:47:43 PST 2011. | |
I screwed up prod-09, which now has a 2x1GigE bonded interface. | |
Adding explicit routes on that machine to one of the member interfaces | |
was not a good idea. 99th% latencies shot pretty high for a little | |
while there, drat. Otherwise, things went smoothly. prod-09 was the | |
only machine that I screwed up, so no Riak operations failed. | |
Starting Wed Dec 28 21:06:12 PST 2011 - Wed Dec 28 21:14:44 PST 2011,
I used the net_kernel:disconnect() & reconnect trick (listed above) to
reset TCP connections between all Erlang nodes. I was still seeing
temporary (1 or 2 minute) see-saws in the 4-hour Cacti stats windows,
which suggests that perhaps the old RTO_min value was still being used??
We'll see if more of those dips appear now or not -- it may be too
late in the day to see the necessary traffic pattern to trigger them,
so wait until tomorrow?
WOW, that caused some unintentional chaos monkey. Apparently my
innocent Riak internal networking disconnect & reconnect caused very
visible problems. Among other things, all node.js processes in
Seattle restarted: a bug was found in XXXer's Riak client, an error
handler that itself raised another exception (and then cascaded?).
So, if we ever do that again, we need to be much more careful.
---- | |
1st week January 2012: | |
Libumem tuning, leak analysis, fragmentation analysis | |
See Campfire transcripts and Greg Burd for more details. | |
---- | |
Tue Jan 10 2012:
leveldb on prod-2024: weird slow behavior for one vnode.
[lots of bogus D script and output omitted here.]
AH! The D script was incorrect! I mixed up self-> and this->, doh!
On a normal node, prod-2010:
[root@prod-2010 ~]# for i in 1 2 3 4 ; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_write:entry {self->start = timestamp;} pid$target::eleveldb_write:return /self->start > 0/ { this->nsec = timestamp - self->start ; @ = quantize((this->nsec)/1000000) ; self->start = 0; total = total + this->nsec; this->nsec = 0} tick-5sec { printa(@); printf("Total milliseconds in func: %d out of %d\n", total/1000000, (timestamp - mystart) / 1000000); exit(0);}' -p 2159 ; done | |
fixit for write | |
January 11, 2012 09:26:48 PM UTC | |
dtrace: description 'BEGIN ' matched 4 probes | |
CPU ID FUNCTION:NAME | |
15 70345 :tick-5sec | |
value ------------- Distribution ------------- count | |
-1 | 0 | |
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 677 | |
1 | 2 | |
2 | 0 | |
Total milliseconds in func: 37 out of 4877 | |
[...] | |
[root@prod-2010 ~]# echo fixit ; for i in 1 2 3 4 ; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->start = timestamp;} pid$target::eleveldb_get:return /self->start > 0/ { this->nsec = timestamp - self->start ; @ = quantize((this->nsec)/1000000) ; self->start = 0; total = total + this->nsec; this->nsec = 0} tick-5sec { printa(@); printf("Total milliseconds in func: %d out of %d\n", total/1000000, (timestamp - mystart) / 1000000); exit(0);}' -p 2159 ; done | |
January 11, 2012 09:28:22 PM UTC | |
dtrace: description 'BEGIN ' matched 4 probes | |
CPU ID FUNCTION:NAME | |
5 70345 :tick-5sec | |
value ------------- Distribution ------------- count | |
-1 | 0 | |
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2185 | |
1 |@ 78 | |
2 | 12 | |
4 | 11 | |
8 | 28 | |
16 | 15 | |
32 | 4 | |
64 | 0 | |
128 | 0 | |
256 | 2 | |
512 | 0 | |
Total milliseconds in func: 2204 out of 4876 | |
[...] | |
---- | |
OK, now, let's count function calls during an eleveldb_get func call: | |
Again, on prod-2010, our "normal" node. | |
[root@prod-2010 ~]# for i in 1 2 3 4; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { @[probefunc] = count(); } syscall::write*:entry /pid == $target && self->active == 1/ { @w[ustack()] = count(); } tick-5sec { printa(@); printf("Total calls to func: %d\n", total); printa(@w); exit(0);}' -p 2159 ; done | |
January 11, 2012 09:31:52 PM UTC | |
dtrace: description 'BEGIN ' matched 239 probes | |
CPU ID FUNCTION:NAME | |
14 70345 :tick-5sec | |
close 551 | |
open 552 | |
lwp_park 1224 | |
pread 2222 | |
Total calls to func: 2973 | |
[...] | |
OK, here's what I do for prod-2024: | |
* Put riak_kv service up | |
* Wait for msgq on bad vnode to get to 1,600-2,000 messages. | |
It takes about 60 seconds (!) for 1,600 messages to drain. | |
* Put riak_kv service down | |
* Wait a sec or two | |
* Start this dtrace script: | |
[root@prod-2024 ~]# for i in 1 2 3 4; do date ; dtrace -n 'BEGIN { total = 0; mystart = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { @[probefunc] = count(); } syscall::write*:entry /pid == $target && self->active == 1/ { @w[ustack()] = count(); } tick-5sec { printa(@); printf("Total calls to func: %d\n", total); printa(@w); exit(0);}' -p 7610 ; done | |
January 11, 2012 09:51:34 PM UTC | |
dtrace: description 'BEGIN ' matched 239 probes | |
CPU ID FUNCTION:NAME | |
7 69397 :tick-5sec | |
mmap 1 | |
lwp_park 20 | |
close 1655 | |
open 1669 | |
pread 8971 | |
Total calls to func: 14332 | |
January 11, 2012 09:51:41 PM UTC | |
dtrace: description 'BEGIN ' matched 239 probes | |
CPU ID FUNCTION:NAME | |
15 69397 :tick-5sec | |
lwp_park 73 | |
close 529 | |
open 536 | |
pread 2696 | |
Total calls to func: 5408 | |
[...] | |
WEIRD. There are at *most* 2K vnode operations in the vnode's | |
mailbox. Yet the eleveldb_get function is being called many *many* | |
more times than that. | |
HRM, WEIRDER, that many-thousand-calls-per-5-sec rate happens only occasionally.
Bimodal behavior.  And it's less common than the other mode, which is only a
few hundred calls per 5 sec.
So ... if we use prstat during the mailbox draining of vnode ops, we | |
see this: 98-99% system time for one LWP. | |
Total: 51 processes, 777 lwps, load averages: 1.10, 1.10, 0.96 | |
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID | |
7610 riak 1.5 98 0.0 0.0 0.0 0.0 0.0 0.0 0 13 208 0 beam.smp/582 | |
So, let's see if we can create a D script that will look only at that | |
LWP. Viz. DTrace book page 80. | |
BUMMER, the sched provider isn't available inside a zone. | |
SO, it's possible to use tid == LWP# as a guard to look at only a
single thread.  Hooray.  "prstat -mLc", as shown above, shows a
single LWP at 98-99% system time.  That LWP changes periodically: the
Erlang VM scheduler migrates the work from thread to thread, and the
busy LWP is prone to change when DTrace probes are enabled on the process.
So, I'm eyeballing "prstat -mLc 1" output to catch which LWP is the
98% system time LWP, running probes guarded by that tid, and then
continuing to watch "prstat" output to confirm that the same
system-time-eating LWP remains busy through the entire dtrace snapshot
of 2-4 seconds.  Those separate outputs are below ... I'll try to
combine them using awk to see if there's anything interesting when all
of the runs are aggregated.
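(A minimal sketch of that awk step, assuming each dtrace run's output
has been saved to files named run1.txt, run2.txt, etc. -- the file
names are hypothetical.  Every interesting line contains a backtick,
in "module`function   count" form.)
# sum per-function sample counts across all saved runs, smallest first
awk '/`/ { counts[$1] += $2 } END { for (f in counts) printf("%8d %s\n", counts[f], f) }' run*.txt | sort -n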
--- | |
^C[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 581/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes | |
CPU ID FUNCTION:NAME | |
1 69398 :tick-2s | |
beam.smp`mbc_realloc 1 | |
beam.smp`enc_atom 1 | |
beam.smp`erts_dsig_send_msg 1 | |
beam.smp`small_times 1 | |
eleveldb.so`_ZN7leveldb6DBImpl5WriteERKNS_12WriteOptionsEPNS_10WriteBatchE 1 | |
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 1 | |
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1 | |
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1 | |
libumem.so.1`umem_cache_alloc 1 | |
libc.so.1`memcmp 1 | |
libc.so.1`memcpy 1 | |
libc.so.1`localtime_r 1 | |
libc.so.1`clear_lockbyte 1 | |
libc.so.1`mutex_unlock_queue 1 | |
libc.so.1`mutex_lock_impl 2 | |
beam.smp`process_main 3 | |
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 582/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}' | |
dtrace: description 'profile-997 ' matched 2 probes | |
CPU ID FUNCTION:NAME | |
7 69398 :tick-2s | |
beam.smp`buf_to_intlist 1 | |
beam.smp`io_list_to_buf 1 | |
beam.smp`encode_size_struct2 1 | |
beam.smp`binary_to_term_1 1 | |
beam.smp`tag_val_def 1 | |
eleveldb.so`_ZN7leveldb6DBImpl19NewInternalIteratorERKNS_11ReadOptionsEPm 1 | |
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1 | |
eleveldb.so`_ZN7leveldb10TableCache11NewIteratorERKNS_11ReadOptionsEmmPPNS_5TableE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 1 | |
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1 | |
eleveldb.so`_ZN7leveldb9ReadBlockEPNS_16RandomAccessFileERKNS_11ReadOptionsERKNS_11BlockHandleEPPNS_5BlockE 1 | |
libumem.so.1`umem_free 1 | |
libumem.so.1`vmem_xfree 1 | |
libc.so.1`strcmp 1 | |
libc.so.1`clear_lockbyte 1 | |
libc.so.1`mutex_lock_impl 1 | |
libc.so.1`mutex_unlock 1 | |
eleveldb.so`_ZNK7leveldb5Table11NewIteratorERKNS_11ReadOptionsE 2 | |
libc.so.1`memcpy 2 | |
beam.smp`process_main 4 | |
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 4 | |
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 579/ { @[ufunc(arg1)] = count(); } tick-2s {printa(@); exit(0);}' | |
dtrace: description 'profile-997 ' matched 2 probes | |
CPU ID FUNCTION:NAME | |
2 69398 :tick-2s | |
beam.smp`mbc_alloc 1 | |
beam.smp`remote_send 1 | |
beam.smp`enc_term 1 | |
beam.smp`dsig_send 1 | |
beam.smp`big_norm 1 | |
beam.smp`erts_bs_private_append 1 | |
beam.smp`erts_bs_append 1 | |
beam.smp`tag_val_def 1 | |
beam.smp`ethr_atomic32_inc 1 | |
eleveldb.so`_ZN7leveldb10TableCache11NewIteratorERKNS_11ReadOptionsEmmPPNS_5TableE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115MergingIteratorD0Ev 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1 | |
eleveldb.so`_ZN7leveldb9ReadBlockEPNS_16RandomAccessFileERKNS_11ReadOptionsERKNS_11BlockHandleEPPNS_5BlockE 1 | |
libc.so.1`mutex_trylock_adaptive 1 | |
libc.so.1`mutex_unlock_queue 1 | |
beam.smp`encode_size_struct2 2 | |
beam.smp`erts_get_aligned_binary_bytes_extra 2 | |
eleveldb.so`_ZN7leveldbL22GetLengthPrefixedSliceEPKc 2 | |
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 2 | |
eleveldb.so`_ZN7leveldb6crc32c6ExtendEjPKcm 2 | |
libc.so.1`pread 2 | |
libc.so.1`memcpy 3 | |
beam.smp`process_main 4 | |
[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 581/ { @[ufunc(arg1)] = count(); } tick-4s {printa(@); exit(0);}' | |
dtrace: description 'profile-997 ' matched 2 probes | |
CPU ID FUNCTION:NAME | |
9 69403 :tick-4s | |
beam.smp`tree_delete 1 | |
beam.smp`bf_unlink_free_block 1 | |
beam.smp`get_free_block 1 | |
beam.smp`and_2 1 | |
beam.smp`eq 1 | |
beam.smp`cmp 1 | |
beam.smp`atom_to_list_1 1 | |
beam.smp`encode_size_struct2 1 | |
beam.smp`dsig_send 1 | |
beam.smp`new_binary 1 | |
beam.smp`small_to_big 1 | |
beam.smp`small_times 1 | |
beam.smp`erts_whereis_name 1 | |
beam.smp`enif_get_list_cell 1 | |
libstdc++.so.6.0.14`_ZdlPv 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_16DBIter4SeekERKNS_5SliceE 1 | |
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE14KeyIsAfterNodeERKS2_PNS5_4NodeE 1 | |
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115MergingIterator4SeekERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIterator4SeekERKNS_5SliceE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache7ReleaseEPNS_5Cache6HandleE 1 | |
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 1 | |
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 1 | |
eleveldb.so`_ZNK7leveldb12_GLOBAL__N_121PosixRandomAccessFile4ReadEmmPNS_5SliceEPc 1 | |
libumem.so.1`malloc 1 | |
libumem.so.1`umem_depot_alloc 1 | |
libumem.so.1`vmem_getseg 1 | |
libumem.so.1`vmem_alloc 1 | |
libc.so.1`memcpy 1 | |
libc.so.1`___errno 1 | |
libc.so.1`mutex_lock_impl 1 | |
libc.so.1`cond_broadcast 1 | |
libc.so.1`pthread_self 1 | |
libumem.so.1`umem_cache_alloc 2 | |
libc.so.1`memcmp 2 | |
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 3 | |
eleveldb.so`_ZN7leveldb5Block4Iter4SeekERKNS_5SliceE 3 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache6LookupERKNS_5SliceE 5 | |
beam.smp`process_main 10 | |
^C[root@prod-2024 ~]# dtrace -n 'profile-997 /execname == "beam.smp" && arg1 && tid == 585/ { @[ufunc(arg1)] = count(); } tick-4s {printa(@); exit(0);}'
dtrace: description 'profile-997 ' matched 2 probes | |
CPU ID FUNCTION:NAME | |
5 69403 :tick-4s | |
beam.smp`mbc_free 1 | |
beam.smp`tree_delete 1 | |
beam.smp`bf_get_free_block 1 | |
beam.smp`cmp 1 | |
beam.smp`erts_send_message 1 | |
beam.smp`schedule 1 | |
beam.smp`erts_gc_mixed_times 1 | |
beam.smp`decoded_size 1 | |
beam.smp`encode_size_struct2 1 | |
beam.smp`dec_term 1 | |
beam.smp`build_term_list 1 | |
beam.smp`erts_find_export_entry 1 | |
libstdc++.so.6.0.14`_Znam 1 | |
eleveldb.so`_ZN7leveldbL22GetLengthPrefixedSliceEPKc 1 | |
eleveldb.so`_ZNK7leveldb8SkipListIPKcNS_8MemTable13KeyComparatorEE18FindGreaterOrEqualERKS2_PPNS5_4NodeE 1 | |
eleveldb.so`_ZN7leveldbL10UnrefEntryEPvS0_ 1 | |
eleveldb.so`_ZNK7leveldb5Table11NewIteratorERKNS_11ReadOptionsE 1 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_116TwoLevelIteratorD1Ev 1 | |
eleveldb.so`_ZN7leveldb4HashEPKcmj 1 | |
eleveldb.so`_ZN7leveldb5BlockC1EPKcm 1 | |
libumem.so.1`malloc 1 | |
libumem.so.1`highbit 1 | |
libumem.so.1`umem_alloc 1 | |
libumem.so.1`umem_free 1 | |
libc.so.1`offtime_u 1 | |
libc.so.1`pread 1 | |
libc.so.1`mutex_lock 1 | |
libc.so.1`lmutex_lock 1 | |
libc.so.1`pthread_self 1 | |
beam.smp`bf_link_free_block 2 | |
beam.smp`tag_val_def 2 | |
eleveldb.so`_ZN7leveldb12_GLOBAL__N_115ShardedLRUCache7ReleaseEPNS_5Cache6HandleE 2 | |
eleveldb.so`_ZN7leveldb5Block11NewIteratorEPKNS_10ComparatorE 2 | |
eleveldb.so`_ZN7leveldb5Block4IterD0Ev 2 | |
libc.so.1`clear_lockbyte 2 | |
eleveldb.so`_ZN7leveldb5Block4Iter4SeekERKNS_5SliceE 3 | |
libc.so.1`memcmp 3 | |
libc.so.1`memcpy 3 | |
libc.so.1`mutex_lock_impl 4 | |
beam.smp`process_main 6 | |
eleveldb.so`_ZN7leveldb8FindFileERKNS_21InternalKeyComparatorERKSt6vectorIPNS_12FileMetaDataESaIS5_EERKNS_5SliceE 7 | |
WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD WEIRD | |
So, how does this work? | |
Steps taken: | |
* riak_core_node_watcher:service_up(riak_kv, whereis(riak_kv_sup)). | |
* Use etop to watch until mailbox backlog for pid 0.661.0 grows to | |
about 500. | |
* riak_core_node_watcher:service_down(riak_kv). | |
* Move mouse pointer over to "iostat -x 1" window then hit ENTER a | |
few times to help mark the point in time that I'd run service_down(). | |
* Move mouse over to "dtrace" window and run the script below. | |
* Watch the dtrace window for output @ END. | |
* Move mouse over to "iostat" window and press control-c to stop it. | |
What do I see in the "iostat" window? Zero read IOPS for the entire | |
10 second period that the dtrace script is running. Yet dtrace is | |
claiming that it has done several hundred pread(2) calls that have | |
taken 16 milliseconds or longer.  WEIRD.
dtrace -n 'BEGIN {start = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall:::entry /pid == $target && self->active == 1/ { self->start = timestamp; } syscall:::return /pid == $target && self->active == 1 && self->start > 0/ { @[probefunc] = quantize((timestamp - self->start) / 1000000) ; @l[probefunc] = sum(timestamp-self->start); self->start = 0; } tick-10sec { printa(@); printf("Total calls to func: %d in %d msec wall-clock time\n", total, (timestamp - start) / 1000000); normalize(@l, 1000000); printf("Report: total milliseconds per system call (remember, in parallel!):\n"); printa(@l); exit(0);}' -p 7610
dtrace: description 'BEGIN ' matched 470 probes | |
CPU ID FUNCTION:NAME | |
9 69404 :tick-10sec | |
close | |
value ------------- Distribution ------------- count | |
-1 | 0 | |
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 9 | |
1 | 0 | |
open | |
value ------------- Distribution ------------- count | |
-1 | 0 | |
0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 10 | |
1 | 0 | |
pread | |
value ------------- Distribution ------------- count | |
-1 | 0 | |
0 |@@@@@@@@@@@@ 171 | |
1 | 0 | |
2 | 0 | |
4 | 0 | |
8 | 0 | |
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@ 367 | |
32 |@ 15 | |
64 | 0 | |
Total calls to func: 604 in 9863 msec wall-clock time | |
Report: total milliseconds per system call (remember, in parallel!): | |
close 0 | |
open 0 | |
pread 9611 | |
And here's the output of iostat -x 1 | egrep -v 'sd0|ramdisk',
starting shortly after I'd pressed ENTER to mark the time when I made
the transition of the riak_kv service from up -> down.
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 22.0 200.0 2816.6 10669.3 0.0 0.2 0.9 0 12 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 11.0 0.0 1282.4 0.0 0.0 0.1 6.7 0 6 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 9.0 0.0 1089.1 0.0 0.0 0.1 6.1 0 5 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 12.0 4.0 1534.9 16.0 0.0 0.1 5.2 0 8 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 6.0 44.0 768.5 5431.2 0.0 0.1 1.5 0 4 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 200.9 0.0 8396.9 0.0 0.1 0.3 0 1 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 140.0 0.0 3399.8 0.0 0.0 0.2 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 146.9 0.0 5718.7 0.0 0.0 0.3 0 1 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
extended device statistics | |
device r/s w/s kr/s kw/s wait actv svc_t %w %b | |
sd1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 | |
^C | |
--- | |
Timeline for weird-pread-latency.mov: | |
0:00 -- Upper left: DTrace window, currently idle | |
Lower left: Riak shell, currently idle | |
Upper right: iostat -x 1 | egrep -v 'sd0|ramdisk' | |
Lower right: etop output | |
0:03 -- Lower left: Enable the riak_kv service to start workload | |
0:08 -- Upper left: Get dtrace command ready | |
0:10 -- Lower right: Just watching the "MsgQ" column for a backlog | |
of about 600 messages. Once we hit that point, there's | |
          about 10 seconds of work left to drain after the riak_kv service
          is disabled (i.e., disabling it stops new work arriving from outside)
1:14 -- Lower right: etop crashed, so I restarted it | |
2:00 -- Lower right: Finally above the 600 message mark, time to go to work | |
2:02 -- Lower left: Disable riak_kv service | |
2:03 -- Upper right: press ENTER a few times to mark when the riak_kv | |
service went down | |
2:05 -- Upper left: Start dtrace | |
2:17 -- Upper left: DTrace script has finished.
Upper right: Iostat reported one read operation for the entire | |
approx. 10 seconds that the D script ran. | |
--- | |
ANSWER for the puzzling pread latency problem: leveldb is reading 30MB | |
per request (an entire leveldb file), and that file is cached, so it | |
isn't generating any disk I/O. All the system time is the kernel | |
copying 30MB of data in 22-24 millisecond operations each. | |
dtrace -qn 'pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 1000000/ { printf("pread %d bytes offset %d fd %d path %s\n", arg2, arg3, arg0, fds[arg0].fi_pathname); }' -p 7610 | |
Bummer, apparently the fds[] array isn't available in a zone?  There is no
output from this script except this: | |
dtrace: error on enabled probe ID 3 (ID 318: syscall::pread:entry): invalid kernel access in action #4 at DIF offset 64 | |
dtrace: error on enabled probe ID 3 (ID 318: syscall::pread:entry): invalid kernel access in action #4 at DIF offset 64 | |
Fine. So, we'll modify it a bit to provide a timestamp and omit the path. | |
dtrace -qn 'BEGIN { start = timestamp; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 1000000/ { printf("time %d pread %d bytes offset %d fd %d\n", (timestamp - start) / 1000000000, arg2, arg3, arg0); }' -p 7610 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
time 3 pread 30929259 bytes offset 311941 fd 4397 | |
[...] | |
dtrace -qn 'pid$target::_ZN7leveldb5SliceC1EPKcm:entry {this->size = (int) arg2; printf("size %d\n", this->size);}'
dtrace -n 'BEGIN { start = timestamp; } pid$target::_ZN7leveldb5SliceC1EPKcm:entry { self->data = arg1; self->size = (arg2 > 0) ? arg2 : 1; } pid$target::eleveldb_get:entry {self->active = 1; total++;} pid$target::eleveldb_get:return {self->active = 0;} syscall::pread:entry /pid == $target && self->active == 1 && arg2 > 50000/ { printf("time %d pread %d bytes offset %d fd %d self->size %d\n", (timestamp - start) / 1000000000, arg2, arg3, arg0, self->size); tracemem(copyin(self->data, 64), 64); }' -p 14922 | |
11 318 pread:entry time 6 pread 62177 bytes offset 1427179 fd 2163 self->size 53 | |
0 1 2 3 4 5 6 7 8 9 a b c d e f 0123456789abcdef | |
0: 10 00 00 00 03 0c b7 80 08 12 b1 5b ec 96 9b 2d ...........[...- | |
10: ce bf 72 b0 dd c0 08 12 98 cc e6 53 39 ac de 73 ..r........S9..s | |
20: 34 9c cc 26 33 09 cd 7e 63 34 9a cd 27 33 99 a4 4..&3..~c4..'3.. | |
30: da 69 35 00 08 66 20 08 02 00 00 00 00 83 44 04 .i5..f .......D. | |
([email protected])13> sext:decode(<<16#10, 16#00, 16#00, 16#00, 16#03, 16#0c, 16#b7, 16#80, 16#08, 16#12, 16#b1, 16#5b, 16#ec, 16#96, 16#9b, 16#2d,16#ce, 16#bf, 16#72, 16#b0, 16#dd, 16#c0, 16#08, 16#12, 16#98, 16#cc, 16#e6, 16#53, 16#39, 16#ac, 16#de, 16#73,16#34, 16#9c, 16#cc, 16#26, 16#33, 16#09, 16#cd, 16#7e, 16#63, 16#34, 16#9a, 16#cd, 16#27, 16#33, 16#99, 16#a4,16#da, 16#69, 16#35, 16#00, 16#08>>). | |
{o,<<"bodies_raw">>,<<"1323579490109_1454994645">>} | |
--- | |
Later in the same day, Thu Jan 12, 6:21pm CST | |
Hrm. Well, what's a person to do? I wanted to do more investigation | |
with DTrace and eleveldb.so, but I wanted a shared lib that had pid | |
provider probes for class constructors, e.g. the leveldb::Slice class.
So I copied the eleveldb.so shared lib from prod-2000 to prod-2024,
stopped Riak, then restarted it to use the new shared lib.
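(Sketch of a sanity check on the copied lib, before relying on it:
list the pid provider probes that the new eleveldb.so exposes for the
Slice constructor, without enabling anything.  The beam.smp pid is a
placeholder.)
# -l only lists the matching probes, nothing is enabled
dtrace -ln 'pid$target:eleveldb.so:_ZN7leveldb5Slice*:entry' -p <beam.smp pid>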
What I forgot was that the restart would start riak_kv service and | |
leave it running. Duh! The weird/crazy/but-also-nice thing is that | |
LevelDB apparently wrote enough data that it triggered a merge (?), | |
and now the problem SST file is gone. *And* the latency problem has | |
disappeared. | |
I managed to make a copy of the allegedly-bad SST file; it's in the
tarball bad-latency.840833.sst.tar.gz.
-rw-r--r-- 0 riak riak 31242320 Jan 6 02:07 var/db/riak/leveldb/813531184842397132203223002586212227738165575680/840833.sst | |
... try to reproduce, starting from empty data dirs, Riak + eleveldb backend: | |
{ok, C} = riak:local_client(). | |
[C:put(riak_object:new(<<"abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd">>, <<"qwerqwerqwerqwerqwerqwerqwerqwer">>, <<"valvalvalvalvalvalvalvalvalvalvalval">>)) || _ <- lists:seq(1,11*1000)]. | |
... will create 3 .sst files. Then clobber those 3 sst files with the | |
bad one from XXXer.  ... Bah, that doesn't work.
--- | |
Run from sbb: | |
#!/bin/sh | |
for i in {0,1,2,3}{0,1,2,3,4,5,6,7,8,9} 40; do | |
ssh -n -p 2222 root@prod-20$i.XXXer.com "/usr/bin/time curl http://prod-20$i.XXXer.com:8098/stats 2>&1 | awk '/real/ { print \"prod-20$i\", \$0}' " & | |
done | sort -nr -k 3 | |
--- | |
Ring partition ownership helper functions: | |
Fowned_partitions = fun(Owners, Node) -> [P || {P, Owner} <- Owners, Owner =:= Node] end. | |
Fvnode_pids = fun(Node) -> [Pid || {_,Pid,_,_} <- supervisor:which_children({riak_core_vnode_sup, Node})] end. | |
Frunning_vnodes = fun(Node) -> Pids = Fvnode_pids(Node), [rpc:call(Node, riak_core_vnode, get_mod_index, [Pid], 30000) || Pid <- Pids] end. | |
Factive_partitions = fun(Node) -> lists:foldl(fun({_,P}, Ps) -> ordsets:add_element(P, Ps) end, [], Frunning_vnodes(Node)) end. | |
Fpartitions = fun(Node, Ring) -> Owners = riak_core_ring:all_owners(Ring), Owned = ordsets:from_list(Fowned_partitions(Owners, Node)), Active = ordsets:from_list(Factive_partitions(Node)), Stopped = ordsets:subtract(Owned, Active), Secondary = ordsets:subtract(Active, Owned), Primary = ordsets:subtract(Active, Secondary), {Primary, Secondary, Stopped} end. | |
{ok, Ring1} = riak_core_ring_manager:get_my_ring(). | |
{ok, RRing1} = riak_core_ring_manager:get_raw_ring(). | |
([email protected])33> riak_core_ring:index_owner(Ring1, 1444374665018431399630985401082889078019339583488). | |
'[email protected]' | |
([email protected])34> riak_core_ring:index_owner(Ring1, 552344857077206473969556670176954617780122943488). | |
'[email protected]' | |
{Primary1, Secondary1, Stopped1} = Fpartitions(node(), Ring1).
([email protected])57> lists:usort([riak_core_ring:next_owner(Ring1, II) || II <- Primary1 ++ Secondary1]). | |
[{undefined,undefined,undefined}] | |
Hmmmm, that isn't good, I don't believe: next_owner() is doing most of | |
the heavy lifting for riak_core_vnode:should_handoff(). | |
{ok, RRing1} = riak_core_ring_manager:get_raw_ring(). | |
element(7, RRing1). %% element 7 = #chstate_v2.next list | |
[] | |
---------- | |
Interruption: error_logger <0.6.0> on prod-2000 is apparently hung. | |
([email protected])19> io:format("~s\n", [element(2,process_info(pid(0,6,0), backtrace))]). | |
Program counter: 0x00000000030bb790 (gen:do_call/4 + 576) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7fe59ec4c0 Return addr 0x00000000030c32d0 (gen_event:rpc/2 + 96) | |
y(0) #Ref<0.0.76.177386> | |
y(1) '[email protected]' | |
y(2) [] | |
y(3) infinity | |
y(4) {sync_notify,{log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}} | |
y(5) <0.6.0> | |
y(6) <0.41.0> | |
0xfffffd7fe59ec500 Return addr 0x0000000004777e18 (error_logger_lager_h:handle_event/2 + 6072) | |
0xfffffd7fe59ec508 Return addr 0x00000000030c73c0 (gen_event:server_update/4 + 272) | |
y(0) [] | |
y(1) [] | |
y(2) [] | |
y(3) [] | |
y(4) [] | |
y(5) [] | |
y(6) {} | |
0xfffffd7fe59ec548 Return addr 0x00000000030c6fa8 (gen_event:server_notify/4 + 136) | |
y(0) error_logger | |
y(1) {error,<0.89.0>,{<0.22085.415>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}} | |
y(2) {handler,error_logger_lager_h,false,{},<0.47.0>} | |
y(3) {} | |
y(4) error_logger_lager_h | |
y(5) Catch 0x00000000030c73c0 (gen_event:server_update/4 + 272) | |
0xfffffd7fe59ec580 Return addr 0x00000000030c3ea0 (gen_event:handle_msg/5 + 256) | |
y(0) error_logger | |
y(1) handle_event | |
y(2) {error,<0.89.0>,{<0.22085.415>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}} | |
y(3) [] | |
0xfffffd7fe59ec5a8 Return addr 0x00000000030de690 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) error_logger | |
y(2) <0.2.0> | |
0xfffffd7fe59ec5c8 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030de6b0 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
([email protected])20> io:format("~s\n", [element(2,process_info(pid(0,41,0), backtrace))]). | |
Program counter: 0x0000000004782d88 (io:wait_io_mon_reply/2 + 56) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7ff73b8aa0 Return addr 0x00000000047806d8 (io:o_request/3 + 96) | |
y(0) #Ref<0.0.76.177389> | |
y(1) <0.37.0> | |
0xfffffd7ff73b8ab8 Return addr 0x00000000046fe798 (lager_console_backend:handle_event/2 + 752) | |
y(0) [] | |
y(1) [] | |
y(2) put_chars | |
y(3) {put_chars,unicode,[["16",58,"24",58,["0",50],46,"534"]," ",["[","error","] "],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}","\n"]} | |
y(4) <0.37.0> | |
0xfffffd7ff73b8ae8 Return addr 0x00000000030c73c0 (gen_event:server_update/4 + 272) | |
y(0) {state,6,false} | |
0xfffffd7ff73b8af8 Return addr 0x00000000030c6fa8 (gen_event:server_notify/4 + 136) | |
y(0) lager_event | |
y(1) {log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]} | |
y(2) {handler,lager_console_backend,false,{state,6,false},<0.44.0>} | |
y(3) {state,6,false} | |
y(4) lager_console_backend | |
y(5) Catch 0x00000000030c73c0 (gen_event:server_update/4 + 272) | |
0xfffffd7ff73b8b30 Return addr 0x00000000030c7088 (gen_event:server_notify/4 + 360) | |
y(0) lager_event | |
y(1) handle_event | |
y(2) {log,3,{["2012",45,["0",49],45,"14"],["16",58,"24",58,["0",50],46,"534"]},[["[","error","] "],["<0.22085.415>",32],"webmachine error: path=\"/riak/bodies_raw/1326326952460_e24032fb530dcd8a_ba15cfa417c2a5a2\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]} | |
y(3) [] | |
0xfffffd7ff73b8b58 Return addr 0x00000000030c4008 (gen_event:handle_msg/5 + 616) | |
y(0) {handler,lager_file_backend,false,{state,[{file,"/var/log/riak/error.log",3,{file_descriptor,prim_file,{#Port<0.1773>,11}},880192,false,10485760,[{hour,0}],5},{file,"/var/log/riak/console.log",6,{file_descriptor,prim_file,{#Port<0.1774>,12}},880193,false,10485760,[{hour,0}],5}]},<0.45.0>} | |
0xfffffd7ff73b8b68 Return addr 0x00000000030de690 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) lager_event | |
y(2) <0.40.0> | |
y(3) {<0.6.0>,#Ref<0.0.76.177386>} | |
y(4) <0.6.0> | |
0xfffffd7ff73b8b98 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030de6b0 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
NOTE: We've been waiting on the same ref, #Ref<0.0.76.177389>, for | |
many minutes now. :-( | |
This may be a DTrace-related I/O problem (though no Erlang VM probes | |
are enabled that I know of). Or not. I can't tell. | |
UPDATE: See below for more info. | |
--- | |
OK, on to trying to see how many LevelDB calls might be executing | |
simultaneously. | |
dtrace -qn 'BEGIN { highest = 0; cnt = 0; } pid$target:eleveldb.so:eleveldb_*:entry { cnt++ ; highest = (cnt > highest) ? cnt : highest; self->start = timestamp; } pid$target:eleveldb.so:eleveldb_*:return /self->start > 0/ { cnt = (cnt < 1) ? 0 : (cnt -1); @ = quantize((timestamp - self->start) / 1000000); self->start = 0;} tick-5sec { printf("Highest is %d\n", highest); highest = 0;} tick-30sec { cnt = 0; /*printf("Breakdown in milliseconds\n"); printa(@);*/ trunc(@);} ' -p 14922 | |
--- | |
"Bad SST" ... may not be bad, but stupid app asking for the same | |
big-ass key, {<<"bodies_raw">>, <<"key">>}, zillions of times. | |
{_, M} = process_info(pid(0,544,0), messages). | |
Ks = lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1]). | |
Weird = [BK || {<<"bodies_raw">>, <<"key">>} = BK <- Ks]. | |
([email protected])27> length(Weird). | |
30590 | |
And that key is 50MB. | |
([email protected])9> file:write_file("/tmp/bad-sst-but-really-big-bkey-get.bin", term_to_binary(M)). | |
---- | |
([email protected])1> {ok, Ref} = eleveldb:open("/var/db/riak/leveldb/812103937149691172322164716616762732601782829056", []). | |
{ok,<<>>} | |
([email protected])3> X = eleveldb:fold_keys(Ref, fun(X, Acc) -> [X|Acc] end, [], []). | |
([email protected])4> file:write_file("/var/tmp/keys-just-in-case.bin", term_to_binary(X)). | |
ok | |
([email protected])5> length(X). | |
3864400 | |
([email protected])6> sext:decode(hd(X)). | |
{o,<<"messages">>, | |
<<"zyana.casillas.fb.100002111391145:contacts">>} | |
fun() -> | |
{ok, Ref} = eleveldb:open("/var/db/riak/leveldb/812103937149691172322164716616762732601782829056", []), | |
{ok, FH} = file:open("/var/tmp/812103937149691172322164716616762732601782829056.slfhackdump", [write, binary, raw, compressed]), | |
WriteOne = fun(X, {FH2, Count}) -> | |
XBin = term_to_binary(X), | |
XBinSize = size(XBin), | |
file:write(FH2, [<<XBinSize:32>>, XBin]), | |
if Count /= 100 -> ok; true -> error_logger:info_msg("Hack at ~p, TEST IT!\n", [Count]), file:close(FH2), exit(normal) end, | |
if Count rem 10000 == 0 -> error_logger:info_msg("Hack dump wrote ~p records\n", [Count]); true -> ok end, | |
{FH2, Count+1} | |
end, | |
{_, Final} = eleveldb:fold_keys(Ref, WriteOne, {FH, 1}, []), | |
error_logger:info_msg("Hack dump final: wrote ~p records\n", [Final]), | |
exit(normal) | |
end. | |
--- | |
dtrace -qn 'BEGIN { highest = 0; cnt = 0; } pid$target:eleveldb.so:eleveldb_*:entry,pid$target:eleveldb.so:*DoCompactionWork*:entry { cnt++ ; highest = (cnt > highest) ? cnt : highest; } pid$target:eleveldb.so:eleveldb_*:return,pid$target:eleveldb.so:*DoCompactionWork*:return { cnt = (cnt < 1) ? 0 : (cnt -1); } tick-5sec { printf("Highest is %d\n", highest); highest = 0;} tick-30sec { printf("reset\n"); cnt = 0;} ' -p 23271 | |
Highest is 9 | |
Highest is 8 | |
Highest is 7 | |
Highest is 10 | |
Highest is 10 | |
[...] | |
dtrace -qn 'pid$target:eleveldb.so:*DoCompactionWork*:entry {self->start = timestamp;} pid$target:eleveldb.so:*DoCompactionWork*:return { printf("compaction %d msec\n", (timestamp - self->start) / 1000000);}' -p 23271
... Hm hrm hmmmm very interesting, see 2011-12-07.XXXer.data/slf.sh and | |
slf.d implementation. | |
prod-2013 compaction 3445242949531516 -> 3445579140520676 = 336190 msec | |
prod-2016 compaction 3444935493726522 -> 3445264709366428 = 329215 msec | |
prod-2017 compaction 3445574248440866 -> 3445866808284379 = 292559 msec | |
prod-2016 compaction 3446367501174857 -> 3446645210490365 = 277709 msec | |
prod-2004 compaction 3446351606124390 -> 3446569990225495 = 218384 msec | |
prod-2009 compaction 3447014449109281 -> 3447087977597887 = 73528 msec | |
prod-2010 compaction 3446561322573171 -> 3446630495727881 = 69173 msec | |
prod-2009 compaction 3444289135231845 -> 3444357866368973 = 68731 msec | |
prod-2024 compaction 3218058352359332 -> 3218126783109779 = 68430 msec | |
prod-2002 compaction 3445177206907731 -> 3445244794304685 = 67587 msec | |
prod-2002 compaction 3446004235060663 -> 3446065053403479 = 60818 msec | |
prod-2011 compaction 3446845235168486 -> 3446904391635234 = 59156 msec | |
prod-2016 compaction 3444818206254735 -> 3444877176239571 = 58969 msec | |
prod-2006 compaction 3445691300975842 -> 3445747795173234 = 56494 msec | |
prod-2006 compaction 3447533560070312 -> 3447588734869068 = 55174 msec | |
prod-2000 compaction 3447230625813014 -> 3447284372201923 = 53746 msec | |
prod-2013 compaction 3446347422456985 -> 3446398488149603 = 51065 msec | |
prod-2002 compaction 3446112111566514 -> 3446160085767080 = 47974 msec | |
prod-2000 compaction 3446089380507528 -> 3446129998030687 = 40617 msec | |
prod-2009 compaction 3443726676664708 -> 3443766386818881 = 39710 msec | |
prod-2017 compaction 3443354706109044 -> 3443391235674135 = 36529 msec | |
prod-2006 compaction 3445627567244311 -> 3445658942845795 = 31375 msec | |
prod-2009 compaction 3446209955030151 -> 3446239145536200 = 29190 msec | |
prod-2020 compaction 3218239502629725 -> 3218266849931259 = 27347 msec | |
--- | |
Notes on the plane (see notebook for originals) | |
* Enumerate _observed_ problems with the Joyent cluster:
1. ** Node '[email protected]' not responding ** | |
NOTE: This happened 20 times to 16 unique machines as observed | |
by prod-2039 on 15-Jan-2012 | |
2. Slow HTTP queries to /stats and (!) /ping URIs | |
     NOTE: This happens even when the curl client is on localhost, so
     network effects (e.g., packet loss) are not a factor
3. eleveldb interference during startup monster merges, e.g. wee hours | |
of Sunday 15-Jan-2012. | |
* Investigate what all an HTTP /stats query does: how many Erlang
  processes does the service communicate with?  Though we also see
  latency problems with the /ping URI, I've seen curl queries to
  http://localhost:8098/stats take over 60 seconds.
* We don't have any meaningful network stats history.  XXXer hasn't
  set up Cacti or equivalent yet.  So I believe we need to monitor
  this ourselves on each node and keep a history, polling "netstat -s"
  output on a very frequent basis, e.g. every 5 seconds (see the first
  sketch after this list).
* Would it be helpful to have a D script to watch the internal VM
  functions for sending & receiving distributed Erlang messages?
  (See the second sketch after this list.)  This could also confirm
  what the cluster-wide "not responding" errors claim: that the
  complained-about node takes a nap/goes on holiday for at least 60
  seconds and sometimes for several minutes.
- We still don't know if this problem is related to network problems | |
    (e.g., packet loss), NIF interference with Erlang scheduler threads,
both, or neither. | |
* Has Matt & crew found the app code path that's generating the | |
bizillion requests for the bogus 30-50MB {<<"bodies_raw">>, | |
<<"key">>} bkey? | |
- If not, then how about patching get & put FSM's to look for that | |
!@#$! bkey and complain whenever they see it? Or ignore any op | |
with that key? Or return a HTTP 972 CRAZY REPLY to provoke their | |
app into crashing ASAP? | |
* OT: create D to decode Erlang empty list, atom | |
* OT: resolve R15B source merge conflict | |
* OT: pester CDL about network monitoring best practices | |
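Minimal sketch of the netstat-history poller mentioned above (the log
path and the 5-second interval are placeholders, not anything XXXer
has deployed):
# append a timestamped "netstat -s" snapshot every 5 seconds
while true; do date; netstat -s; sleep 5; done >> /var/tmp/netstat-history.log 2>&1 &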
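And a minimal sketch of the dist-message watcher idea, assuming that
erts_dsig_send_msg (seen in the profile output earlier) is a
reasonable proxy for outbound dist sends -- that function choice is an
assumption, not a complete list of the VM's dist send paths:
# count calls into the VM's dist-send path every 5 seconds; a long run of
# zeroes while peers report "not responding" would support the napping theory
dtrace -qn 'pid$target::erts_dsig_send_msg:entry { @ = count(); } tick-5sec { printa("dist sends in last 5 sec: %@d\n", @); trunc(@); }' -p <beam.smp pid>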
---- | |
Output from slf.d: | |
January 17, 2012 05:38:18 AM UTC | |
prod-2017 tid 593 compaction 3453448176397581 -> 3453774276161487 = 326099 msec | |
Report from prod-2000 (and all others, though not included here): | |
2012-01-17 05:36:56.216 ...** Node '[email protected]' not responding ** | |
2012-01-17 05:36:56.492 ...net_kernel event: {nodedown,'[email protected]',[{nodedown_reason,net_tick_timeout},{node_type,visible}]} | |
2012-01-17 05:38:58.641 ...net_kernel event: {nodeup,'[email protected]',[{node_type,visible}]} | |
January 17, 2012 05:30:18 AM UTC | |
prod-2014 tid 594 compaction 3453278623074909 -> 3453568805849073 = 290182 msec | |
Report from prod-2000 (and all others, though not included here): | |
2012-01-17 05:29:24.449 ...** Node '[email protected]' not responding ** | |
2012-01-17 05:29:24.678 ...net_kernel event: {nodedown,'[email protected]',[{nodedown_reason,net_tick_timeout},{node_type,visible}]} | |
2012-01-17 05:30:29.045 ...net_kernel event: {nodeup,'[email protected]',[{node_type,visible}]} | |
January 17, 2012 05:40:18 AM UTC | |
prod-2012 tid 593 compaction 3454206245670796 -> 3454284912945709 = 78667 msec | |
No nodedown events reported | |
January 17, 2012 05:00:18 AM UTC | |
prod-2010 tid 593 compaction 3451910297021077 -> 3451988140075538 = 77843 msec | |
No nodedown events reported | |
--- | |
2012-01-18 Wednesday | |
On the LevelDB front, we appear to have found two different bugs: | |
* A single compaction op can "Expand" to add thousands of extra files | |
* While trying to recreate the above bug, found another apparent bug | |
  where a single eleveldb put triggers a cascade of 3-file (2-file?)
  compactions, dozens or hundreds of them, that blocks the put for 30+
  seconds.  Each little compaction appears to have the same number of
files. | |
---- | |
More on the error-logger-is-blocked front. | |
My reading of this is that the lager_event process, <0.43.0>, is | |
blocked waiting for a reply from its group leader, <0.39.0>. The | |
group leader's mailbox is empty, so what happened? Did the | |
{io_request, ...} message never arrive at the group leader, or did | |
the group leader's reply never arrive back? In either case, how | |
could such intra-machine messages ever get lost?? | |
([email protected])5> lists:sort([{Size, Pid} | |
([email protected])5> || Pid <- processes(), | |
([email protected])5> {message_queue_len, Size} <- | |
([email protected])5> [catch process_info(Pid, message_queue_len)], | |
([email protected])5> Size > 0]). | |
[{1,<0.32.0>},{16,<0.43.0>},{261,<0.89.0>},{2856,<0.5.0>}] | |
([email protected])6> | |
([email protected])6> | |
([email protected])6> | |
([email protected])6> io:format("~s\n", [element(2,process_info(pid(0,5,0), backtrace))]). | |
Program counter: 0x00000000030aa5a0 (gen:do_call/4 + 576) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7fd0445600 Return addr 0x00000000030b6a18 (gen_event:rpc/2 + 96) | |
y(0) #Ref<0.0.22.18473> | |
y(1) '[email protected]' | |
y(2) [] | |
y(3) infinity | |
y(4) {sync_notify,{log,3,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"32",46,["0",50,50]]},[["[","error","] "],["<0.19513.124>",32],"webmachine error: path=\"/riak/bodies_raw/1326936634921_0291564814_f691cea\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}} | |
y(5) <0.5.0> | |
y(6) <0.43.0> | |
0xfffffd7fd0445640 Return addr 0x0000000004781518 (error_logger_lager_h:handle_event/2 + 6072) | |
0xfffffd7fd0445648 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
y(0) [] | |
y(1) [] | |
y(2) [] | |
y(3) [] | |
y(4) [] | |
y(5) [] | |
y(6) {} | |
0xfffffd7fd0445688 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136) | |
y(0) error_logger | |
y(1) {error,<0.91.0>,{<0.19513.124>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326936634921_0291564814_f691cea",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}} | |
y(2) {handler,error_logger_lager_h,false,{},<0.49.0>} | |
y(3) {} | |
y(4) error_logger_lager_h | |
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
0xfffffd7fd04456c0 Return addr 0x00000000030b75e8 (gen_event:handle_msg/5 + 256) | |
y(0) error_logger | |
y(1) handle_event | |
y(2) {error,<0.91.0>,{<0.19513.124>,"webmachine error: path=~p~n~p~n",["/riak/bodies_raw/1326936634921_0291564814_f691cea",{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}]}} | |
y(3) [] | |
0xfffffd7fd04456e8 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) error_logger | |
y(2) <0.2.0> | |
0xfffffd7fd0445708 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
([email protected])7> process_info(pid(054,0)). | |
** exception error: undefined shell command pid/2 | |
([email protected])8> process_info(pid(0,5,0)). | |
[{registered_name,error_logger}, | |
{current_function,{gen,do_call,4}}, | |
{initial_call,{proc_lib,init_p,5}}, | |
{status,waiting}, | |
{message_queue_len,2858}, | |
{messages,[{notify,{error,<0.91.0>, | |
{<0.30723.124>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936665946_0712657087_890c94f", | |
{error,{exit,normal, | |
[{webmachine_request,read_chunk_length,3}, | |
{webmachine_request,recv_chunked_body,3}, | |
{webmachine_request,read_whole_stream,...}, | |
{webmachine_request,...}, | |
{...}|...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.31283.124>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326927019020_4817631435_19183809", | |
{error,{exit,normal, | |
[{webmachine_request,read_chunk_length,3}, | |
{webmachine_request,recv_chunked_body,...}, | |
{webmachine_request,...}, | |
{...}|...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.27205.124>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936639145_0eefb4d7ced50272_d431aae4335c28d6", | |
{error,{exit,normal, | |
[{webmachine_request,read_chunk_length,...}, | |
{webmachine_request,...}, | |
{...}|...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.19987.125>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936749141_2996738267", | |
{error,{exit,normal, | |
[{webmachine_request,...},{...}|...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.20989.125>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936319977_0755845112_1753401", | |
{error,{exit,normal,[{...}|...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.18515.121>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936525750_0871077864_f2af806", | |
{error,{exit,normal,[...]}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.24401.125>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936763467_1191873287", | |
{error,{exit,normal,...}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.7968.126>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/lrg_1326930983572_279395581_7b1d173f", | |
{error,{error,...}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.7101.126>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936457880_bab39ccd82aa0131_da56ea4b2b8643b3", | |
{error,{...}}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.21435.126>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936030608_c540d7f481f91100_ccd676e4dabd9b62", | |
{error,...}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.1781.127>,"webmachine error: path=~p~n~p~n", | |
["/riak/bodies_raw/1326936083601_0249619846_0a82470", | |
{...}]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.20431.127>,"webmachine error: path=~p~n~p~n", | |
[[...]|...]}}}, | |
{notify,{error,<0.91.0>, | |
{<0.28118.127>,"webmachine error: path=~p~n~p~n",[...]}}}, | |
{notify,{error,<0.91.0>,{<0.4580.128>,[...],...}}}, | |
{notify,{error,<0.91.0>,{<0.11898.128>,...}}}, | |
{notify,{error,<0.91.0>,{...}}}, | |
{notify,{error,<0.91.0>,...}}, | |
{notify,{error,...}}, | |
{notify,{...}}, | |
{notify,...}, | |
{...}|...]}, | |
{links,[<0.32.0>,<0.49.0>,<0.0.0>]}, | |
{dictionary,[{'$ancestors',[<0.2.0>]}, | |
{'$initial_call',{gen_event,init_it,6}}]}, | |
{trap_exit,true}, | |
{error_handler,error_handler}, | |
{priority,normal}, | |
{group_leader,<0.32.0>}, | |
{total_heap_size,1346269}, | |
{heap_size,1346269}, | |
{stack_size,35}, | |
{reductions,6818201}, | |
{garbage_collection,[{min_bin_vheap_size,46368}, | |
{min_heap_size,233}, | |
{fullsweep_after,0}, | |
{minor_gcs,0}]}, | |
{suspending,[]}] | |
([email protected])9> process_info(pid(0,43,0)). | |
[{registered_name,lager_event}, | |
{current_function,{io,wait_io_mon_reply,2}}, | |
{initial_call,{proc_lib,init_p,5}}, | |
{status,waiting}, | |
{message_queue_len,16}, | |
{messages,[{<0.11163.124>, | |
{<0.11163.124>,#Ref<0.0.22.3946>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,"17",46,"799"]}, | |
[["[","info","] "], | |
["<0.11163.124>",64,"riak_core_handoff_sender",58, | |
"start_fold",58,"39",32], | |
"Starting handoff of partition riak_kv_vnode 1084708246456529509604297336781616303650887434240 from '[email protected]' to '[email protected]'"]}}}, | |
{<0.11237.124>, | |
{<0.11237.124>,#Ref<0.0.22.5130>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,"18",46,["0",50|...]]}, | |
[["[","info","] "], | |
["<0.11237.124>",64,"riak_core_handoff_receiver",58, | |
"process_message",58,"101",32], | |
"Receiving handoff data for partition riak_kv_vnode:1113253200310648707225463056170606206378542366720"]}}}, | |
{<0.11241.124>, | |
{<0.11241.124>,#Ref<0.0.22.5131>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,"18",46,[[...]|...]]}, | |
[["[","info","] "], | |
["<0.11241.124>",64,"riak_core_handoff_receiver",58, | |
"process_message",58, | |
[...]|...], | |
"Receiving handoff data for partition riak_kv_vnode:1076144760300293750317947620964919332832590954496"]}}}, | |
{<0.11289.124>, | |
{<0.11289.124>,#Ref<0.0.22.5202>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,"18",46,[...]]}, | |
[["[","info","] "], | |
["<0.11289.124>",64,"riak_core_handoff_receiver",58, | |
"process_message",58|...], | |
"Receiving handoff data for partition riak_kv_vnode:1001927880279583836502916750553545585740688130048"]}}}, | |
{<0.11293.124>, | |
{<0.11293.124>,#Ref<0.0.22.5204>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,"18",46|...]}, | |
[["[","info","] "], | |
["<0.11293.124>",64,"riak_core_handoff_receiver",58, | |
[...]|...], | |
"Receiving handoff data for partition riak_kv_vnode:667951920186389224335277833702363723827125420032"]}}}, | |
{<0.17488.124>, | |
{<0.17488.124>,#Ref<0.0.22.13770>}, | |
{sync_notify,{log,6, | |
{["2012",45,["0",49],45,"19"], | |
[["0",49],58,"31",58,[...]|...]}, | |
[["[","info","] "], | |
["<0.17488.124>",64,"riak_core_handoff_receiver",58|...], | |
"Receiving handoff data for partition riak_kv_vnode:1447229160403843319393101973021788068292105076736"]}}}, | |
{<0.5.0>, | |
{<0.5.0>,#Ref<0.0.22.18473>}, | |
{sync_notify,{log,3, | |
{["2012",45,["0",49],45,[...]],[["0",49],58,"31",58|...]}, | |
[["[","error","] "], | |
["<0.19513.124>",32], | |
"webmachine error: path=\"/riak/bodies_raw/1326936634921_0291564814_f691cea\"\n{error,{exit,normal,[{webmachine_request,read_chunk_length,3},{webmachine_request,recv_chunked_body,3},{webmachine_request,read_whole_stream,5},{webmachine_request,call,2},{wrq,req_body,1},{riak_kv_wm_object,accept_doc_body,2},{webmachine_resource,resource_call,3},{webmachine_resource,do,3}]}}"]}}}, | |
{<0.24660.124>, | |
{<0.24660.124>,#Ref<0.0.22.24522>}, | |
{sync_notify,{log,6, | |
{["2012",45,[[...]|...],45|...],[["0",49],58,[...]|...]}, | |
[["[","info",[...]], | |
["<0.24660.124>",64|...], | |
"Receiving handoff data for partition riak_kv_vnode:593735040165679310520246963290989976735222595584"]}}}, | |
{<0.30213.125>, | |
{<0.30213.125>,#Ref<0.0.22.84272>}, | |
{sync_notify,{log,6, | |
{["2012",45,[...]|...],[[[...]|...],58|...]}, | |
[["[",[...]|...],[[...]|...],[...]]}}}, | |
{<0.2254.127>, | |
{<0.2254.127>,#Ref<0.0.22.142846>}, | |
{sync_notify,{log,6, | |
{["2012",45|...],[[...]|...]}, | |
[[[...]|...],[...]|...]}}}, | |
{<0.89.0>, | |
{<0.89.0>,#Ref<0.0.24.65121>}, | |
{sync_notify,{log,6,{[[...]|...],[...]},[[...]|...]}}}, | |
{<0.29908.243>, | |
{<0.29908.243>,#Ref<0.0.44.234047>}, | |
{sync_notify,{log,6,{[...],...},[...]}}}, | |
{<0.10551.244>, | |
{<0.10551.244>,#Ref<0.0.44.253694>}, | |
{sync_notify,{log,6,{...},...}}}, | |
{<0.1660.325>, | |
{<0.1660.325>,#Ref<0.0.60.172925>}, | |
{sync_notify,{log,6,...}}}, | |
{<0.11720.325>, | |
{<0.11720.325>,#Ref<0.0.60.189769>}, | |
{sync_notify,{log,...}}}, | |
{<0.8324.340>, | |
{<0.8324.340>,#Ref<0.0.63.180000>}, | |
{sync_notify,{...}}}]}, | |
{links,[<0.46.0>,<0.47.0>,#Port<0.1755082>,<0.42.0>, | |
#Port<0.1755079>]}, | |
{dictionary,[{'$ancestors',[lager_sup,<0.40.0>]}, | |
{'$initial_call',{gen_event,init_it,6}}]}, | |
{trap_exit,true}, | |
{error_handler,error_handler}, | |
{priority,normal}, | |
{group_leader,<0.39.0>}, | |
{total_heap_size,10946}, | |
{heap_size,10946}, | |
{stack_size,33}, | |
{reductions,975453}, | |
{garbage_collection,[{min_bin_vheap_size,46368}, | |
{min_heap_size,233}, | |
{fullsweep_after,0}, | |
{minor_gcs,0}]}, | |
{suspending,[]}] | |
([email protected])10> | |
([email protected])10> | |
([email protected])10> io:format("~s\n", [element(2,process_info(pid(0,43,0), backtrace))]). | |
Program counter: 0x0000000004773650 (io:wait_io_mon_reply/2 + 56) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7ffad53d58 Return addr 0x0000000004770fa0 (io:o_request/3 + 96) | |
y(0) #Ref<0.0.22.3921> | |
y(1) <0.39.0> | |
0xfffffd7ffad53d70 Return addr 0x0000000004700e78 (lager_console_backend:handle_event/2 + 752) | |
y(0) [] | |
y(1) [] | |
y(2) put_chars | |
y(3) {put_chars,unicode,[[["0",49],58,"31",58,"17",46,"796"]," ",["[","info","] "],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from '[email protected]' to '[email protected]'","\n"]} | |
y(4) <0.39.0> | |
0xfffffd7ffad53da0 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
y(0) {state,6,false} | |
0xfffffd7ffad53db0 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136) | |
y(0) lager_event | |
y(1) {log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"17",46,"796"]},[["[","info","] "],["<0.11161.124>",64,"riak_core_handoff_sender",58,"start_fold",58,"39",32],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from '[email protected]' to '[email protected]'"]} | |
y(2) {handler,lager_console_backend,false,{state,6,false},<0.46.0>} | |
y(3) {state,6,false} | |
y(4) lager_console_backend | |
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
0xfffffd7ffad53de8 Return addr 0x00000000030ba7d0 (gen_event:server_notify/4 + 360) | |
y(0) lager_event | |
y(1) handle_event | |
y(2) {log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"31",58,"17",46,"796"]},[["[","info","] "],["<0.11161.124>",64,"riak_core_handoff_sender",58,"start_fold",58,"39",32],"Starting handoff of partition riak_kv_vnode 1402984481929958563080295107968853719064239931392 from '[email protected]' to '[email protected]'"]} | |
y(3) [] | |
0xfffffd7ffad53e10 Return addr 0x00000000030b7750 (gen_event:handle_msg/5 + 616) | |
y(0) {handler,lager_file_backend,false,{state,[{file,"/var/log/riak/error.log",3,{file_descriptor,prim_file,{#Port<0.1755082>,11}},1727659,false,10485760,[{hour,0}],5},{file,"/var/log/riak/console.log",6,{file_descriptor,prim_file,{#Port<0.1755079>,12}},1727658,false,10485760,[{hour,0}],5}]},<0.47.0>} | |
0xfffffd7ffad53e20 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) lager_event | |
y(2) <0.42.0> | |
y(3) {<0.11161.124>,#Ref<0.0.22.3919>} | |
y(4) <0.11161.124> | |
0xfffffd7ffad53e50 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
([email protected])11> | |
([email protected])11> | |
([email protected])11> | |
([email protected])11> process_info(pid(0,39,0)). | |
[{current_function,{application_master,main_loop,2}}, | |
{initial_call,{proc_lib,init_p,5}}, | |
{status,waiting}, | |
{message_queue_len,0}, | |
{messages,[]}, | |
{links,[<0.6.0>,<0.40.0>]}, | |
{dictionary,[{'$ancestors',[<0.38.0>]}, | |
{'$initial_call',{application_master,init,4}}]}, | |
{trap_exit,true}, | |
{error_handler,error_handler}, | |
{priority,normal}, | |
{group_leader,<0.39.0>}, | |
{total_heap_size,233}, | |
{heap_size,233}, | |
{stack_size,6}, | |
{reductions,4369}, | |
{garbage_collection,[{min_bin_vheap_size,46368}, | |
{min_heap_size,233}, | |
{fullsweep_after,0}, | |
{minor_gcs,0}]}, | |
{suspending,[]}] | |
([email protected])12> io:format("~s\n", [element(2,process_info(pid(0,39,0), backtrace))]). | |
Program counter: 0x00000000012f7130 (application_master:main_loop/2 + 64) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7ff3a862b0 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) {state,<0.40.0>,{appl_data,lager,[],undefined,{lager_app,[]},[error_logger_lager_h,lager,lager_app,lager_console_backend,lager_crash_log,lager_file_backend,lager_handler_watcher,lager_handler_watcher_sup,lager_mochiglobal,lager_stdlib,lager_sup,lager_transform,lager_trunc_io,lager_util],[],infinity,infinity},[],0,<0.32.0>} | |
y(2) <0.6.0> | |
0xfffffd7ff3a862d0 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
([email protected])13> x. | |
x | |
([email protected])14> | |
([email protected])14> | |
([email protected])14> | |
([email protected])14> | |
([email protected])14> | |
([email protected])14> | |
([email protected])14> process_info(pid(0,89,0)). | |
[{registered_name,riak_sysmon_handler}, | |
{current_function,{gen,do_call,4}}, | |
{initial_call,{proc_lib,init_p,5}}, | |
{status,waiting}, | |
{message_queue_len,305}, | |
{messages,[{notify,{monitor,<0.13317.112>,long_gc, | |
[{timeout,580}, | |
{old_heap_block_size,0}, | |
{heap_block_size,75025}, | |
{mbuf_size,0}, | |
{stack_size,33}, | |
{old_heap_size,0}, | |
{heap_size,25272}]}}, | |
{notify,{monitor,<0.26837.116>,long_gc, | |
[{timeout,119}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,78}, | |
{old_heap_size,0}, | |
{heap_size,14577}]}}, | |
{notify,{monitor,<0.14736.112>,long_gc, | |
[{timeout,121}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,78}, | |
{old_heap_size,0}, | |
{heap_size,14575}]}}, | |
{notify,{monitor,<0.7202.137>,long_gc, | |
[{timeout,266}, | |
{old_heap_block_size,0}, | |
{heap_block_size,4181}, | |
{mbuf_size,0}, | |
{stack_size,29}, | |
{old_heap_size,0}, | |
{heap_size,1448}]}}, | |
{notify,{monitor,<0.32645.111>,long_gc, | |
[{timeout,533}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,30}, | |
{old_heap_size,0}, | |
{heap_size,14501}]}}, | |
{notify,{monitor,<0.14405.111>,long_gc, | |
[{timeout,595}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,82}, | |
{old_heap_size,0}, | |
{heap_size,14489}]}}, | |
{notify,{monitor,<0.107.0>,long_gc, | |
[{timeout,728}, | |
{old_heap_block_size,0}, | |
{heap_block_size,75025}, | |
{mbuf_size,0}, | |
{stack_size,9}, | |
{old_heap_size,0}, | |
{heap_size,...}]}}, | |
{notify,{monitor,<0.14353.111>,long_gc, | |
[{timeout,603}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,30}, | |
{old_heap_size,...}, | |
{...}]}}, | |
{notify,{monitor,<0.14164.139>,long_gc, | |
[{timeout,466}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,0}, | |
{stack_size,...}, | |
{...}|...]}}, | |
{notify,{monitor,<0.27900.116>,long_gc, | |
[{timeout,599}, | |
{old_heap_block_size,0}, | |
{heap_block_size,46368}, | |
{mbuf_size,...}, | |
{...}|...]}}, | |
{notify,{monitor,<0.13157.111>,long_gc, | |
[{timeout,599}, | |
{old_heap_block_size,0}, | |
{heap_block_size,...}, | |
{...}|...]}}, | |
{notify,{monitor,<0.25009.111>,long_gc, | |
[{timeout,599},{old_heap_block_size,...},{...}|...]}}, | |
{notify,{monitor,<0.14045.112>,long_gc, | |
[{timeout,...},{...}|...]}}, | |
{notify,{monitor,<0.25009.111>,long_gc,[{...}|...]}}, | |
{notify,{monitor,<0.5408.112>,long_gc,[...]}}, | |
{notify,{monitor,<0.32645.111>,long_gc,...}}, | |
{notify,{monitor,<0.25144.111>,...}}, | |
{notify,{monitor,...}}, | |
{notify,{...}}, | |
{notify,...}, | |
{...}|...]}, | |
{links,[<0.87.0>,<0.94.0>]}, | |
{dictionary,[{'$ancestors',[riak_sysmon_sup,<0.86.0>]}, | |
{'$initial_call',{gen_event,init_it,6}}]}, | |
{trap_exit,true}, | |
{error_handler,error_handler}, | |
{priority,normal}, | |
{group_leader,<0.85.0>}, | |
{total_heap_size,17711}, | |
{heap_size,17711}, | |
{stack_size,34}, | |
{reductions,112210}, | |
{garbage_collection,[{min_bin_vheap_size,46368}, | |
{min_heap_size,233}, | |
{fullsweep_after,0}, | |
{minor_gcs,0}]}, | |
{suspending,[]}] | |
([email protected])15> io:format("~s\n", [element(2,process_info(pid(0,89,0), backtrace))]). | |
Program counter: 0x00000000030aa5a0 (gen:do_call/4 + 576) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0xfffffd7ff86e2b40 Return addr 0x00000000030b6a18 (gen_event:rpc/2 + 96) | |
y(0) #Ref<0.0.24.65121> | |
y(1) '[email protected]' | |
y(2) [] | |
y(3) infinity | |
y(4) {sync_notify,{log,6,{["2012",45,["0",49],45,"19"],[["0",49],58,"41",58,"18",46,"933"]},[["[","info","] "],["<0.89.0>",64,"riak_core_sysmon_handler",58,"handle_event",58,"85",32],"monitor long_gc <0.568.0> [{name,riak_kv_put_fsm_sup},{initial_call,{supervisor,riak_kv_put_fsm_sup,1}},{almost_current_function,{proc_lib,sync_wait,2}}] [{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]"]}} | |
y(5) <0.89.0> | |
y(6) <0.43.0> | |
0xfffffd7ff86e2b80 Return addr 0x00000000048114f8 (riak_core_sysmon_handler:handle_event/2 + 648) | |
0xfffffd7ff86e2b88 Return addr 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
y(0) [] | |
y(1) [] | |
y(2) [] | |
y(3) [] | |
y(4) [] | |
y(5) {state} | |
0xfffffd7ff86e2bc0 Return addr 0x00000000030ba6f0 (gen_event:server_notify/4 + 136) | |
y(0) riak_sysmon_handler | |
y(1) {monitor,<0.568.0>,long_gc,[{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]} | |
y(2) {handler,riak_core_sysmon_handler,false,{state},<0.94.0>} | |
y(3) {state} | |
y(4) riak_core_sysmon_handler | |
y(5) Catch 0x00000000030bab08 (gen_event:server_update/4 + 272) | |
0xfffffd7ff86e2bf8 Return addr 0x00000000030b75e8 (gen_event:handle_msg/5 + 256) | |
y(0) riak_sysmon_handler | |
y(1) handle_event | |
y(2) {monitor,<0.568.0>,long_gc,[{timeout,578},{old_heap_block_size,0},{heap_block_size,987},{mbuf_size,0},{stack_size,28},{old_heap_size,0},{heap_size,395}]} | |
y(3) [] | |
0xfffffd7ff86e2c20 Return addr 0x00000000030afec8 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) [] | |
y(1) riak_sysmon_handler | |
y(2) <0.87.0> | |
0xfffffd7ff86e2c40 Return addr 0x00000000006df368 (<terminate process normally>) | |
y(0) Catch 0x00000000030afee8 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
([email protected])16> | |
** See my yellow notepad for more notes, but... | |
... erlang.log.4 shows that Greg (?) used the console successfully | |
prior to the console going dead. | |
([email protected])5> | |
rp(lists:keysort(2, [{N, element(1, rpc:call(N, erlang, statistics, [wall_clock]))} || N <- [node() | nodes()]])). | |
[{'[email protected]',6519722}, | |
{'[email protected]',7155611}, | |
{'[email protected]',7183842}, | |
{'[email protected]',7187230}, | |
{'[email protected]',7195017}, | |
{'[email protected]',7232344}, | |
{'[email protected]',7246924}, | |
{'[email protected]',7256024}, | |
{'[email protected]',7279753}, | |
{'[email protected]',7297350}, | |
{'[email protected]',7299218}, | |
{'[email protected]',7369359}, | |
{'[email protected]',7378016}, | |
{'[email protected]',7402411}, | |
{'[email protected]',7436498}, | |
{'[email protected]',7455697}, | |
{'[email protected]',7493179}, | |
{'[email protected]',7498511}, | |
{'[email protected]',7511964}, | |
{'[email protected]',7558773}, | |
{'[email protected]',7576800}, | |
{'[email protected]',7606920}, | |
{'[email protected]',7623072}, | |
{'[email protected]',7646077}, | |
{'[email protected]',7656642}, | |
{'[email protected]',7678676}, | |
{'[email protected]',7683218}, | |
{'[email protected]',7715771}, | |
{'[email protected]',7767783}, | |
{'[email protected]',7788377}, | |
{'[email protected]',7806678}, | |
{'[email protected]',7833097}, | |
{'[email protected]',7872286}, | |
{'[email protected]',7888006}, | |
{'[email protected]',7923088}, | |
{'[email protected]',7945616}, | |
{'[email protected]',7962402}, | |
{'[email protected]',7993250}, | |
{'[email protected]',8088754}, | |
{'[email protected]',8127035}, | |
{'[email protected]',8200300}] | |
ok | |
([email protected])6> | |
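A minimal variant of the same uptime survey, converted to hours (sketch only; wall_clock is milliseconds since the VM started, so the lowest value is the node that restarted most recently):
%% Sketch, not from the original session:
rp(lists:keysort(2, [{N, round(element(1, rpc:call(N, erlang, statistics, [wall_clock])) / 3600000)} || N <- [node() | nodes()]])).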
===== ALIVE Thu Jan 19 01:47:12 UTC 2012 | |
===== ALIVE Thu Jan 19 02:02:12 UTC 2012 | |
===== ALIVE Thu Jan 19 02:17:12 UTC 2012 | |
===== ALIVE Thu Jan 19 02:32:12 UTC 2012 | |
[SLF: many more 15-minute-interval ALIVE messages omitted...] | |
===== ALIVE Thu Jan 19 06:47:13 UTC 2012 | |
===== ALIVE Thu Jan 19 07:02:13 UTC 2012 | |
^R | |
x | |
y | |
asdflkjasdflkjasdf | |
^R | |
^G^M | |
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded^M | |
(v)ersion (k)ill (D)b-tables (d)istribution^M | |
c | |
Eh?^M | |
^M | |
c | |
salkasdflkj | |
asdflkjasdf | |
asdflkjqwer | |
oiquwer | |
qwoeriua | |
asdflkj | |
qweoriu | |
-- end of log file -- | |
Also, I didn't think much about it at the time (other than | |
"Hm, that seems a little bit high"), but beam.smp's CPU utilization was | |
consistently high at that time: | |
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP | |
11504 riak 9046M 8893M run 59 0 25:00:10 32% beam.smp/23 | |
... and ... | |
Total: 56 processes, 287 lwps, load averages: 5.11, 4.91, 4.97 | |
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID | |
11504 riak 48 2.9 0.0 0.0 0.0 34 15 0.1 19K 2 24K 0 beam.smp/20 | |
11504 riak 46 2.2 0.0 0.0 0.0 44 7.5 0.1 14K 15 17K 0 beam.smp/14 | |
11504 riak 40 2.3 0.0 0.0 0.0 51 6.1 0.1 16K 7 19K 0 beam.smp/19 | |
11504 riak 38 3.1 0.0 0.0 0.0 50 8.4 0.2 24K 5 28K 0 beam.smp/8 | |
11504 riak 38 1.9 0.0 0.0 0.0 52 7.6 0.2 14K 7 17K 0 beam.smp/13 | |
11504 riak 35 1.6 0.0 0.0 0.0 60 2.8 0.1 9K 1 11K 0 beam.smp/16 | |
11504 riak 33 2.7 0.0 0.0 0.0 51 13 0.1 18K 6 22K 0 beam.smp/7 | |
11504 riak 30 1.3 0.0 0.0 0.0 59 10 0.1 9K 4 12K 0 beam.smp/18 | |
11504 riak 27 1.2 0.0 0.0 0.0 69 2.6 0.0 7K 3 8K 0 beam.smp/15 | |
11504 riak 23 1.6 0.0 0.0 0.0 71 4.0 0.2 10K 6 11K 0 beam.smp/6 | |
11504 riak 19 1.2 0.0 0.0 0.0 75 3.1 1.0 7K 3 9K 0 beam.smp/11 | |
11504 riak 15 0.6 0.0 0.0 0.0 81 3.3 0.0 4K 1 4K 0 beam.smp/12 | |
11504 riak 10 0.5 0.0 0.0 0.0 86 2.6 0.0 3K 3 4K 0 beam.smp/9 | |
11504 riak 4.7 0.2 0.0 0.0 0.0 95 0.3 0.0 1K 2 1K 0 beam.smp/17 | |
11504 riak 2.2 0.2 0.0 0.0 0.0 97 0.5 0.0 1K 2 1K 0 beam.smp/21 | |
I did not use "truss" or DTrace to try to see what it was doing at | |
that time, alas. | |
So, when I restarted riak on prod-2000 via kill {pid} ("riak stop" | |
didn't work), a lot of handoffs were triggered. After a few minutes, | |
all outstanding transfers were finished and none remained | |
cluster-wide. | |
So, prod-2000 was in an odd state where "erl -remsh prod-2000..." would | |
work, but "riak attach" did not, and apparently its vnodes weren't | |
working correctly? Odd............ | |
---- | |
Find all current transfers: | |
rp(lists:flatten(element(1,(rpc:multicall(lists:sort([node() | nodes()]), ets, tab2list, [global_locks]))))). | |
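A per-node variant of the same query (just a sketch; it uses one rpc:call per node, so a down node shows up as {badrpc,...} instead of breaking the flatten):
%% Sketch helper, not from the original session:
CurrentTransfers = fun() ->
    [{N, rpc:call(N, ets, tab2list, [global_locks])} || N <- lists:sort([node() | nodes()])]
end.
rp(CurrentTransfers()).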
---- | |
Pids = [pid(0,31950,667), pid(0,31204,667), pid(0,3869,668)]. | |
[begin | |
{_, M} = process_info(P, messages), | |
lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1]) | |
end || P <- Pids]. | |
[[{<<"bodies">>,<<"1323062665794_1623433739">>}, | |
{<<"bodies">>,<<"1323062665794_1623433739">>}, | |
{<<"bodies">>,<<"1323291523887_1261585781">>}, | |
{<<"bodies">>,<<"1323291523887_1261585781">>}, | |
{<<"bodies">>,<<"1323749748395_1663680247">>}, | |
{<<"bodies">>,<<"1323749748395_1663680247">>}, | |
{<<"bodies">>, | |
<<"1325103665235_a6d971e73bc93292_11ef8a9f712ba82f">>}, | |
{<<"bodies">>, | |
<<"1325103665235_a6d971e73bc93292_11ef8a9f712ba82f">>}, | |
{<<"bodies">>,<<"1325181319905_0223540070">>}, | |
{<<"bodies">>,<<"1325292430028_726416307">>}, | |
{<<"bodies">>,<<"1325292430028_726416307">>}, | |
{<<"bodies">>, | |
<<"1325572004610_ad09ab45a4e270ab_305d714fc0460266">>}, | |
{<<"bodies">>, | |
<<"1325572004610_ad09ab45a4e270ab_305d714fc0460266">>}, | |
{<<"bodies">>,<<"1325626956996_1424663528">>}, | |
{<<"bodies">>,<<"1325626956996_1424663528">>}, | |
{<<"bodies">>,<<"1325695381884_1709701352">>}, | |
{<<"bodies">>,<<"1325695381884_1709701352">>}, | |
{<<"bodies">>,<<"1326132704350_a4fae6cc359908b1_a"...>>}, | |
{<<"bodies">>,<<"1326231759930_144d4bfd752838"...>>}, | |
{<<"bodies">>,<<"1326494084453_5114271274">>}, | |
{<<"bodies">>,<<"1326612744480_193742"...>>}, | |
{<<"bodies">>,<<"1326612744480_19"...>>}, | |
{<<"bodies">>,<<"132661274448"...>>}, | |
{<<"bodies">>,<<"13266127"...>>}, | |
{<<"bodies">>,<<"1326"...>>}, | |
{<<"bodi"...>>,<<...>>}, | |
{<<...>>,...}, | |
{...}|...], | |
[{<<"bodies">>,<<"1323897305170_6801318432">>}, | |
{<<"bodies">>,<<"1323897305170_6801318432">>}, | |
{<<"bodies">>,<<"1327040954254_0443391082_3a2190a">>}, | |
{<<"bodies">>,<<"1327277886646_7864500163_edc0f306">>}, | |
{<<"bodies">>,<<"1327295682946_4578250107_454b0b52">>}, | |
{<<"bodies">>,<<"1327311363065_8737892066_5ae266b1">>}, | |
{<<"bodies">>,<<"1327332554454_8735578690">>}, | |
{<<"bodies">>,<<"1327355380079_4177460385_5c04edbb">>}, | |
{<<"bodies">>,<<"1327388143845_0554939193_552f81b">>}, | |
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>}, | |
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>}, | |
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>}, | |
{<<"bodies">>,<<"1327432439439_5035365936_10c70a78">>}, | |
{<<"bodies">>, | |
<<"1327437260171_96c2d749a6e2d109_e2d87f6b66007"...>>}, | |
{<<"bodies">>, | |
<<"1327437260171_96c2d749a6e2d109_e2d87f6b6"...>>}, | |
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d109_e2d87"...>>}, | |
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d109_e"...>>}, | |
{<<"bodies">>,<<"1327437260171_96c2d749a6e2d1"...>>}, | |
{<<"bodies">>,<<"1327437260171_96c2d749a6"...>>}, | |
{<<"bodies">>,<<"1327437260171_96c2d7"...>>}, | |
{<<"bodies">>,<<"1327437260171_96"...>>}, | |
{<<"bodies">>,<<"132745260716"...>>}, | |
{<<"bodies">>,<<"13274541"...>>}, | |
{<<"bodies">>,<<"1327"...>>}, | |
{<<"bodi"...>>,<<...>>}, | |
{<<...>>,...}, | |
{...}|...], | |
[]] | |
([email protected])17> [length(LL) || LL <- v(16)]. | |
[28050,7425,0] | |
([email protected])19> file:write_file("/tmp/odd-vnode-backlog.keys.bin", term_to_binary(v(61))). | |
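To read that saved key list back in a later session (a small sketch; same file path as above):
%% Sketch, not from the original session:
{ok, Bin} = file:read_file("/tmp/odd-vnode-backlog.keys.bin").
Keys = binary_to_term(Bin).
length(Keys).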
[begin | |
{_, M} = process_info(P, messages), | |
length(lists:sort([element(2, element(4, element(2, GE))) || GE <- M, element(1, element(4, element(2, GE))) == riak_kv_get_req_v1])) | |
end || P <- Pids]. | |
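The same extraction written with pattern matching, for readability (a sketch; it assumes each queued message is a 2-tuple wrapper around a vnode request tuple whose 4th element is the riak_kv_get_req_v1 record, which is what the element/2 indexing above implies):
%% Sketch helper, not from the original session:
BKeysFor = fun(P) ->
    {_, Msgs} = process_info(P, messages),
    lists:sort([element(2, Req)                          % the {Bucket, Key} pair
                || {_Tag, VnodeReq} <- Msgs,
                   is_tuple(VnodeReq), tuple_size(VnodeReq) >= 4,
                   Req <- [element(4, VnodeReq)],
                   is_tuple(Req),
                   element(1, Req) =:= riak_kv_get_req_v1])
end.
[length(BKeysFor(P)) || P <- Pids].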
---- | |
dtrace -n 'profile-997 /execname == "beam.smp" && arg1/ { @[uaddr(arg1)] = count(); } tick-5s { trunc(@, 10); exit(0); }' | |
dtrace -q -n 'pid$target:eleveldb.so::entry { self->level = 1; } pid$target:eleveldb.so::return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); }' -p `pgrep -u riak beam.smp` | |
dtrace -q -n 'pid$target:eleveldb.so:eleveldb_*:entry { self->level = 1; } pid$target:eleveldb.so:eleveldb_*:return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); } tick-5sec { exit(0);}' -p `pgrep -u riak beam.smp` | awk -F/ '{print $6}' | sort | uniq -c | sort -nr | head -20 | |
dtrace -n 'pid$target:eleveldb.so:*DoCompactionWork*:entry' -p `pgrep -u riak beam.smp` | |
dtrace -n 'pid$target:eleveldb.so:*OpenCompactionOutputFile*:entry' -p `pgrep -u riak beam.smp` | |
dtrace -q -n 'pid$target:eleveldb.so:*OpenCompactionOutputFile*:entry { self->level = 1; } pid$target:eleveldb.so:*OpenCompactionOutputFile*:return { self->level = 0 } syscall::open:entry /self->level > 0/ { printf("%s\n", copyinstr(arg0)); } tick-5sec { exit(0);}' -p `pgrep -u riak beam.smp` | |
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352337.sst | |
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352338.sst | |
/var/db/riak/leveldb/874902835628753407088729299272540518602623680512/352339.sst | |
[...] | |
dtrace -n 'syscall::*read*:entry { @[execname, lwp] = count} tick-5sec { exit(0);}' | |
---- | |
repair action on prod-2003, 2014, 2015, and 2019 | |
repair action on prod-2006 | |
For example, on 2014: | |
[root@prod-2014 /var/db/riak/leveldb]# ls -lt | head | |
total 52123 | |
drwxr-xr-x 2 riak riak 62692 2012-01-27 02:00 984800907967112317930217318920151644104095170560 | |
drwxr-xr-x 2 riak riak 68096 2012-01-27 02:00 725041827894627619577609272480343529282435284992 | |
drwxr-xr-x 2 riak riak 18 2012-01-27 01:28 943410724878639481379527025806116285148995518464 | |
drwxr-xr-x 2 riak riak 17 2012-01-27 01:28 646543204795799826119403544160621296781384220672 | |
[...] | |
So, two hosed dirs. | |
Verify by using "grep error: $vnode_dir/LOG"; | |
there should be at least one match, probably thousands. | |
cd /var/db/riak/leveldb | |
erl -pz /tmp/eleveldb+repair/ebin | |
At shell prompt, e.g., | |
eleveldb:repair("./984800907967112317930217318920151644104095170560", []). | |
Then, there's a second step, which takes a long long long time also. | |
1> {ok, DB} = eleveldb:open("./1455792646560079078679451688838485039110401556480", [write]). | |
{ok,<<>>} | |
Watch "iostat -x 1" and the 1455*/LOG file for compaction activity. | |
If compaction activity doesn't happen, or stops after only a few | |
minutes, then use the following (perhaps alternating put & delete | |
several times) to trigger enough writes to get compaction going again. | |
10> [eleveldb:put(DB, <<X:32>>, <<"asdlfkjasdlfkjasdlkfjasdfasdlfkjasdlasdlfkjasdlasdlfkjasdlasdlfkjasdlasdlfkjasdl">>, []) || X <- lists:seq(1,20000)]. | |
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok, | |
ok,ok,ok,ok,ok,ok,ok,ok,ok,ok|...] | |
11> [eleveldb:delete(DB, <<X:32>>, []) || X <- lists:seq(1,20000)]. | |
[ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok, | |
ok,ok,ok,ok,ok,ok,ok,ok,ok,ok|...] | |
When compaction has finally finished (no "iostat -x" activity, and no | |
activity in the db's LOG file), then: | |
12> q(). | |
ok | |
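A sketch that rolls the repair + reopen + write-churn steps above into one helper (assumptions: the node is stopped, ulimit is already raised, the eleveldb ebin dir is on the code path as shown earlier, and the 20000-key churn count is as arbitrary as it was above):
%% Sketch helper, not from the original session:
RepairAndChurn = fun(Dir) ->
    _ = eleveldb:repair(Dir, []),            % step 1: repair (can take a very long time)
    {ok, DB} = eleveldb:open(Dir, [write]),  % step 2: reopen so compaction can resume
    Val = <<"asdlfkjasdlfkjasdlkfjasdfasdlfkjasdlasdlfkjasdl">>,
    [eleveldb:put(DB, <<X:32>>, Val, []) || X <- lists:seq(1, 20000)],
    [eleveldb:delete(DB, <<X:32>>, []) || X <- lists:seq(1, 20000)],
    DB                                       % keep the handle until LOG/iostat go quiet, then q().
end.
DB = RepairAndChurn("./984800907967112317930217318920151644104095170560").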
---- | |
dtrace -qn 'pid$target:eleveldb.so:*DoCompactionWork*:entry { self->start = timestamp; } pid$target:eleveldb.so:*DoCompactionWork*:return /self->start > 0/ { @ = quantize((timestamp - self->start) / 1000000); self->start = 0; } END {printf("DoCompactionWork in milliseconds\n"); printa(@); }' -p `pgrep -u riak beam.smp` | |
dtrace -qn 'pid$target:eleveldb.so:eleveldb_*:entry { self->start = timestamp; } pid$target:eleveldb.so:eleveldb_*:return /self->start > 0/ { @[probefunc] = quantize((timestamp - self->start) / 1000000); self->start = 0; } END {printf("NIF function calls in milliseconds\n"); printa(@); } tick-300sec { exit(0); }' -p `pgrep -u riak beam.smp` ; date | |
--- | |
Sunday 2012-01-29 | |
Interesting, a new way for riak_sysmon_filter to get hosed. | |
([email protected])10> process_info(pid(0,91,0)). | |
[{registered_name,riak_sysmon_filter}, | |
{current_function,{net_kernel,get_status,3}}, | |
{initial_call,{proc_lib,init_p,5}}, | |
{status,waiting}, | |
{message_queue_len,33004}, | |
{messages,[reset,reset,reset,reset,reset,reset,reset,reset, | |
reset,reset,reset,reset,reset,reset,reset,reset,reset,reset, | |
reset,reset,reset|...]}, | |
{links,[<0.79.0>,<0.90.0>]}, | |
{dictionary,[{'$ancestors',[riak_sysmon_sup,<0.89.0>]}, | |
{'$initial_call',{riak_sysmon_filter,init,1}}]}, | |
{trap_exit,false}, | |
{error_handler,error_handler}, | |
{priority,normal}, | |
{group_leader,<0.88.0>}, | |
{total_heap_size,2584}, | |
{heap_size,2584}, | |
{stack_size,27}, | |
{reductions,118673663}, | |
{garbage_collection,[{min_bin_vheap_size,46368}, | |
{min_heap_size,233}, | |
{fullsweep_after,0}, | |
{minor_gcs,0}]}, | |
{suspending,[]}] | |
([email protected])11> io:format("~s\n", [element(2,process_info(pid(0,91,0), backtrace))]). | |
Program counter: 0x0000000004435cf0 (net_kernel:get_status/3 + 216) | |
CP: 0x0000000000000000 (invalid) | |
arity = 0 | |
0x0000000000e0dc48 Return addr 0x0000000004435a38 (net_kernel:get_node_info/1 + 424) | |
y(0) '[email protected]' | |
y(1) <0.10111.2> | |
0x0000000000e0dc60 Return addr 0x0000000004436180 (net_kernel:get_nodes_info/2 + 96) | |
y(0) normal | |
y(1) {net_address,{{192,168,25,113},60765},"prod-2038-internal.XXXer.com",tcp,inet} | |
y(2) <0.10111.2> | |
y(3) up | |
0x0000000000e0dc88 Return addr 0x000000000487bed0 (riak_sysmon_filter:get_node_map/0 + 64) | |
y(0) [{'[email protected]',[{owner,<0.195.0>},{state,up},{address,{net_address,{{192,168,25,237},50572},"prod-2098-internal.XXXer.com",tcp,inet}},{type,normal},{in,16612},{out,24082}]},{'[email protected]',[{owner,<0.10202.2>},{state,up},{address,{net_address,{{192,168,25,99},40263},"prod-2012-internal.XXXer.com",tcp,inet}},{type,normal},{in,14},{out,5}]},{'[email protected]',[{owner,<0.10287.2>},{state,pending},{address,{net_address,{{192,168,25,20},62879},"prod-2032-internal.XXXer.com",tcp,inet}},{type,normal},{in,0},{out,0}]},{'[email protected]',[{owner,<0.211.0>},{state,up},{address,{net_address,{{192,168,25,18},50827},"prod-2002-internal.XXXer.com",tcp,inet}},{type,normal},{in,25453},{out,24814}]},{'[email protected]',[{owner,<0.10191.2>},{state,up},{address,{net_address,{{192,168,25,68},37453},"prod-2035-internal.XXXer.com",tcp,inet}},{type,normal},{in,10},{out,5}]},{'[email protected]',[{owner,<0.10185.2>},{state,up},{address,{net_address,{{192,168,25,102},57790},"prod-2015-internal.XXXer.com",tcp,inet}},{type,normal},{in,11},{out,4}]},{'[email protected]',[{owner,<0.10189.2>},{state,up},{address,{net_address,{{192,168,25,59},45894},"prod-2034-internal.XXXer.com",tcp,inet}},{type,normal},{in,11},{out,6}]},{'[email protected]',[{owner,<0.9036.2>},{state,up},{address,{net_address,{{192,168,26,7},35922},"prod-2104-internal.XXXer.com",tcp,inet}},{type,normal},{in,64},{out,70}]},{'[email protected]',[{owner,<0.10181.2>},{state,up},{address,{net_address,{{192,168,25,104},34530},"prod-2017-internal.XXXer.com",tcp,inet}},{type,normal},{in,5},{out,4}]},{'[email protected]',[{owner,<0.10094.2>},{state,up},{address,{net_address,{{192,168,25,112},51065},"prod-2037-internal.XXXer.com",tcp,inet}},{type,normal},{in,15},{out,8}]}] | |
y(1) ['[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]','[email protected]'] | |
y(2) '[email protected]' | |
0x0000000000e0dca8 Return addr 0x000000000487b560 (riak_sysmon_filter:handle_info/2 + 2600) | |
y(0) Catch 0x000000000487bf48 (riak_sysmon_filter:get_node_map/0 + 184) | |
0x0000000000e0dcb8 Return addr 0x00000000043c5cc8 (gen_server:handle_msg/5 + 1680) | |
y(0) [] | |
y(1) {0,nil} | |
y(2) {state,0,30,0,2,{0,nil},[{{{192,168,25,98},46335},'[email protected]'},{{{192,168,24,169},48853},'[email protected]'},{{{192,168,25,4},45159},'[email protected]'},{{{192,168,24,123},50343},'[email protected]'},{{{192,168,24,168},47264},'[email protected]'},{{{192,168,24,121},59550},'[email protected]'},{{{192,168,25,248},39400},'[email protected]'},{{{192,168,24,233},57098},'[email protected]'},{{{192,168,25,115},34939},'[email protected]'},{{{192,168,25,97},51083},'[email protected]'},{{{192,168,25,41},38256},'[email protected]'},{{{192,168,25,250},64683},'[email protected]'},{{{192,168,25,100},33050},'[email protected]'},{{{192,168,24,125},55403},'[email protected]'},{{{192,168,25,103},38466},'[email protected]'},{{{192,168,24,127},41757},'[email protected]'},{{{192,168,25,90},45461},'[email protected]'},{{{192,168,26,11},34804},'[email protected]'},{{{192,168,25,87},41709},'[email protected]'},{{{192,168,25,96},43607},'[email protected]'},{{{192,168,25,240},45068},'[email protected]'},{{{192,168,25,106},61231},'[email protected]'},{{{192,168,25,114},58133},'[email protected]'},{{{192,168,24,166},63433},'[email protected]'},{{{192,168,25,105},62116},'[email protected]'},{{{192,168,25,111},48904},'[email protected]'},{{{192,168,26,18},44053},'[email protected]'},{{{192,168,25,113},60765},'[email protected]'},{{{192,168,25,95},56026},'[email protected]'},{{{192,168,25,237},50572},'[email protected]'},{{{192,168,25,99},40263},'[email protected]'},{{{192,168,25,20},46256},'[email protected]'},{{{192,168,25,249},54807},'[email protected]'},{{{192,168,24,124},64396},'[email protected]'},{{{192,168,25,18},50827},'[email protected]'},{{{192,168,25,68},37453},'[email protected]'},{{{192,168,25,89},65223},'[email protected]'},{{{192,168,25,102},57790},'[email protected]'},{{{192,168,24,199},47972},'[email protected]'},{{{192,168,26,8},55805},'[email protected]'},{{{192,168,25,101},44535},'[email protected]'},{{{192,168,25,59},45894},'[email protected]'},{{{192,168,24,122},36804},'[email protected]'},{{{192,168,25,88},33156},'[email protected]'},{{{192,168,26,7},35922},'[email protected]'},{{{192,168,25,91},56857},'[email protected]'},{{{192,168,25,104},34530},'[email protected]'},{{{192,168,26,12},42432},'[email protected]'},{{{192,168,25,112},51065},'[email protected]'},{{{192,168,24,208},53930},'[email protected]'}],{interval,#Ref<0.0.0.185>},false} | |
0x0000000000e0dcd8 Return addr 0x00000000030bc6e8 (proc_lib:init_p_do_apply/3 + 56) | |
y(0) riak_sysmon_filter | |
y(1) {state,0,30,0,2,{0,nil},[{{{192,168,25,98},46335},'[email protected]'},{{{192,168,24,169},48853},'[email protected]'},{{{192,168,25,4},45159},'[email protected]'},{{{192,168,24,123},50343},'[email protected]'},{{{192,168,24,168},47264},'[email protected]'},{{{192,168,24,121},59550},'[email protected]'},{{{192,168,25,248},39400},'[email protected]'},{{{192,168,24,233},57098},'[email protected]'},{{{192,168,25,115},34939},'[email protected]'},{{{192,168,25,97},51083},'[email protected]'},{{{192,168,25,41},38256},'[email protected]'},{{{192,168,25,250},64683},'[email protected]'},{{{192,168,25,100},33050},'[email protected]'},{{{192,168,24,125},55403},'[email protected]'},{{{192,168,25,103},38466},'[email protected]'},{{{192,168,24,127},41757},'[email protected]'},{{{192,168,25,90},45461},'[email protected]'},{{{192,168,26,11},34804},'[email protected]'},{{{192,168,25,87},41709},'[email protected]'},{{{192,168,25,96},43607},'[email protected]'},{{{192,168,25,240},45068},'[email protected]'},{{{192,168,25,106},61231},'[email protected]'},{{{192,168,25,114},58133},'[email protected]'},{{{192,168,24,166},63433},'[email protected]'},{{{192,168,25,105},62116},'[email protected]'},{{{192,168,25,111},48904},'[email protected]'},{{{192,168,26,18},44053},'[email protected]'},{{{192,168,25,113},60765},'[email protected]'},{{{192,168,25,95},56026},'[email protected]'},{{{192,168,25,237},50572},'[email protected]'},{{{192,168,25,99},40263},'[email protected]'},{{{192,168,25,20},46256},'[email protected]'},{{{192,168,25,249},54807},'[email protected]'},{{{192,168,24,124},64396},'[email protected]'},{{{192,168,25,18},50827},'[email protected]'},{{{192,168,25,68},37453},'[email protected]'},{{{192,168,25,89},65223},'[email protected]'},{{{192,168,25,102},57790},'[email protected]'},{{{192,168,24,199},47972},'[email protected]'},{{{192,168,26,8},55805},'[email protected]'},{{{192,168,25,101},44535},'[email protected]'},{{{192,168,25,59},45894},'[email protected]'},{{{192,168,24,122},36804},'[email protected]'},{{{192,168,25,88},33156},'[email protected]'},{{{192,168,26,7},35922},'[email protected]'},{{{192,168,25,91},56857},'[email protected]'},{{{192,168,25,104},34530},'[email protected]'},{{{192,168,26,12},42432},'[email protected]'},{{{192,168,25,112},51065},'[email protected]'},{{{192,168,24,208},53930},'[email protected]'}],{interval,#Ref<0.0.0.185>},false} | |
y(2) riak_sysmon_filter | |
y(3) <0.90.0> | |
y(4) reset | |
y(5) Catch 0x00000000043c5cc8 (gen_server:handle_msg/5 + 1680) | |
0x0000000000e0dd10 Return addr 0x0000000000672c08 (<terminate process normally>) | |
y(0) Catch 0x00000000030bc708 (proc_lib:init_p_do_apply/3 + 88) | |
ok | |
--- | |
Mon 2012-01-30: more riak_sysmon_filter mailbox stuffiness | |
prod-2102: | |
Pid = pid(0,6777,3). | |
process_info(Pid). | |
{_, M} = process_info(Pid, messages). | |
[X || X <- M, X /= reset, element(1, X) /= monitor]. | |
result on 2102: [] | |
result on 2107: [] | |
result on 2108: [] | |
All 3 have stack backtraces that are identical. | |
See also: Jon's notes @ https://raw.github.com/gist/67963a1342931416e552/b5200f3264db0a4c091bdfd49d679cd393d6ec87/gistfile1.txt | |
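A quick tally of what is piling up in such a mailbox (a sketch; it only assumes the non-atom messages are tuples and counts them by their first element):
%% Sketch helper, not from the original session:
Tally = fun(P) ->
    {_, Msgs} = process_info(P, messages),
    D = lists:foldl(fun(Msg, Acc) when is_atom(Msg)  -> dict:update_counter(Msg, 1, Acc);
                       (Msg, Acc) when is_tuple(Msg) -> dict:update_counter(element(1, Msg), 1, Acc);
                       (_,   Acc)                    -> dict:update_counter(other, 1, Acc)
                    end, dict:new(), Msgs),
    lists:reverse(lists:keysort(2, dict:to_list(D)))
end.
Tally(Pid).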
---- | |
-define(CHSTATE, #chstate_v2). | |
-record(chstate_v2, { | |
nodename :: node(), % the Node responsible for this chstate | |
vclock :: vclock:vclock(), % for this chstate object, entries are | |
% {Node, Ctr} | |
chring :: chash:chash(), % chash ring of {IndexAsInt, Node} mappings | |
meta :: dict(), % dict of cluster-wide other data (primarily | |
% bucket N-value, etc) | |
clustername :: {node(), term()}, | |
next :: [{integer(), node(), node(), [module()], awaiting | complete}], | |
members :: [{node(), {member_status(), vclock:vclock(), []}}], | |
claimant :: node(), | |
seen :: [{node(), vclock:vclock()}], | |
rvsn :: vclock:vclock() | |
}). | |
{ok, R} = riak_core_ring_manager:get_my_ring(). | |
Changes = riak_core_ring:pending_changes(R). | |
Ps = [{Idx, Owner, NewOwner} || {Idx, Owner, NewOwner, Mods, _St} <- Changes]. | |
RunDu = fun(Nd, Cmd) -> | |
Dunno = "-1\t", | |
Out = try case rpc:call(Nd, os, cmd, [Cmd]) of | |
L when is_list(L) -> L; | |
_ -> Dunno | |
end | |
catch _:_ -> Dunno | |
end, | |
case string:tokens(Out, "\t") of | |
[Str|_] -> list_to_integer(Str); | |
_ -> -1 | |
end | |
end. | |
Space = [begin | |
Cmd = "du -k /var/db/riak/leveldb/" ++ integer_to_list(Idx), | |
OwnK = RunDu(Owner, Cmd), | |
NewK = RunDu(NewOwner, Cmd), | |
{Idx, Owner, OwnK, NewOwner, NewK} | |
end || {Idx, Owner, NewOwner} <- Ps]. | |
%% sort by owner node | |
Space2 = lists:sort(fun(X, Y) -> element(2, X) < element(2, Y) end, Space). | |
%% calculate space ratios | |
rp([{Idx, Owner, kbyte_ratio_at_destination_PERCENT, NewK / OwnK * 100}|| {Idx, Owner, OwnK, NewOwner, NewK} <- Space2]). | |
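A possible follow-up (a sketch, with an arbitrary 50% threshold): list only the transfers whose destination copy is still much smaller than the source, skipping the -1 "dunno" entries that RunDu returns on error.
%% Sketch, not from the original session; the tuples come straight from Space2 above.
Lagging = [T || {_Idx, _Owner, OwnK, _NewOwner, NewK} = T <- Space2,
                OwnK > 0, NewK >= 0, NewK / OwnK < 0.5].
rp(Lagging).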
--- | |
dtrace -qn 'BEGIN { c = 0; } pid$target:eleveldb.so:eleveldb_iterator_move:entry {self->i = 1} syscall::read:entry{ c += arg2 ;} syscall::pread:entry{ c += arg2;} tick-10sec { printf("Iterator reads @ 10 sample: %d MBytes/sec\n", c / (1024*1024)/10); c = 0; }' -p `pgrep -u riak beam.smp` | |
---- | |
Alright, back to prod-2003, which has got several LevelDB instances | |
that complain about "bad magic number" or other errors, e.g.: | |
[root@prod-2003 /var/db/riak/leveldb]# grep -i error */LOG | head | |
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:47:27.868097 251 Compaction error: Invalid argument: not an sstable (bad magic number) | |
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:48:26.053983 251 Compaction error: Invalid argument: not an sstable (bad magic number) | |
1043318063368056673053607043667580944695787782144/LOG:2012/01/28-21:48:39.765415 251 Compaction error: Invalid argument: not an sstable (bad magic number) | |
[root@prod-2003 /var/db/riak/leveldb]# grep -i error */LOG | awk -F: '{ print $1}' | uniq -c | |
16928 1043318063368056673053607043667580944695787782144/LOG | |
16926 1117534943388766586868637914078954691787690606592/LOG | |
16925 1191751823409476500683668784490328438879593431040/LOG | |
16926 1377294023461251285221245960518762806609350492160/LOG | |
300 1414402463471606242128761395724449680155301904384/LOG | |
16926 523799903223087276348390950787964715052468011008/LOG | |
16926 598016783243797190163421821199338462144370835456/LOG | |
16927 709342103274862060885968126816399082782225072128/LOG | |
16926 820667423305926931608514432433459703420079308800/LOG | |
for i in `cat /tmp/slf.foo`; do echo Moving $i; mv $i ./slf.leveldb-troublemakers ; done | |
Moving 1043318063368056673053607043667580944695787782144 repair+open, screen 0 | |
Moving 1117534943388766586868637914078954691787690606592 repair+open, screen 2 | |
Moving 1191751823409476500683668784490328438879593431040 repair+open, screen 3 | |
Moving 1377294023461251285221245960518762806609350492160 repair+open, screen 4 | |
Moving 1414402463471606242128761395724449680155301904384 repair+open, screen 5 | |
Moving 523799903223087276348390950787964715052468011008 repair+open, screen 6 | |
Moving 598016783243797190163421821199338462144370835456 repair+open, screen 7 | |
Moving 709342103274862060885968126816399082782225072128 repair+open, screen 8 | |
Moving 820667423305926931608514432433459703420079308800 repair, screen 9 | |
... of the remaining ones ... | |
[root@prod-2003 /var/db/riak/leveldb]# du -hs * | grep G | |
39G 1006209623357701716146091608461894071149836369920 seems ok | |
53G 1080426503378411629961122478873267818241739194368 seems ok | |
41G 1154643383399121543776153349284641565333642018816 seems ok | |
53G 1228860263419831457591184219696015312425544843264 seems ok | |
41G 1265968703430186414498699654901702185971496255488 seems ok | |
41G 1303077143440541371406215090107389059517447667712 seems ok | |
41G 1340185583450896328313730525313075933063399079936 seems ok | |
41G 1451510903481961199036276830930136553701253316608 seems ok | |
42G 560908343233442233255906385993651588598419423232 seems ok | |
41G 635125223254152147070937256405025335690322247680 seems ok | |
41G 672233663264507103978452691610712209236273659904 seems ok | |
43G 746450543285217017793483562022085956328176484352 seems ok | |
41G 783558983295571974700998997227772829874127896576 seems ok | |
45G 857775863316281888516029867639146576966030721024 seems ok | |
40G 894884303326636845423545302844833450511982133248 seems ok | |
46G 931992743336991802331060738050520324057933545472 seems ok | |
40G 969101183347346759238576173256207197603884957696 seems ok | |
All seem ok, so let's see what happens when we try the "Moved" | |
directories.... Damn, every single one is complaining about "Invalid | |
argument: not an sstable (bad magic number)" or about missing .sst | |
files. So, I'm repairing every single !@#$! one of them and letting the | |
!@#$! thing chug through 415GB of data for however long it takes. | |
NOTE: *Very* important to set the ulimit first! | |
ulimit -n 65000 | |
erl -pz /opt/local/riak/lib/eleveldb-1.0.1/ebin | |
f(). | |
{ok, DB0} = eleveldb:open("./", [write]). | |
B = <<1:(1024*1024*8)>>. | |
time(). | |
[eleveldb:put(DB0, <<X:32>>, B, []) || X <- lists:seq(1,2000)]. | |
[eleveldb:delete(DB0, <<X:32>>, []) || X <- lists:seq(1,2000)]. | |
B2 = <<1:(8*1024)>>. | |
[eleveldb:put(DB0, <<X:32>>, B2, []) || X <- lists:seq(1,1*1024*1000)]. | |
[eleveldb:delete(DB0, <<X:32>>, []) || X <- lists:seq(1,1*1024*1000)]. | |
time(). | |
----- | |
2012-02-13 Monday | |
[2/13/12 2:45:17 PM] Scott Fritchie: Where would I find the list of 12 | |
nodes currently leaving? | |
[2/13/12 2:45:36 PM] Justin Shoffstall: | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
[email protected] | |
--- | |
2012-02-18 Saturday morning, 12:30 AM or so CST | |
Owner: [email protected] | |
Next Owner: [email protected] | |
Index: 174124218510127105489110888272838406638695088128 | |
Waiting on: [riak_kv_vnode] | |
Complete: [riak_pipe_vnode] | |
So, let's study these things: | |
* I/O activity on 2106 (the destination) | |
* I/O activity on 2027 (the source) | |
Right now on 2106, there appear to be two transfers happening: | |
[root@prod-2106 ~]# env TZ=CST6CDT date ; netstat -na | grep 8099 | grep EST | |
February 18, 2012 12:32:56 AM CST | |
192.168.26.11.8099 192.168.24.208.43407 128872 0 70265 0 ESTABLISHED | |
192.168.26.11.8099 192.168.24.124.47353 128872 0 128872 0 ESTABLISHED | |
So, one of them is coming from prod-2027 (192.168.24.208), hooray | |
for the sanity check. The other is coming from | |
prod-2022 (192.168.24.124). So, for grins and giggles, let's | |
monitor the vnode handoff folding iterator on prod-2022 also. | |
Output from all 3 boxes is in: | |
~/b/notes/2011-12-07.XXXer.data/2012-02-18.transfers-data | |
Only one file has a "date" timestamp in it, alas, but all three | |
sets of output were started within a few seconds of each | |
other. | |
---- | |
Screen experiment on prod-2106: making copies of eleveldb dirs | |
for iteration experiments.... | |
rsync -rav --delete leveldb/944837972571345441260585311775565780285378265088 991937146430642117335508748767399119786008903680 999073384894171916740800178614646595467922636800 slf-basho-leveldb-exp-data/ | |
NOTE: The rsync is now finished, see ~root/slf-basho-leveldb-exp-data/screen.1.out | |
for the output of the rsync commands. |