Created March 20, 2013 14:28
List of things to collect when diagnosing issues with systems that appear to suddenly crash at random intervals.
The next two are per-CPU metrics and need to be split out. The output has one CPU per line, so it should be easy to split into individual metrics.
These are context switches (pswitch) and involuntary context switches (inv_swtch); we want to track whether they ever spike.
kstat -p cpu:*:sys:pswitch
kstat -p cpu:*:sys:inv_swtch
kstat -p unix:0:system_pages:
kstat -p cpu:*:sys:intr
kstat -p cpu:*:sys:xcalls
kstat -p cpu:*:sys:{cpu_ticks_idle,cpu_ticks_user,cpu_ticks_kernel}
kstat -p cpu:*:vm:{pgout,pgin}
kstat -p cpu:*:sys:{bread,bwrite}
kstat -p cpu:*:sys:intrblk
## We should never see any swapping activity. If we do, something is likely wrong.
kstat -p cpu:*:vm:{swapin,swapout}
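The per-CPU counters above come back from kstat -p as one module:instance:name:statistic<TAB>value line per CPU. What follows is a minimal sketch of splitting that output into per-CPU metrics and computing the change between two samples, which is one way to spot a counter that suddenly spikes. The function names split_kstat and kstat_delta and the /tmp file names are hypothetical and not part of the original notes; a POSIX awk (nawk on illumos) is assumed.

```shell
#!/bin/sh
# split_kstat: read `kstat -p` lines on stdin, print "cpu_id statistic value".
# Assumes the parseable format "module:instance:name:statistic<TAB>value".
split_kstat() {
    awk '{ split($1, k, ":"); print k[2], k[4], $2 }'
}

# kstat_delta: given two files of split_kstat output (older, then newer),
# print "cpu_id statistic delta" for every key present in both samples.
kstat_delta() {
    awk 'NR == FNR { prev[$1 " " $2] = $3; next }
         ($1 " " $2) in prev { print $1, $2, $3 - prev[$1 " " $2] }' "$1" "$2"
}

# Possible usage on a live system (file names are hypothetical):
#   kstat -p 'cpu:*:sys:pswitch' | split_kstat > /tmp/pswitch.0
#   sleep 10
#   kstat -p 'cpu:*:sys:pswitch' | split_kstat > /tmp/pswitch.1
#   kstat_delta /tmp/pswitch.0 /tmp/pswitch.1
```

Because these kstat counters are cumulative, the delta between samples is usually more telling than the raw value.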
Kernel profile sampled at 1001 Hz for 10 seconds. Need to get a count for each kernel function that this returns; only the top 10 are printed.
dtrace -qn 'profile-1001 /arg0/ { @[ func(arg0) ] = count(); } tick-10sec {trunc(@,10); printa(@); clear(@); exit(0); }'
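Need to log a count of arc_memory_throttle calls here. Prints a timestamp (milliseconds since the epoch) and the count over 30 seconds. The counter is a global, because a thread-local (self->) variable set in the fbt probe is not visible from the tick probe.
dtrace -qn 'BEGIN { ct = 0; } fbt:zfs:arc_memory_throttle:entry { ct++; } tick-30sec {printf("%d : %d\n",walltimestamp/1000000,ct); exit(0); }'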
Need to log time spent in arc_adjust. This is in microseconds (nanoseconds divided by 1000), as an integer, not a float.
dtrace -qn 'fbt:zfs:arc_adjust:entry { self->start = timestamp; } fbt:zfs:arc_adjust:return /self->start/ { self->x = ((timestamp - self->start) / 1000); printf("%d %d\n", walltimestamp, self->x); self->start = 0; }'
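The command above prints one "walltimestamp duration" pair per arc_adjust call, which can add up to a lot of lines. As a minimal sketch, assuming that output was redirected to a file, the pairs can be reduced to a count, maximum, and mean in whatever unit the command emitted. The function name summarize_durations and the log path are hypothetical; a POSIX awk is assumed.

```shell
#!/bin/sh
# summarize_durations: read "timestamp duration" lines on stdin and print
# "count=N max=M mean=A", with the mean truncated to a whole number.
# Lines that do not have exactly two fields are ignored.
summarize_durations() {
    awk 'NF == 2 { n++; sum += $2; if ($2 > max) max = $2 }
         END {
             if (n) {
                 printf "count=%d max=%d mean=%d\n", n, max, int(sum / n)
             } else {
                 print "count=0"
             }
         }'
}

# Possible usage (log path is hypothetical):
#   summarize_durations < /var/tmp/arc_adjust.log
```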
Top ZFS functions by call count. Need to log each function and its count. Only the top 10 are returned, so results may differ between collections.
dtrace -qn 'fbt:zfs::entry { @[probefunc] = count(); } tick-1sec {trunc(@,10); printa("%-20s %@d\n",@ ); exit(0)}'
Top kernel memory consumers on the system, by bytes allocated per kmem cache over 10 seconds. Again, only the top 10 are logged.
dtrace -qn 'fbt::kmem_cache_alloc:entry { @[args[0]->cache_name] = sum(args[0]->cache_bufsize); } tick-10sec {trunc(@, 10); printa("%-20s %@d\n", @); exit(0); }'
Filesystem flush events, maximum duration of an event in microseconds. This is on-CPU time, since vtimestamp is used.
dtrace -qn '::fsflush_do_pages:entry {self->st=vtimestamp } ::fsflush_do_pages:return /self->st/ {self->delta = vtimestamp - self->st; @ = max(self->delta); self->st = 0} tick-10sec {normalize(@, 1000); printf("%d ", walltimestamp); printa("%@d\n", @); exit(0); }'
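Socket write events, maximum duration of an event in microseconds (on-CPU time). Nothing is printed until at least one write has been seen.
dtrace -qn '::BEGIN { ct=0 } ::socket_vop_write:entry {self->st = vtimestamp; } ::socket_vop_write:return /self->st/ { self->delta = vtimestamp - self->st; @ = max(self->delta/1000); ct++; self->st = 0; } tick-10sec /ct > 0/ {printf("%d ", walltimestamp); printa("%@d\n", @); exit(0); }'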