@szaydel
Created March 20, 2013 14:28
List of things to collect when diagnosing issues with systems that appear to suddenly crash at random intervals.
These two are per-CPU metrics and need to be split out. kstat returns one CPU per line, so splitting them into individual metrics is easy.
pswitch counts all context switches and inv_swtch counts involuntary ones. We want to track whether there are instances where they skyrocket.
kstat -p cpu:*:sys:pswitch
kstat -p cpu:*:sys:inv_swtch
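One way to do the per-CPU split is sketched below. It relies on the `kstat -p` output format, `module:instance:name:statistic` followed by a tab and the value. The function name `split_kstat` and the `cpu<N>.<statistic>` output naming are only illustrative choices, not part of the original notes.

```shell
# Split `kstat -p` output (module:instance:name:statistic<TAB>value)
# into "cpu<N>.<statistic> <value>" lines, one metric per line.
# split_kstat is a hypothetical helper name used for illustration.
split_kstat() {
    awk '{
        # $1 is the colon-separated key, $2 is the value.
        n = split($1, key, ":")
        # key[2] is the CPU instance, key[4] is the statistic name.
        if (n == 4) printf "cpu%s.%s %s\n", key[2], key[4], $2
    }'
}

# Possible use on a live system (not run here):
#   kstat -p 'cpu:*:sys:pswitch' | split_kstat
```

Each output line can then be sent to whatever logging or graphing tool is in use as a separate metric.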
kstat -p unix:0:system_pages:
kstat -p cpu:*:sys:intr
kstat -p cpu:*:sys:xcalls
kstat -p cpu:*:sys:{cpu_ticks_idle,cpu_ticks_user,cpu_ticks_kernel}
kstat -p cpu:*:vm:{pgout,pgin}
kstat -p cpu:*:sys:{bread,bwrite}
kstat -p cpu:*:sys:intrblk
## We should never see any swapping activity. If we are seeing it, something is quite possibly not right.
kstat -p cpu:*:vm:{swapin,swapout}
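The cpu sys and vm statistics listed above are counters that accumulate since boot, so a spike (or any swap activity) in a given window only shows up as the difference between two samples. A minimal sketch of that comparison follows, assuming two snapshot files that each hold `kstat -p` output. The helper name `kstat_delta` and the file names in the usage comment are hypothetical.

```shell
# Print "key delta" for every statistic present in both snapshots.
# $1 = earlier snapshot file, $2 = later snapshot file,
# both containing `kstat -p` output (key<TAB>value per line).
# kstat_delta is a hypothetical helper name used for illustration.
kstat_delta() {
    awk 'NR == FNR { prev[$1] = $2; next }              # first file: remember values
         ($1 in prev) { printf "%s %.0f\n", $1, $2 - prev[$1] }' "$1" "$2"
}

# Possible use on a live system (not run here):
#   kstat -p 'cpu:*:vm:swapin' > before.txt
#   sleep 10
#   kstat -p 'cpu:*:vm:swapin' > after.txt
#   kstat_delta before.txt after.txt
```

A non-zero delta for swapin or swapout is the case the note above warns about. For the context-switch counters, the deltas can be compared against a normal baseline for the system.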
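Samples on-CPU kernel functions at 1001 Hz for 10 seconds and prints the 10 most frequently sampled. Need to log the count for each function that this returns.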
dtrace -qn 'profile-1001 /arg0/ { @[ func(arg0) ] = count(); } tick-10sec {trunc(@,10); printa(@); clear(@); exit(0); }'
Need to log a count of events here. Prints a timestamp (milliseconds since the epoch) and the count.
dtrace -qn 'BEGIN { ct = 0; } fbt:zfs:arc_memory_throttle:entry { ct++; } tick-30sec { printf("%d : %d\n", walltimestamp/1000000, ct); exit(0); }'
Need to log time. The elapsed time is an integer number of microseconds, not a float, and the leading timestamp is in nanoseconds since the epoch.
dtrace -qn 'fbt:zfs:arc_adjust:entry { self->start = timestamp; } fbt:zfs:arc_adjust:return /self->start/ { printf("%d %d\n", walltimestamp, (timestamp - self->start) / 1000); self->start = 0; }'
Most frequently called ZFS functions, by count. Need to log each entry and its count. Only the top 10 are returned, so the set of functions may differ between collections.
dtrace -qn 'fbt:zfs::entry { @[probefunc] = count(); } tick-1sec {trunc(@,10); printa("%-20s %@d\n",@ ); exit(0)}'
Top kernel memory consumers on the system: kmem caches ranked by bytes allocated over 10 seconds. Again, only the top 10 are logged.
dtrace -qn 'fbt::kmem_cache_alloc:entry { @[args[0]->cache_name] = sum(args[0]->cache_bufsize); } tick-10sec {trunc(@, 10); printa("%-20s %@d\n", @); exit(0); }'
Filesystem flush events, maximum duration of an event in microseconds. This uses vtimestamp, so it measures on-CPU time only.
dtrace -qn '::fsflush_do_pages:entry {self->st=vtimestamp } ::fsflush_do_pages:return {self->delta = vtimestamp - self->st; @ = max(self->delta); self->st = 0} tick-10sec {normalize(@, 1000); printf("%d ", walltimestamp); printa("%@d\n", @); exit(0); }'
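Socket write events, maximum on-CPU duration of an event in microseconds. Nothing is printed until at least one write has been seen.
dtrace -qn '::BEGIN { ct=0 } ::socket_vop_write:entry {self->st = vtimestamp; } ::socket_vop_write:return { self->delta = vtimestamp - self->st; @ = max(self->delta/1000); ct++; } tick-10sec /ct > 0/ {printf("%d ", walltimestamp); printa("%@d\n", @); exit(0); }'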