@stevenwilliamson
Last active September 28, 2015 13:56
Kernel panic when a disk was pulled.
[root@phy3-sw1-ash (ash) /var/crash/volatile]# mdb -k unix.0 vmcore.0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc apix scsi_vhci ufs ip hook neti sockfs arp usba stmf_sbd stmf zfs sd lofs idm mpt_sas crypto random cpc logindmux ptm kvm sppp nsmb smbsrv nfs ]
> $C
ffffff01e863a710 mutex_owner_running+0xd()
ffffff01e863a750 spa_async_request+0x44(ffffff4447a3e9e0, 2)
ffffff01e863a790 vdev_disk_off_notify+0x77(ffffff43202e0b88, fffffffffbc997c0, ffffff431f509d40, 0)
ffffff01e863a820 ldi_invoke_notify+0x1b1(ffffff431ebe2550, fffffffffffffffe, 0, fffffffffbb5b228, 0)
ffffff01e863a860 e_ddi_offline_notify+0x75(ffffff431ebe2550)
ffffff01e863a8d0 devi_detach_node+0x52(ffffff431ebe2550, 40001)
ffffff01e863a940 ndi_devi_offline+0x89(ffffff431ebe2550, 1)
ffffff01e863a9a0 mptsas_offline_lun+0xd3(ffffff431ebe27f8, ffffff431ebe2550, 0, 1)
ffffff01e863aa00 mptsas_offline_target+0xb8(ffffff431ebe27f8, ffffff5dd58ca440)
ffffff01e863aaf0 mptsas_handle_topo_change+0x3e9(ffffffcd33c88800, ffffff431ebe27f8)
ffffff01e863ab60 mptsas_handle_dr+0x184(ffffffcd33c88800)
ffffff01e863ac20 taskq_thread+0x2d0(ffffff431d3c4d68)
ffffff01e863ac30 thread_start+8()
> ::status
debugging crash dump vmcore.0 (64-bit) from phy3-sw1-ash
operating system: 5.11 joyent_20150622T171240Z (i86pc)
image uuid: (not set)
panic message: BAD TRAP: type=d (#gp General protection) rp=ffffff01e863a5b0 addr=0
dump content: kernel pages only
>
> ::panicinfo
cpu 1
thread ffffff01e863ac40
message BAD TRAP: type=d (#gp General protection) rp=ffffff01e863a5b0 addr=0
rdi ffffff4447a3f0e0
rsi 0
rdx 800
rcx 2
r8 0
r9 a
rax ffffff01e863ac40
rbx ffffff4447a3f0e0
rbp ffffff01e863a710
r10 ffffff01e863a5b0
r11 1fffffe888f47000
r12 1
r13 1fffffe888f47000
r14 0
r15 0
fsbase fffffd7fff071a40
gsbase ffffff42eb902580
ds 4b
es 4b
fs 0
gs 0
trapno d
err 0
rip fffffffffb85f37d
cs 30
rflags 10206
rsp ffffff01e863a6a8
ss 38
gdt_hi 0
gdt_lo 7000ffff
idt_hi 0
idt_lo 6000ffff
ldt 0
task 70
cr0 8005003b
cr2 26179a8
cr3 12000000
cr4 426f8
> fffffffffb85f37d::dis
mutex_owner_running: movq (%rdi),%r11
mutex_owner_running+3: andq $0xfffffffffffffff8,%r11
mutex_owner_running+7: cmpq $0x0,%r11
mutex_owner_running+0xb: je +0x10 <mutex_owner_running+0x1d>
mutex_owner_running+0xd: movq 0xd8(%r11),%r8
mutex_owner_running+0x14: movq 0x18(%r8),%r9
mutex_owner_running+0x18: cmpq %r11,%r9
mutex_owner_running+0x1b: je +0x4 <mutex_owner_running+0x21>
mutex_owner_running+0x1d: xorq %rax,%rax
mutex_owner_running+0x20: ret
mutex_owner_running+0x21: movq %r8,%rax
mutex_owner_running+0x24: ret
0xfffffffffb85f395: nopl (%rax)
mutex_owner_running_critical_size: sbbb %al,(%rax)
mutex_owner_running_critical_size+2: .byte 0
Looking at where it actually panicked:
> mutex_owner_running::dis
mutex_owner_running: movq (%rdi),%r11
mutex_owner_running+3: andq $0xfffffffffffffff8,%r11
mutex_owner_running+7: cmpq $0x0,%r11
mutex_owner_running+0xb: je +0x10 <mutex_owner_running+0x1d>
mutex_owner_running+0xd: movq 0xd8(%r11),%r8
mutex_owner_running+0x14: movq 0x18(%r8),%r9
mutex_owner_running+0x18: cmpq %r11,%r9
mutex_owner_running+0x1b: je +0x4 <mutex_owner_running+0x21>
mutex_owner_running+0x1d: xorq %rax,%rax
mutex_owner_running+0x20: ret
mutex_owner_running+0x21: movq %r8,%rax
mutex_owner_running+0x24: ret
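Rendered back into C, the routine looks roughly like the sketch below. This is not the illumos source (the real thing is hand-written assembly); the types here are stand-ins, and the only hard facts are the offsets taken from the disassembly, which look like owner->t_cpu and cpu->cpu_thread if the usual adaptive-mutex layout is assumed.

#include <stdint.h>

typedef struct cpu cpu_t;
typedef struct kthread {
        cpu_t           *t_cpu;         /* at 0xd8 in the real kthread_t, per the disassembly */
} kthread_t;
struct cpu {
        kthread_t       *cpu_thread;    /* at 0x18 in the real cpu_t, per the disassembly */
};
typedef struct kmutex_sketch {
        uintptr_t       m_owner;        /* owner kthread_t pointer | low flag bits */
} kmutex_sketch_t;

static cpu_t *
mutex_owner_running_sketch(kmutex_sketch_t *lp)
{
        /* movq (%rdi),%r11 ; andq $0xfffffffffffffff8,%r11 */
        kthread_t *owner = (kthread_t *)(lp->m_owner & ~(uintptr_t)0x7);

        if (owner == NULL)
                return (NULL);          /* mutex not held, return 0 */

        /* movq 0xd8(%r11),%r8 -- mutex_owner_running+0xd, the faulting load */
        cpu_t *cpu = owner->t_cpu;

        /* movq 0x18(%r8),%r9 ; cmpq %r11,%r9 */
        if (cpu->cpu_thread == owner)
                return (cpu);           /* owner is on a CPU right now */

        return (NULL);
}

Per ::panicinfo above, %rdi was ffffff4447a3f0e0 and %r11 was 1fffffe888f47000 at the fault, so the #gp is that owner->t_cpu load going through 1fffffe888f47000, i.e. through whatever was sitting in the owner word of the mutex passed in %rdi.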
How do we get to mutex_owner_running from mutex_enter, and how can it cause a GP fault?
The stack trace shows mutex_owner_running with no arguments, so what populates 0xd8(%r11)?
Am I barking up the wrong tree? Do I need to dig this low, or is it that the mutex structure passed in as part of the spa_async_request call in the stack trace is invalid?
ffffff01e863a710 mutex_owner_running+0xd()
ffffff01e863a750 spa_async_request+0x44(ffffff4447a3e9e0, 2)
ffffff01e863a790 vdev_disk_off_notify+0x77(ffffff43202e0b88, fffffffffbc997c0, ffffff431f509d40, 0)
ffffff01e863a820 ldi_invoke_notify+0x1b1(ffffff431ebe2550, fffffffffffffffe, 0, fffffffffbb5b228, 0)
ffffff01e863a860 e_ddi_offline_notify+0x75(ffffff431ebe2550)
ffffff01e863a8d0 devi_detach_node+0x52(ffffff431ebe2550, 40001)
ffffff01e863a940 ndi_devi_offline+0x89(ffffff431ebe2550, 1)
ffffff01e863a9a0 mptsas_offline_lun+0xd3(ffffff431ebe27f8, ffffff431ebe2550, 0, 1)
ffffff01e863aa00 mptsas_offline_target+0xb8(ffffff431ebe27f8, ffffff5dd58ca440)
ffffff01e863aaf0 mptsas_handle_topo_change+0x3e9(ffffffcd33c88800, ffffff431ebe27f8)
ffffff01e863ab60 mptsas_handle_dr+0x184(ffffffcd33c88800)
ffffff01e863ac20 taskq_thread+0x2d0(ffffff431d3c4d68)
ffffff01e863ac30 thread_start+8()
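For reference, the way mutex_enter normally reaches mutex_owner_running is the adaptive slow path: the fast-path cmpxchg in mutex_enter fails because the lock looks held, it falls into mutex_vector_enter, and that spins while the owner appears to be running on a CPU; the spin check is mutex_owner_running with the mutex pointer in %rdi (mdb just can't show arguments for the faulting frame). Roughly, reusing kmutex_sketch_t and mutex_owner_running_sketch from the sketch above; try_lock_sketch and block_on_turnstile_sketch are made-up placeholders, not real illumos functions:

/*
 * Rough sketch of the adaptive-mutex slow path, not the actual illumos source.
 */
extern int try_lock_sketch(kmutex_sketch_t *);              /* hypothetical */
extern void block_on_turnstile_sketch(kmutex_sketch_t *);   /* hypothetical */

static void
mutex_vector_enter_sketch(kmutex_sketch_t *lp)
{
        for (;;) {
                /* Retry the fast path: cmpxchg NULL -> curthread on the owner word. */
                if (try_lock_sketch(lp))
                        return;

                /*
                 * Lock looks held.  If the owner is on a CPU, spin; this is
                 * the mutex_owner_running() call seen in the stack trace, and
                 * it dereferences whatever the owner word points at.
                 */
                if (mutex_owner_running_sketch(lp) != NULL)
                        continue;               /* adaptive spin */

                /* Owner is off-CPU: block until woken, then retry the loop. */
                block_on_turnstile_sketch(lp);
        }
}

So if the kmutex_t handed to mutex_enter has garbage in its owner word, the owner->t_cpu load inside mutex_owner_running is exactly where it would blow up, with nothing wrong in the mutex code itself.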
> ffffff4447a3e9e0::print spa_t
{
spa_name = [ '@', '\305', 'c', '\034', 'C', '\377', '\377', '\377', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', 'p', '\364', '\210', '\350', '\377',
'\377', '\037', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', ... ]
spa_comment = 0x1fffffe888f47000
spa_avl = {
avl_child = [ 0, 0 ]
avl_pcb = 0xffffff4400000001
}
.......
snipped for clarity
spa_async_lock = {
_opaque = [ 0x1fffffe888f47000 ]
}
Is that a mutex data structure, and if so, how do I check whether it's valid?
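spa_async_lock should just be an ordinary adaptive kmutex_t, which on amd64 is a single word: 0 when unheld, otherwise the owner kthread_t pointer with flag bits in the low 3 bits. So a crude validity check is whether the masked value looks like a kernel thread address. 0x1fffffe888f47000 does not (every other kernel pointer in this dump is 0xffffff....), and the same value shows up in spa_comment and in r11/r13 at the fault. Below is a throwaway userland sketch of that check, assuming the adaptive layout; in mdb itself, ::whatis on the spa address and ::mutex on the lock address would be the more direct way to poke at it.

/*
 * Throwaway sketch: does a value look like a plausible adaptive kmutex_t
 * owner word?  Assumes the usual layout (0 when unheld, else an owner
 * kthread_t pointer with flag bits in the low 3 bits) and uses the fact
 * that kernel pointers in this dump all look like 0xffffffxxxxxxxxxx.
 */
#include <stdint.h>
#include <stdio.h>

static int
plausible_adaptive_mutex(uintptr_t opaque)
{
        uintptr_t owner = opaque & ~(uintptr_t)0x7;

        if (owner == 0)
                return (1);     /* unheld: fine */

        /* Held: the owner should be a kernel thread address. */
        return (owner >= 0xffffff0000000000ULL);
}

int
main(void)
{
        /* The word found in spa_async_lock._opaque (and in %r11/%r13). */
        uintptr_t v = 0x1fffffe888f47000ULL;

        printf("plausible: %d\n", plausible_adaptive_mutex(v));  /* prints 0 */
        return (0);
}

Given that spa_name is also garbage and spa_comment holds the same junk value, it looks less like a problem in the mutex code and more like the spa_t (or the pointer being passed to spa_async_request) is stale or has been overwritten.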