We previously looked at how to examine kernel dumps on HP-UX; now let's see how to do the same on OpenSolaris. The kernel debugger is actually quite user-friendly, and usually gives you enough information to work out what caused a crash. If your Solaris is too stable to generate crashes, then use the
command to generate one on the fly. This will generate a dump in /var/adm/crash. Let's have a look at it with mdb:
# mdb -k unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs ip sctp usba lofs zfs random ipc md fcip fctl fcp crypto logindmux ptm nfs ]
>
The ::status command displays high-level information about the debugging session. Its panic message is usually a one-liner that reveals the reason for the crash.
> ::status
debugging crash dump vmcore.0 (64-bit) from hostname
operating system: 5.11 snv_43 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only
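If you collect many dumps, it can help to pull the fields out of the ::status text programmatically. The following is a minimal Python sketch, not part of mdb itself: the function names parse_status and panic_details are made up for illustration, and the regular expression assumes a BAD TRAP message shaped like the one in this dump.

```python
import re

# Sample text copied from the ::status output of this dump, one field per line.
STATUS_OUTPUT = """\
debugging crash dump vmcore.0 (64-bit) from hostname
operating system: 5.11 snv_43 (i86pc)
panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
dump content: kernel pages only
"""


def parse_status(text):
    """Split ::status output into a dict keyed by field label (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, because the panic message contains colons too.
        label, sep, value = line.partition(": ")
        if sep:
            fields[label.strip()] = value.strip()
    return fields


def panic_details(message):
    """Extract trap type, fault address and module from a BAD TRAP message.

    Assumes the message layout seen in this dump; returns None otherwise.
    """
    match = re.search(
        r'type=(\w+) .*addr=(\w+) occurred in module "([^"]+)"', message
    )
    if match is None:
        return None
    return {
        "type": match.group(1),
        "addr": match.group(2),
        "module": match.group(3),
    }
```

Calling panic_details(parse_status(STATUS_OUTPUT)["panic message"]) on the sample gives the trap type, the faulting address and the module name as separate values, which is handy when sorting dumps by cause.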
The ::stack command will provide you with a stack trace; this is the same trace you would have seen in syslog or on the console.
> ::stack
atomic_add_32()
nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)
nfs3_inactive+0x38b(fffffe820d128b80, 0)
fop_inactive+0x93(fffffe820d128b80, 0)
vn_rele+0x66(fffffe820d128b80)
snf_smap_desbfree+0x78(fffffe8185e2ff60)
dblk_lastfree_desb+0x25(fffffe817a30f8c0, ffffffffac1d7cc0)
dblk_decref+0x6b(fffffe817a30f8c0, ffffffffac1d7cc0)
freeb+0x89(fffffe817a30f8c0)
tcp_rput_data+0x215f(ffffffffb4af7140, fffffe812085d780, ffffffff993c3c00)
squeue_enter_chain+0x129(ffffffff993c3c00, fffffe812085d780, fffffe812085d780, 1, 1)
ip_input+0x810(ffffffffa23eec68, ffffffffaeab8040, fffffe812085d780, e)
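Each frame in the trace has the form function+offset(arguments), so it is easy to break apart and to see which address is handed from frame to frame. Here is a rough Python sketch of that idea; parse_stack and frames_with_arg are hypothetical helper names, the sample is only the first five frames of the trace from this dump, and the pattern assumes one frame per line.

```python
import re

# First five frames of the ::stack output from this dump, one frame per line.
STACK_OUTPUT = """\
atomic_add_32()
nfs_async_inactive+0x55(fffffe820d128b80, 0, ffffffffeff0ebcb)
nfs3_inactive+0x38b(fffffe820d128b80, 0)
fop_inactive+0x93(fffffe820d128b80, 0)
vn_rele+0x66(fffffe820d128b80)
"""

# function name, optional +0x offset, then a parenthesised argument list
FRAME_RE = re.compile(
    r"^(?P<func>[\w.`]+)(?:\+(?P<offset>0x[0-9a-f]+))?\((?P<args>[^)]*)\)$"
)


def parse_stack(text):
    """Turn ::stack text into a list of frame dicts (hypothetical helper)."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip the prompt line or anything that is not a frame
        args = [a.strip() for a in match.group("args").split(",") if a.strip()]
        frames.append(
            {
                "func": match.group("func"),
                # A frame without an offset (the top frame here) is recorded as 0.
                "offset": int(match.group("offset") or "0", 16),
                "args": args,
            }
        )
    return frames


def frames_with_arg(frames, address):
    """Return the frames that list the given address among their arguments."""
    return [frame for frame in frames if address in frame["args"]]
```

On the sample, frames_with_arg(parse_stack(STACK_OUTPUT), "fffffe820d128b80") returns the four frames from nfs_async_inactive down to vn_rele, which shows the same address being passed along the path that leads to the faulting atomic_add_32() call.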
The ::msgbuf command prints the kernel message buffer as it was at the time of the crash; this is the buffer that sysadmins usually read through the "dmesg" command.
> ::msgbuf
MESSAGE
....
WARNING: IP: Hardware address '00:14:4f:xxxxxxx' trying to be our address xxxx
WARNING: IP: Hardware address '00:14:4f:xxxx' trying to be our address xxxx
panic[cpu0]/thread=fffffe80000adc80:
BAD TRAP: type=e (#pf Page fault) rp=fffffe80000ad3d0 addr=0 occurred in module "unix" due to a NULL pointer dereference
sched:
#pf Page fault
Bad kernel fault at addr=0x0
One of the coolest commands is ::cpuinfo -v, which shows what each CPU was running at the time of the crash, including the threads waiting in its run queue, with some nice ASCII-art style formatting:
> ::cpuinfo -v
 ID ADDR             FLG NRUN BSPL PRI RNRN KRNRN SWITCH THREAD           PROC
  1 ffffffff983b3800  1f    1    0  59  yes    no t-0    fffffe80daac2f20 smtpd
                       |    |
            RUNNING         PRI THREAD           PROC
              READY          99 fffffe8000bacc80 sched
           QUIESCED
             EXISTS
             ENABLE
Other interesting commands are ::ps (information about the processes that were running) and ::panicinfo, which reveals the panicking thread and CPU; you can investigate threads further with ::walk thread.
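Since mdb reads dcmds from standard input, the commands from this article can be run in one pass and saved as a first report. The sketch below is only an illustration under a few assumptions: mdb is on the PATH, unix.0 and vmcore.0 are in the current directory, and the helper names build_script and run_dcmds are invented for this example.

```python
import subprocess

# The dcmds covered in this article, in the order they were shown.
DCMDS = ["::status", "::stack", "::msgbuf", "::cpuinfo -v", "::panicinfo", "::ps"]


def build_script(dcmds):
    """Join dcmds into the text fed to mdb on standard input, one per line."""
    return "".join(f"{dcmd}\n" for dcmd in dcmds)


def run_dcmds(dcmds, namelist="unix.0", corefile="vmcore.0"):
    """Run the dcmds against a crash dump and return mdb's combined output.

    Assumes mdb is installed and the two dump files exist; raises
    subprocess.CalledProcessError if mdb exits with an error.
    """
    result = subprocess.run(
        ["mdb", "-k", namelist, corefile],
        input=build_script(dcmds),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Running print(run_dcmds(DCMDS)) from the crash directory should print the output of all six commands back to back, which is convenient to attach to a support case.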
In a following article, I'll write about the Solaris Core Analyzer, a tool comparable to Q4 for walking through kernel dumps on Solaris.