
Solaris core dump analysis with SUNWscat

Topics: 
OpenSolaris

I've previously covered how Solaris crash dumps can be investigated with mdb. There's another utility (comparable to Q4 on HP-UX) called SUNWscat. Scat is a tool for analyzing kernel dumps (pun probably intended). Just download the SUNWscat package, install it on your server, and wait for a kernel crash to happen. Once that has happened, you'll find a unix.0 and a vmcore.0 file in the crash dump directory (default /var/crash). When you fire up scat, you'll be presented with the following screen:

# scat 0
  Solaris[TM] CAT 4.1 (build 526) for Solaris 10 64-bit SPARC(sun4u)

  Copyright © 2003 Sun Microsystems, Inc. All rights reserved.
  Patents Pending. Use is subject to license terms.
  Sun Microsystems proprietary - DO NOT RE-DISTRIBUTE!

opening vmcore.0 ...dumphdr...symtab...core...done
loading core data: modules...panic...memory...time...misc...done
loading stabs...read_type_db: Wrong number of lines in database, or database
doesn't end in a newline
unable to load any stabs file
patches... - NOT AVAILABLE (No such file or directory) done

core file:      /var/crash/vmcore.0
user:           Super-User (root:0)
release:        5.10 (64-bit)
version:        Generic_112233-11
machine:        sun4u
node name:      boson
domain:         arda.org
hw_provider:    Sun_Microsystems
system type:    SUNW,Sun-Fire-V210
hostid:         837844c7
time of crash:  Tue Apr 22 11:49:52 EDT 2008
age of system:  22 hours 5 minutes 4.48 seconds
panic cpu:      0 (ncpus: 8)
panic string:   free: freeing free block, dev:0x200000016e, block:32032, ino:6057255, 
                fs:/homes

running sanity checks.../etc/system...ndd...sysent...misc...done
SolarisCAT(vmcore.0)>
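
The dev value in the panic string above is a raw device number. As a minimal sketch, assuming the 64-bit kernel layout in which a dev_t carries the major number in its upper 32 bits and the minor number in its lower 32 bits, it can be split by hand like this (split_dev is a hypothetical helper name, not part of scat):

```python
# Sketch: split a Solaris dev_t taken from a panic string into major/minor.
# Assumption: 64-bit kernel, 32 bits for the major and 32 bits for the minor.

MINOR_BITS = 32
MINOR_MASK = (1 << MINOR_BITS) - 1


def split_dev(dev):
    """Return (major, minor) for a 64-bit dev_t value."""
    return dev >> MINOR_BITS, dev & MINOR_MASK


if __name__ == "__main__":
    # Value copied from the panic string shown above.
    major, minor = split_dev(0x200000016E)
    print("major=%d minor=%d" % (major, minor))
```

Under that assumption the major number identifies the device driver and the minor number the specific device instance, which helps to match the panic to the disk slice behind /homes.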

The first thing you probably want to do is investigate the reason for the crash:

SolarisCAT(vmcore.0)> analyze
PANIC: free: freeing free block, dev:0x%lx, block:%ld, ino:%lu, fs:%s
[...]
==== printing for generic panic information ====
cpu 0 had the panic

==== panic thread: 0x2a1003f7d40 ==== cpu: 0 ====
==== panic kernel thread: 0x2a1003f7d40  pid: 0  on cpu: 0 ====
cmd: sched

t_stk: 0x2a1003f7b50  sp: 0x1437751  t_stkbase: 0x2a1003f4000
t_pri: 60(SYS)  pctcpu: 0.000000  t_lwp: 0x0
t_procp: 0x1438518(proc_sched)  p_as: 0x1438400(kas)
last cpuid: 0
idle: 50 ticks (0.50 seconds)
start: Mon Apr 21 13:45:07 2008
age: 79485 seconds (22 hours 4 minutes 45 seconds)
stime: 2132 (22 hours 4 minutes 43.16 seconds earlier)
tstate: TS_ONPROC - thread is being run on a processor
tflg:   T_TALLOCSTK - thread structure allocated from stk
        T_DONTBLOCK - for lockfs
        T_PANIC - thread initiated a system panic
tpflg:  none set
tsched: TS_LOAD - thread is in memory
        TS_DONT_SWAP - thread/LWP should not be swapped
        TS_SIGNALLED - thread was awakened by cv_signal()
pflag:  SSYS - system resident process
        SLOAD - in core
        SLOCK - process cannot be swapped

pc: 0x104a720   unix:panicsys+0x44:     call    unix:setjmp
startpc: 0x11a53f8      ufs:ufs_thread_delete+0x0:      save    %sp, -0xd0, %sp

unix:panicsys+0x44 (0x14a3158, 0x2a1003f74a0, 0x1438120, 0x1, 0x0, 0x0)
[...]
unix:thread_start+0x4 (0x3000026e828, 0x0, 0x0, 0x0, 0x0, 0x0)
-- end of kernel thread's stack --

SolarisCAT(vmcore.0)>

Scat offers many other commands as well. The proc command, for example, lists the processes that existed at the time your system crashed. By default they are listed in reverse PID order.

SolarisCAT(vmcore.0)> proc

    addr       pid    ppid   uid      size      rss     swresv   time  command
------------- ------ ------ ------ ---------- -------- -------- ------ ---------
0x30003c8e040    283      1      0    3776512  1646592  1302528  90118 /usr/sbin/ssmon
0x30003c96a50    279      1      0    9306112  2514944  1769472     19 /usr/sbin/ssserver
0x30003bee030    256      1      0   27656192  2596864  1138688     57 /usr/sbin/nscd
0x30003c8ea58    243      1      0    2506752  1703936   466944      7 /usr/sbin/cron
0x30003c96038    240      1      0   18874368  2170880  2711552      7 /usr/sbin/syslogd
0x30000f60010    225      1      0    7217152  2400256  1146880    170 /usr/lib/autofs/automountd
0x300020c4a40    217      1      0    2260992  1572864   598016      3 /usr/lib/nfs/lockd
0x300020c5458    213      1      1    4677632  1974272   876544      2 /usr/lib/nfs/statd
0x300020c4028    201      1      0    2629632  2048000   835584     12 /usr/sbin/inetd -s
[...]
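
If you save a listing like the one above to a text file, a few lines of script can rank the processes for you. The following is only a sketch: it assumes the nine-column layout shown above (addr, pid, ppid, uid, size, rss, swresv, time, command), and parse_proc_listing and top_by_rss are hypothetical helper names, not scat features.

```python
# Sketch: rank processes from a saved scat "proc" listing by the rss column.
# Assumption: rows look like the listing above, with nine columns and the
# kernel address (0x...) in the first column.

SAMPLE = """\
    addr       pid    ppid   uid      size      rss     swresv   time  command
------------- ------ ------ ------ ---------- -------- -------- ------ ---------
0x30003c8e040    283      1      0    3776512  1646592  1302528  90118 /usr/sbin/ssmon
0x30003c96a50    279      1      0    9306112  2514944  1769472     19 /usr/sbin/ssserver
0x30003bee030    256      1      0   27656192  2596864  1138688     57 /usr/sbin/nscd
0x300020c4028    201      1      0    2629632  2048000   835584     12 /usr/sbin/inetd -s
[...]
"""


def parse_proc_listing(text):
    """Return one dict per process row; header, rule and '[...]' lines are skipped."""
    rows = []
    for line in text.splitlines():
        # Split into at most 9 fields so a command with arguments stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9 or not fields[0].startswith("0x"):
            continue
        try:
            pid, ppid, uid, size, rss, swresv, time = (int(f) for f in fields[1:8])
        except ValueError:
            continue  # not a well-formed process row
        rows.append({
            "addr": fields[0], "pid": pid, "ppid": ppid, "uid": uid,
            "size": size, "rss": rss, "swresv": swresv, "time": time,
            "command": fields[8],
        })
    return rows


def top_by_rss(rows, n=3):
    """Return the n rows with the largest rss value, largest first."""
    return sorted(rows, key=lambda r: r["rss"], reverse=True)[:n]


if __name__ == "__main__":
    for row in top_by_rss(parse_proc_listing(SAMPLE)):
        print(row["pid"], row["rss"], row["command"])
```

The same approach works for sorting on the size or time columns by changing the sort key.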

Advanced use of scat requires an in-depth understanding of the Solaris kernel. However, you can get a lot of useful information by using just the basic commands.

Comments

That's why I included this wonderful tool in your default jumpstart profile :)

I believe this tool is wonderful, but how can I get it?
It's really difficult to find.

The transition from Sun to Oracle left the old SunSolve website in something of a mess. I would suggest retrieving the tool from the OpenSolaris web servers.
Oracle itself has apparently removed the tool...