
Crash dump analysis on HP-UX

A crashing Unix server should be a rare event, which means that postmortem investigation is something you will seldom do. Kernel debuggers are not much fun, and they require a good knowledge of kernel internals. That's not too difficult if you're a guru in one specific Unix flavour, but if you're housing three Unices, each with different kernel versions, then you're in a whole different game! Luckily, there are admin-friendly scripts nowadays which help you with the task of digging out why your machine crashed.

Let's have a look at HP-UX: it features the adb kernel debugger, but also the Q4 package, which is generally installed by default in the /usr/contrib/Q4 directory. Before first use, you need to copy the sample initialization script to your home directory:

cp /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /root/.q4rc.pl

Then you're ready to start up the tool itself :

# q4 -p


HP KWDB 3.2.3 for HP Itanium (32 or 64 bit) and target HP-IPF 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard KWDB 3.2.3 12-May-2009 21:15 is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
crashdump information:
hostname anduril
model ia64 hp server Integrity Virtual Machine
panic gexcp_hndlr: Unresolved priv 0 interruption.
release @(#) $Revision: vmunix: B.11.31_LR FLAVOR=perf
dumptime 1277458512 Fri Jun 25 11:35:12 METDST 2010
savetime 1277459539 Fri Jun 25 11:52:19 METDST 2010
dumptype Non Compressed


Event selected is 0. It was a panic
#0 0xe000000001da26c0:0 in panic_save_regs_switchstack+0x110
(0x4000000000000692, 0xe000000001d9d640, 0x144000206c61009f
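As a small aside, the dumptime and savetime fields in the banner are plain Unix epoch seconds, so you can sanity-check them outside of q4. The following is only a minimal Python sketch with the two values copied from the output above; the helper names to_utc and save_delay are made up for this illustration and are not part of Q4.

```python
from datetime import datetime, timezone

# Values copied from the "crashdump information" banner above.
DUMPTIME = 1277458512
SAVETIME = 1277459539


def to_utc(epoch_seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def save_delay(dumptime: int, savetime: int) -> int:
    """Seconds between the dump being taken and being saved to disk."""
    return savetime - dumptime


if __name__ == "__main__":
    print("dumped at:", to_utc(DUMPTIME).isoformat())
    print("saved at: ", to_utc(SAVETIME).isoformat())
    delay = save_delay(DUMPTIME, SAVETIME)
    print(f"save took {delay // 60} min {delay % 60} s")
```

The dump time comes out as 09:35:12 UTC, which agrees with the 11:35:12 METDST (UTC+2) shown in the banner, and the gap between dump and save is 17 minutes and 7 seconds.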

The Q4 package contains lots of scripts which can provide you with extra information. The most interesting ones are analyze.pl and whathappened.pl. Beware that these scripts can barf out loads of output! (You can always redirect the output to a file, just as you would at the shell prompt.)

q4> include analyze.pl
q4> include whathappened.pl
q4> run Analyze AMUP
...
q4> run WhatHappened
System Name: HP-UX
Node Name: anduril
Release: B.11.31
Version: U
Model: no
Machine ID: 123456789
Processors: 1
Architecture: IA-64
Physical Mem: 1571536 pages


This is a 64 Bit Kernel
The system had been up for 44.12 days (381190776 ticks).
Load averages: 0.76 0.77 0.48.


System went down at: Fri Jun 25 11:35:12 2010


+--------------------------------------------+

+--------------------------------------------+
Found adjacent data tr. Growing size. 0x240d000 -> 0x640d000.
Loaded ACPI revision 2.0 tables.
MMIO on this platform supports Write Coalescing.
...
gexcp_hndlr: Reserved Register/Field or Unimplemented Address fault occurs in kernel mode.
gexcp_hndlr: unimplemented data address fault, ISR.ir = 0,
data memory reference to unimplemented address
******************************************************************************


reg_dump(): Displaying register values (in hex) from the save state at
ssp 87ffffff_5ffe7200 return_status/reason/flags 0000/0054/00000001


Interruption type: Unimplemented Data Address Fault
panic: gexcp_hndlr: Unresolved priv 0 interruption.


Stack Trace:
IP Function Name
0xe000000001dea710 gexcp_hndlr+0x2d0
0xe000000001c0a780 bubbledown+0x0
0xe000000000afed90 kmem_lpc_alloc+0x2b0
0xe000000000d6ead0 get_kmem+0x290
0xe000000000d66070 kmem_arena_xlarge_alloc+0x2f0
0xe000000000c24e90 kmem_arena_varalloc+0x2d0
0xe000000000df31c0 vfork_buffer_init+0xb0
0xe000000000d0b7c0 newproc+0x11f0
0xe000000000a12930 vfork+0x1440
0xe000000000c261a0 syscall+0x560
End of Stack Trace
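If you saved the WhatHappened output to a file, the stack trace is easy to pull out for a bug report or for comparing several crashes. Here is a rough Python sketch of that idea. It assumes the trace is delimited by the "Stack Trace:" and "End of Stack Trace" lines exactly as shown above, and that every frame looks like "address function+offset"; the function name extract_stack and the SAMPLE text are just illustrations, not anything shipped with Q4.

```python
import re

# One frame line, e.g. "0xe000000001dea710 gexcp_hndlr+0x2d0"
FRAME_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\w+)\+(0x[0-9a-fA-F]+)$")


def extract_stack(text: str) -> list[tuple[int, str, int]]:
    """Return (ip, function, offset) tuples found between the trace markers."""
    frames = []
    in_trace = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Stack Trace:":
            in_trace = True
            continue
        if line == "End of Stack Trace":
            break
        if in_trace:
            match = FRAME_RE.match(line)
            if match:  # skips the "IP Function Name" header line
                ip, func, offset = match.groups()
                frames.append((int(ip, 16), func, int(offset, 16)))
    return frames


# A shortened excerpt of the output above, used as sample input.
SAMPLE = """\
panic: gexcp_hndlr: Unresolved priv 0 interruption.

Stack Trace:
IP Function Name
0xe000000001dea710 gexcp_hndlr+0x2d0
0xe000000001c0a780 bubbledown+0x0
0xe000000000afed90 kmem_lpc_alloc+0x2b0
End of Stack Trace
"""

if __name__ == "__main__":
    for ip, func, offset in extract_stack(SAMPLE):
        print(f"{func:<20} +{offset:#x}  (ip {ip:#x})")
```

In practice you would read the saved file instead of SAMPLE and pass its contents to extract_stack.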

You're not guaranteed to find the exact reason why the machine crashed (especially if it's really kernel related), but at least this can give you a rough idea of what happened.