Crash dump analysis on HP-UX

A crashing Unix server should be a rare event, which means that post-mortem investigation is something you will seldom do. Kernel debuggers are not much fun, and they basically require a good knowledge of kernel internals. That's not too difficult if you're a guru in one specific Unix flavour, but if you're housing three Unices, each with different kernel versions, then you're in a whole different game! Luckily, there are admin-friendly scripts nowadays which help you with the task of digging out why your machine crashed.

Let's have a look at HP-UX: it features the adb kernel debugger, but also the Q4 package, which is generally installed by default in the /usr/contrib/Q4 directory. Before first use, you need to copy the initialization script to your home directory:

cp /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /root/.q4rc.pl

Then you're ready to start up the tool itself:

# q4 -p

HP KWDB 3.2.3 for HP Itanium (32 or 64 bit) and target HP-IPF 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard KWDB 3.2.3 12-May-2009 21:15 is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
crashdump information:
  hostname  anduril
  model     ia64 hp server Integrity Virtual Machine
  panic     gexcp_hndlr: Unresolved priv 0 interruption.
  release   @(#) $Revision: vmunix:    B.11.31_LR FLAVOR=perf 
  dumptime  1277458512 Fri Jun  25 11:35:12 METDST 2010
  savetime  1277459539 Fri Jun  25 11:52:19 METDST 2010
  dumptype  Non Compressed 

Event selected is 0. It was a panic
#0  0xe000000001da26c0:0 in panic_save_regs_switchstack+0x110
    (0x4000000000000692, 0xe000000001d9d640, 0x144000206c61009f
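
The dumptime and savetime fields in the banner are plain Unix epoch seconds, so they can be cross-checked away from the crashed box. The following is only a minimal Python sketch, not part of Q4; it assumes that METDST means Central European summer time (UTC+2), and the helper name `fmt` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the crashdump banner.
DUMPTIME = 1277458512
SAVETIME = 1277459539

# Assumption: METDST is Middle European summer time, i.e. UTC+2.
METDST = timezone(timedelta(hours=2), "METDST")

def fmt(epoch):
    """Render epoch seconds in the same layout as the banner."""
    return datetime.fromtimestamp(epoch, METDST).strftime("%a %b %d %H:%M:%S %Z %Y")

dump_str = fmt(DUMPTIME)
save_str = fmt(SAVETIME)
gap = SAVETIME - DUMPTIME  # seconds between taking and saving the dump

print(dump_str)
print(save_str)
print(f"dump was saved {gap // 60} min {gap % 60} s after it was taken")
```

If the printed dates match the banner, the timestamps are consistent, and the gap gives a rough idea of how long it took before the dump was saved.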

The Q4 package contains lots of scripts that can provide you with extra information. The most interesting ones are analyze.pl and whathappened.pl. Beware that these scripts can spew out loads of output! (You can always redirect the output to a file, just as you would at the shell prompt.)

q4> include analyze.pl
q4> include whathappened.pl
q4> run Analyze AMUP
...
q4> run WhatHappened
System Name:    HP-UX
Node Name:      anduril
Release:        B.11.31
Version:        U
Model:          no
Machine ID:     123456789
Processors:     1
Architecture:   IA-64
Physical Mem:   1571536 pages

This is a 64 Bit Kernel
The system had been up for 44.12 days (381190776 ticks).
Load averages: 0.76 0.77 0.48.

System went down at: Fri Jun 25 11:35:12 2010

+--------------------------------------------+
| Message Buffer                             |
+--------------------------------------------+
Found adjacent data tr. Growing size.  0x240d000 -> 0x640d000.
Loaded ACPI revision 2.0 tables.
MMIO on this platform supports Write Coalescing.
...
gexcp_hndlr: Reserved Register/Field or Unimplemented Address fault occurs in kernel mode.
gexcp_hndlr: unimplemented data address fault, ISR.ir = 0,
      data memory reference to unimplemented address
******************************************************************************

reg_dump(): Displaying register values (in hex) from the save state at
  ssp  87ffffff_5ffe7200 return_status/reason/flags  0000/0054/00000001

Interruption type: Unimplemented Data Address Fault
panic: gexcp_hndlr: Unresolved priv 0 interruption.

Stack Trace:
  IP                  Function Name
  0xe000000001dea710  gexcp_hndlr+0x2d0
  0xe000000001c0a780  bubbledown+0x0
  0xe000000000afed90  kmem_lpc_alloc+0x2b0
  0xe000000000d6ead0  get_kmem+0x290
  0xe000000000d66070  kmem_arena_xlarge_alloc+0x2f0
  0xe000000000c24e90  kmem_arena_varalloc+0x2d0
  0xe000000000df31c0  vfork_buffer_init+0xb0
  0xe000000000d0b7c0  newproc+0x11f0
  0xe000000000a12930  vfork+0x1440
  0xe000000000c261a0  syscall+0x560
End of Stack Trace
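
If you saved the output to a file, the stack trace is easy to post-process. The sketch below is a hypothetical helper in Python (not one of the Q4 scripts): it pulls the function names out of the trace and prints them oldest call first, assuming the trace lists the most recent frame at the top, as debuggers usually do.

```python
import re

# Stack trace lines copied from the WhatHappened output.
TRACE = """\
  0xe000000001dea710  gexcp_hndlr+0x2d0
  0xe000000001c0a780  bubbledown+0x0
  0xe000000000afed90  kmem_lpc_alloc+0x2b0
  0xe000000000d6ead0  get_kmem+0x290
  0xe000000000d66070  kmem_arena_xlarge_alloc+0x2f0
  0xe000000000c24e90  kmem_arena_varalloc+0x2d0
  0xe000000000df31c0  vfork_buffer_init+0xb0
  0xe000000000d0b7c0  newproc+0x11f0
  0xe000000000a12930  vfork+0x1440
  0xe000000000c261a0  syscall+0x560
"""

# One frame per line: instruction pointer, then function+offset.
FRAME = re.compile(r"^\s*(0x[0-9a-f]+)\s+(\w+)\+(0x[0-9a-f]+)\s*$")

def parse_frames(text):
    """Return a list of (function, offset) tuples, in the order printed."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match:
            _ip, func, offset = match.groups()
            frames.append((func, int(offset, 16)))
    return frames

frames = parse_frames(TRACE)

# Assumption: the innermost frame is printed first, so reverse for call order.
call_chain = [func for func, _ in reversed(frames)]
print(" -> ".join(call_chain))
```

Read this way, the chain starts at syscall and vfork, passes through the kernel memory allocator (kmem_arena_varalloc down to kmem_lpc_alloc) and ends in the exception handler gexcp_hndlr, which suggests the fault hit while a vfork call was allocating kernel memory.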

There's no guarantee that you'll find the exact reason why the machine crashed (especially if it's really kernel related), but at least it can give you a rough idea of what happened.
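
Some of the numbers in the WhatHappened header can be sanity-checked with simple arithmetic. The Python sketch below does so under two assumptions that are not stated in the output itself: a clock rate of 100 ticks per second and a base page size of 4 KB. Adjust both constants if your system is configured differently.

```python
# Values copied from the WhatHappened output.
UPTIME_TICKS = 381190776
PHYS_PAGES = 1571536

# Assumptions (not shown in the dump output):
TICKS_PER_SECOND = 100   # assumed clock tick rate
PAGE_SIZE_BYTES = 4096   # assumed base page size of 4 KB

SECONDS_PER_DAY = 24 * 60 * 60

uptime_days = UPTIME_TICKS / TICKS_PER_SECOND / SECONDS_PER_DAY
phys_bytes = PHYS_PAGES * PAGE_SIZE_BYTES
phys_gib = phys_bytes / 2**30

print(f"uptime: {uptime_days:.2f} days")
print(f"physical memory: {phys_bytes} bytes ({phys_gib:.2f} GiB)")
```

With 100 ticks per second the uptime comes out at the 44.12 days reported by the script, which supports that assumption; the memory figure is only as good as the assumed page size.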

Anonymous Fri, 08/13/2010 - 14:50

Just wanted to point out that the command which is being run is actually `kwdb` (it is shown in the output). So the old `q4` simply runs `kwdb`:

# ll /usr/contrib/Q4/bin/q4
lrwxrwxrwx 1 bin bin 26 Jul 15 14:49 /usr/contrib/Q4/bin/q4 -> /usr/contrib/kwdb/bin/kwdb

`kwdb` combines some of the features of `wdb` (essentially GNU gdb modified by HP) with the old `q4` commands to make a full-featured kernel debugger. `kwdb` also supports Perl scripting.

Also, starting with version 11.31, HP added `livedump`, which makes it possible to produce a full memory dump without actually panicking the system. Also see the `/opt/sfm/tools/` directory for `crashinfo`, a live kernel stack trace tool, as well as `pstack`, its userspace counterpart; both are included in 11.31. These tools provide a more convenient way to see what is happening in a live kernel.