Hello, world ! Welcome to the weblog of Kristof Willen. This is the place where I publish some weird and interesting links I encountered during my wanderings in cyberspace. Apart from that, you can find some useful/useless information about myself.
A crashing Unix server should be a rare event, which means that postmortem investigation is something you will seldom do. Kernel debuggers are not much fun, and basically require a good knowledge of the kernel internals. Not too difficult if you're a guru in a specific Unix flavour, but if you're housing 3 Unices, each with different kernel versions, then you're in a whole different game ! Luckily, there are admin-friendly scripts nowadays which help you with the task of digging out why your machine crashed.
Let's have a look at HP-UX : it features the adb kernel debugger, but also the Q4 package, which is generally installed by default in the /usr/contrib/Q4 directory. Before first use, you need to copy the initialization script to your home directory :
cp /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /root/.q4rc.pl
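If you have run Q4 before, a plain cp would overwrite any tweaks in your existing .q4rc.pl. A minimal sketch of a more careful copy follows; the function name install_q4rc is made up for illustration, and only the two paths come from the command above.

```shell
#!/bin/sh
# Copy the sample Q4 init script only when no init script exists yet.
# install_q4rc is a hypothetical helper name, not part of the Q4 package.
install_q4rc() {
    src=$1
    dest=$2
    if [ -e "$dest" ]; then
        # Leave an existing (possibly customised) file alone.
        echo "keeping existing $dest"
    else
        cp "$src" "$dest" && echo "installed $dest"
    fi
}

# Usage with the paths from this post:
#   install_q4rc /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl /root/.q4rc.pl
```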
Then you're ready to start up the tool itself :
# q4 -p
HP KWDB 3.2.3 for HP Itanium (32 or 64 bit) and target HP-IPF 11.2x.
Copyright 1986 - 2001 Free Software Foundation, Inc.
Hewlett-Packard KWDB 3.2.3 12-May-2009 21:15 is covered by the
GNU General Public License. Type "show copying" to see the conditions to
change it and/or distribute copies. Type "show warranty" for warranty/support.
..
crashdump information:
hostname anduril
model ia64 hp server Integrity Virtual Machine
panic gexcp_hndlr: Unresolved priv 0 interruption.
release @(#) $Revision: vmunix: B.11.31_LR FLAVOR=perf
dumptime 1277458512 Fri Jun 25 11:35:12 METDST 2010
savetime 1277459539 Fri Jun 25 11:52:19 METDST 2010
dumptype Non Compressed
Event selected is 0. It was a panic
#0 0xe000000001da26c0:0 in panic_save_regs_switchstack+0x110
(0x4000000000000692, 0xe000000001d9d640, 0x144000206c61009f
The Q4 package contains lots of scripts which can be used for providing you with extra information. The most interesting ones are analyze.pl and whathappened.pl. Beware that these scripts can barf out loads of output ! (you can always redirect the output to a file, as if you were on the command prompt)
q4> include analyze.pl
q4> include whathappened.pl
q4> run Analyze AMUP
...
q4> run WhatHappened
System Name: HP-UX
Node Name: anduril
Release: B.11.31
Version: U
Model: no
Machine ID: 123456789
Processors: 1
Architecture: IA-64
Physical Mem: 1571536 pages
This is a 64 Bit Kernel
The system had been up for 44.12 days (381190776 ticks).
Load averages: 0.76 0.77 0.48.
System went down at: Fri Jun 25 11:35:12 2010
+--------------------------------------------+
| Message Buffer |
+--------------------------------------------+
Found adjacent data tr. Growing size. 0x240d000 -> 0x640d000.
Loaded ACPI revision 2.0 tables.
MMIO on this platform supports Write Coalescing.
...
gexcp_hndlr: Reserved Register/Field or Unimplemented Address fault occurs in kernel mode.
gexcp_hndlr: unimplemented data address fault, ISR.ir = 0,
data memory reference to unimplemented address
******************************************************************************
reg_dump(): Displaying register values (in hex) from the save state at
ssp 87ffffff_5ffe7200 return_status/reason/flags 0000/0054/00000001
Interruption type: Unimplemented Data Address Fault
panic: gexcp_hndlr: Unresolved priv 0 interruption.
Stack Trace:
IP Function Name
0xe000000001dea710 gexcp_hndlr+0x2d0
0xe000000001c0a780 bubbledown+0x0
0xe000000000afed90 kmem_lpc_alloc+0x2b0
0xe000000000d6ead0 get_kmem+0x290
0xe000000000d66070 kmem_arena_xlarge_alloc+0x2f0
0xe000000000c24e90 kmem_arena_varalloc+0x2d0
0xe000000000df31c0 vfork_buffer_init+0xb0
0xe000000000d0b7c0 newproc+0x11f0
0xe000000000a12930 vfork+0x1440
0xe000000000c261a0 syscall+0x560
End of Stack Trace
There's no guarantee that you'll find the exact reason why the machine crashed (especially if it's really kernel related), but at least it can give you a rough idea of what happened.
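As a sanity check on the WhatHappened output, the tick count can be converted to days by hand. This is only a sketch: it assumes a clock rate of 100 ticks per second, which the output above does not state, and ticks_to_days is a name I made up.

```shell
#!/bin/sh
# Convert kernel clock ticks to days.
# ASSUMPTION: 100 ticks per second unless a second argument says otherwise.
ticks_to_days() {
    awk -v ticks="$1" -v hz="${2:-100}" \
        'BEGIN { printf "%.2f\n", ticks / hz / 86400 }'
}

ticks_to_days 381190776   # prints 44.12
```

With that assumption the result matches the "44.12 days" reported by the script.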
I'm trying to install a reasonably recent rrdtool on an HP-UX 11.11 machine, but I'm almost giving up in despair : there's an HP-UX depot in the contribute section of the rrdtool downloads, but that's a very old version. There are very few sites which offer prebuilt GNU binaries for HP-UX; the HP-UX porting and archive center is the best known. Unfortunately, there are three current versions of HP-UX, spread over 2 architectures, which means that the archive is spread rather thin. A recent prebuilt rrdtool version is unavailable, which implies I get the pleasure of building the thing.
HP-UX carries the css cc compiler, which dislikes rrdtool (or the other way around), so configure is barfing out the following :
configure: error: Your Compiler does not do proper IEEE math ...
Time to install gcc, but that means installing its dependencies too : libiconv, libgcc and zlib. After a successful gcc installation, time for a new configure run :
# export CC=gcc
# ./configure
[...]
checking for gcc... gcc
checking for C compiler default output file name... configure: error: C compiler cannot create executables
See `config.log' for more details.
Hmmm, that's weird, let's check out gcc :
# gcc
/usr/lib/dld.sl: Can't open shared library: /usr/local/lib/libintl.sl
/usr/lib/dld.sl: No such file or directory
Abort
That's even weirder; why's gcc linked to this library ? Let's double check :
# ldd `which gcc`
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/libdld.2 => /usr/lib/libdld.2
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/local/lib/libiconv.sl => /usr/local/lib/libiconv.sl
/usr/lib/libc.2 => /usr/lib/libc.2
/usr/lib/dld.sl: Can't open shared library: /usr/local/lib/libintl.sl
/usr/lib/dld.sl: No such file or directory
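When a binary pulls in more libraries than gcc does here, it helps to filter the ldd output down to just the ones the loader could not open. A small sketch, assuming the message format is exactly the dld.sl one shown above (missing_libs is a hypothetical helper name):

```shell
#!/bin/sh
# Read ldd-style output on stdin and print each library that the loader
# reported as unopenable, once. The pattern matches the HP-UX dld.sl
# message seen above; other loaders word this differently.
missing_libs() {
    sed -n "s/.*Can't open shared library: *//p" | sort -u
}

# Usage (2>&1 in case the loader complains on stderr):
#   ldd "$(which gcc)" 2>&1 | missing_libs
```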
But there's a libintl.sl lib in the /opt/gnome/lib dir, hopefully that one can be used :
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/gnome/lib
sh: LD_LIBRARY_PATH: Parameter not set.
# gcc
/usr/lib/dld.sl: Can't open shared library: /usr/local/lib/libintl.sl
/usr/lib/dld.sl: No such file or directory
Abort
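A side note on the "Parameter not set" line: that message comes from the shell, not from the loader. It most likely means the shell runs with the nounset option (set -u) and the variable did not exist yet, so the export never took effect. A sketch of an append that survives an unset variable follows; append_path is a name I made up, and whether dld.sl then honours the variable is a separate question this does not answer.

```shell
#!/bin/sh
# Append a directory to a colon-separated search path that may be empty.
# ${1:+$1:} expands to "<old>:" only when <old> is non-empty, so the
# result never starts with a stray colon.
append_path() {
    # $1 = current value (may be empty), $2 = directory to add
    printf '%s\n' "${1:+$1:}$2"
}

set -u
# ${LD_LIBRARY_PATH:-} yields an empty string instead of an error
# when the variable is unset.
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /opt/gnome/lib)
export LD_LIBRARY_PATH
```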
Damn, HP-UX won't accept the $LD_LIBRARY_PATH or $SHLIB_PATH parameter; maybe this dirty hack will work :
# ln -s /opt/gnome/lib/libintl.sl /usr/local/lib/libintl.sl
# gcc
/usr/lib/dld.sl: Unresolved symbol: libintl_bindtextdomain (code) from gcc
Abort
/me kicks the server. Bah.
We coped with this crap on Linux 10 years ago. Maybe I'm among the ignorant, but does anyone know a way out of this mess ? Sun is giving away companion CDs with GNU tools on them, maybe HP does the same ?
I spent the last week in the middle of the English countryside for a course on HP-UX Serviceguard.
Thick fog, Festive Pheasants, Badgers, Guinness and heavy headaches. Luckily we travelled by Eurostar, because Heathrow Airport was rather chaotic due to the fog.
There's a small collection of pictures I took with my mobile phone, so the quality is not excellent overall. It offers a good, misty overview of a week with little or no distraction.
Another uptime record :
[root #/ ] w
9:23am up 1396 days, 17:21, 15 users, load average: 1.36, 1.07, 0.94
This uptime beats the previous record by far. This machine runs an old copy of HP-UX 10.20 and is still in fairly regular use, too. To be fair, I don't like machines with such big uptimes : they're old, have a non-standard setup and configuration and mostly go unpatched through life.
Hewlett-Packard finally offers 24x7 support for Debian GNU/Linux with HP Extensions. In an article, Chris DiBona highlighted the services offered by GNU/Linux vendors and pointed out that their repositories are miles ahead of competing proprietary commercial offerings.