
Unkillable processes

I recently ran into an issue where a colleague complained about several processes that kept hanging on a Solaris 10 machine. Investigation showed that processes such as format and powermt, and even a dtrace invoked for diagnostics, hung and could not be killed, not even with SIGKILL:



# pkill -9 format

2



In such cases, a good old truss session usually explains what's going on. This time, however, truss came back with a rather peculiar message, and pstack and pfiles fared no better:

# truss -p 26632
truss: unanticipated system error: 26632
#
# pstack 26632
pstack: cannot examine 26632: unanticipated system error
#
# pfiles 26632
pfiles: unanticipated system error: 26632



When the userland tools fail like this, the only remaining option is to use the kernel debugger to determine the cause:

# mdb -k
Loading modules: [ unix genunix specfs dtrace ufs sd pcisch md ip hook neti sctp arp usba fcp fctl ssd nca lofs zfs cpc fcip random crypto logindmux ptm nfs ipc ]
> ::pgrep format
S PID PPID PGID SID UID FLAGS ADDR NAME
R 1241 1 942 686 0 0x4a004900 000006001414c060 format
> 000006001414c060::thread
ADDR STATE FLG PFLG SFLG PRI EPRI PIL INTR
000006001414c060 inval/2000 1424 de50 0 0 0 0 n/a

stack pointer for thread 300012b7700: 2a10055cb01
[ 000002a10055cb01 cv_wait+0x38() ]
000002a10055cbb1 PowerSleep+0x14()
000002a10055cc71 PowerGetSema+0xe8()
000002a10055cd31 power_open+0x364()
000002a10055cea1 spec_open+0x4f8()
000002a10055cf61 fop_open+0x78()
000002a10055d011 vn_openat+0x500()
000002a10055d1d1 copen+0x260()
000002a10055d2e1 syscall_trap32+0xcc()
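
The same lookup can be done without an interactive session. The following is only a minimal sketch: the function names kstack and frames are made up for illustration, the pipeline assumes that the ::pgrep, ::walk thread and ::findstack dcmds are available in your mdb, and it has to be run as root on the affected machine.

```shell
# Hypothetical helper: print the kernel stack of every thread belonging to
# processes whose name matches $1 (assumes root and mdb -k on Solaris).
kstack() {
    echo "::pgrep $1 | ::walk thread | ::findstack -v" | mdb -k
}

# Hypothetical helper: reduce a stack listing on stdin to bare function
# names, one per line, innermost frame first.
frames() {
    sed -n 's/.* \([A-Za-z_][A-Za-z0-9_]*\)+0x[0-9a-f]*(.*/\1/p'
}
```

Running kstack format | frames would then list the call chain from the innermost frame outward, which makes it easy to spot the driver functions the process is stuck in.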



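The stack shows the culprit: the open call ends up in power_open, which calls PowerGetSema and then sleeps in cv_wait. In other words, the PowerPath MPIO driver was blocked waiting on a semaphore. Further investigation revealed that the drivers for PowerPath had been removed from the /etc/system file. Restoring the correct version of that file and rebooting solved the problem.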