Unkillable processes

Topics: OpenSolaris

A colleague recently complained about several processes that kept hanging on a Solaris 10 machine. Investigation showed that processes such as format and powermt, and even a dtrace invoked for diagnostics, hung and could not be killed, not even with SIGKILL:

# pkill -9 format
# ps -ef |grep -c format
2
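
Note that a `ps -ef | grep -c` count can include the grep process itself, so it is only a rough indicator. A stricter check is to send the signal and then poll for the process. The following is a minimal sketch, not what I ran at the time: the helper names proc_gone and kill_and_verify are hypothetical, and it assumes a POSIX-style shell (ksh or bash rather than the legacy /bin/sh) and a ps that accepts `-o s=`.

```shell
# Hypothetical helpers, not part of the original session.

# proc_gone PID: succeeds if PID no longer exists or is only a zombie.
proc_gone() {
    state=$(ps -o s= -p "$1" 2>/dev/null | tr -d ' ')
    case "$state" in
        ""|Z*) return 0 ;;   # no such process, or dead and awaiting reaping
        *)     return 1 ;;   # still present in some other state
    esac
}

# kill_and_verify PID [SECONDS]: send SIGKILL, then poll once per second
# until the process disappears or the timeout (default 5) expires.
kill_and_verify() {
    pid=$1
    tries=${2:-5}
    kill -9 "$pid" 2>/dev/null
    while [ "$tries" -gt 0 ]; do
        proc_gone "$pid" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    proc_gone "$pid"
}
```

Something like `kill_and_verify 26632 || echo "still there"` would then report a process that ignores SIGKILL, which usually means it is stuck inside the kernel.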

In such cases, a good old truss session usually explains what's going on, but here truss came back with a rather peculiar message, as did pstack and pfiles:

# truss -p 26632
truss: unanticipated system error: 26632
#
# pstack 26632
pstack: cannot examine 26632: unanticipated system error
#
# pfiles 26632
pfiles: unanticipated system error: 26632

When that happens, the only option left is to use the kernel debugger to determine the cause:

# mdb -k
Loading modules: [ unix genunix specfs dtrace ufs sd pcisch md ip hook neti sctp arp usba fcp fctl ssd nca lofs zfs cpc fcip random crypto logindmux ptm nfs ipc ]
> ::pgrep format
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R   1241      1    942    686      0 0x4a004900 000006001414c060 format
> 000006001414c060::thread
            ADDR    STATE  FLG PFLG SFLG   PRI  EPRI PIL             INTR
000006001414c060 inval/2000 1424 de50    0     0     0   0              n/a
> 000006001414c060::walk thread | ::findstack
stack pointer for thread 300012b7700: 2a10055cb01
[ 000002a10055cb01 cv_wait+0x38() ]
  000002a10055cbb1 PowerSleep+0x14()
  000002a10055cc71 PowerGetSema+0xe8()
  000002a10055cd31 power_open+0x364()
  000002a10055cea1 spec_open+0x4f8()
  000002a10055cf61 fop_open+0x78()
  000002a10055d011 vn_openat+0x500()
  000002a10055d1d1 copen+0x260()
  000002a10055d2e1 syscall_trap32+0xcc()
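
The interactive steps above can probably be collapsed into a single pipeline, since `::pgrep` passes the matching process addresses along when its output is piped. A small sketch, assuming a POSIX-style shell; the helper name stack_dcmds is hypothetical and only builds the dcmd string:

```shell
# Hypothetical helper: build the mdb dcmd pipeline that prints the kernel
# stack of every thread in the processes whose name matches $1.
stack_dcmds() {
    printf '::pgrep %s | ::walk thread | ::findstack\n' "$1"
}

# On the affected host (as root), feed the pipeline to the kernel debugger:
#   stack_dcmds format | mdb -k
```

This avoids copying addresses by hand, which is where mistakes tend to creep in during a live session.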

In this case, the process was blocked on a semaphore inside the PowerPath MPIO driver: the stack shows power_open waiting in PowerGetSema. Further investigation revealed that the PowerPath driver entries had been removed from the /etc/system file. Restoring the correct version of that file and rebooting solved the problem.
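
To spot this kind of damage earlier, the live /etc/system can be compared against a known-good copy. The sketch below is only an illustration: the helper names and the /etc/system.good path are hypothetical, and it assumes a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep rather than /usr/bin/grep).

```shell
# Hypothetical helpers for comparing two copies of /etc/system.

# active_lines FILE: print FILE without blank lines and without comment
# lines. Comments in /etc/system start with '*'; '#' is stripped too,
# as a precaution.
active_lines() {
    grep -v '^[[:space:]]*$' "$1" | grep -v '^[[:space:]]*[*#]'
}

# missing_entries GOOD CURRENT: print the active lines that are present
# in GOOD but no longer appear verbatim in CURRENT.
missing_entries() {
    active_lines "$1" | while IFS= read -r line; do
        grep -F -x -q -- "$line" "$2" || printf '%s\n' "$line"
    done
}

# Example (paths are assumptions):
#   missing_entries /etc/system.good /etc/system
```

Any line it prints is a setting that existed in the good copy and has since disappeared, which is exactly what happened to the PowerPath entries here.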