Solaris' format coredumps on EFI-labelled disks

I tracked down a nasty problem on a Solaris 10 host, where format refused to start after extra disks had been added:

# format
Segmentation fault. Core dumped.

Tracing the format command revealed that it barfed while reading one specific disk. Running format against any of the other disks worked flawlessly, but with that one disk, format kept segfaulting:

# format c2t1d42
Segmentation fault. Core dumped.
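
Isolating the offending disk by hand gets tedious with many LUNs, so here is a rough sketch of how the check could be looped. It assumes a POSIX-style shell (on Solaris 10 that means something like /usr/xpg4/bin/sh or bash rather than the legacy /bin/sh), that the shell reports a signal death as exit status 128 + signal number, and that format exits on its own when stdin is /dev/null. The function name probe_disks and every disk name except c2t1d42 are made up for illustration.

```shell
# probe_disks CMD DISK...
# Runs "CMD DISK" for each disk with stdin from /dev/null and prints the
# disks for which the command was killed by a signal (exit status > 128).
probe_disks() {
    probe_cmd=$1
    shift
    for disk in "$@"; do
        status=0
        # $probe_cmd is left unquoted on purpose so "format -d" splits into words.
        $probe_cmd "$disk" </dev/null >/dev/null 2>&1 || status=$?
        if [ "$status" -gt 128 ]; then
            echo "$disk: killed by signal $((status - 128))"
        fi
    done
    return 0
}

# Example call (c2t1d40 and c2t1d41 are hypothetical disk names):
# probe_disks "format -d" c2t1d40 c2t1d41 c2t1d42
```

A segfaulting disk would then show up as "killed by signal 11", since SIGSEGV is signal 11.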

Digging further into the core dump revealed the real reason why Solaris' format was crashing:

# pstack format.s0001620.6924.core
core 'format.s0001620.6924.core' of 6924: format -d c2t1d42
ff2542ec _malloc_unlocked (3c78, 5f188, 5f188, 5f188, 0, 0) + 22c
ff2540a4 malloc (3c78, 1, 94224, 0, ff2e8284, ff2f09b0) + 4c
ff240e68 calloc (1, 1, 3c78, 0, 0, 0) + 58
ff350aac efi_alloc_and_read (5, ffbfe64c, 3c00, 1356c, 0, ff364000) + 24
0001f6c4 read_efi_label (5, ffbfe6b8, 0, 0, d, 543f0) + 8
0002d8c0 ???????? (ff315a12, ffbfee68, ffbfe8ec, 5, 54360, 0)

Apparently, format was choking while trying to read an EFI label. An octal dump of the disk confirmed that such a label was present:


0000000 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
*
3140000 I A 6 4 _ E F I 0 0 0 b 0 0 0 0
3140020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
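
The offsets in the left-hand column are octal, so it takes a small conversion to see where on the disk the label sits. A minimal sketch, assuming the dump was made with od's default octal addresses and that the disk uses 512-byte sectors (both are assumptions, and the function names are made up for illustration):

```shell
# Convert an od(1) offset (octal digits, no leading 0) to a decimal byte offset.
od_offset_to_bytes() {
    # A leading 0 makes printf parse the number as octal.
    printf '%d\n' "0$1"
}

# Convert a byte offset to a sector number; sector size defaults to 512 bytes.
bytes_to_sector() {
    echo $(( $1 / ${2:-512} ))
}

offset=$(od_offset_to_bytes 3140000)
sector=$(bytes_to_sector "$offset")
echo "byte offset: $offset, sector: $sector"
# prints "byte offset: 835584, sector: 1632"
```

Under those assumptions the string shown above starts exactly on a sector boundary, at sector 1632.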



The fact is that Solaris cannot cope with (probably corrupt) EFI labels. There are some patches floating around for x86, but this is a SPARC machine, for which apparently no patches exist:
"It's a nasty series of bugs that apparently have been lingering around since 1993, and were never completely fixed. Nowadays they are obscure bugs, and if you look into bugs.opensolaris.org, for most of them there is no fix, on SunSolve there is no patch, etcetera, etcetera."