automatic fsck on gmirror failure

Sun Feb 3 13:14:30 PST 2008

Hi there,

I have a RAID 1 mirror implemented with gmirror, and we recently had some
power issues at our data centre that caused fsck to fail mysteriously.
The server lost power unexpectedly, came back up for about a minute, and
then lost power again. Shortly after the next boot, the following
appeared in my /var/log/messages:

    Feb  2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: INCORRECT BLOCK COUNT I=777684 (8 should be 0) (CORRECTED)
    Feb  2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: CANNOT READ BLK: 12417184
    Feb  2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.

gm0s1f is my /usr partition. This was followed by countless errors like
these:

    Feb  2 05:20:38 myserver ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29096879
    Feb  2 05:20:43 myserver ad6: TIMEOUT - READ_DMA retrying (0 retries left) LBA=29096879
    Feb  2 05:20:48 myserver ad6: FAILURE - READ_DMA timed out LBA=29096879
    Feb  2 05:20:48 myserver g_vfs_done():mirror/gm0s1f[READ(offset=6357598208, length=16384)]error = 5

With that, we lost all remote access to the box. We had to get physical
access, run fsck -y and reboot before the machine could be put back into
service.

Now my question is: why did fsck die on me? I thought that in this day
and age, file system corruption caused by power failures is repaired
automatically on reboot. Or is it possible that interrupting fsck itself,
when the system went down again after the very brief uptime in between,
caused the problem?

I am really concerned about this, as it caused a lot of unnecessary
downtime and I never want it to happen again. I know solving the power
issues is the real solution, but I want several layers of peace of mind.

Oh, I'm running FreeBSD 6.2-RELEASE.

Gunther