automatic fsck on gmirror failure
Gunther Mayer
gunther.mayer at googlemail.com
Sun Feb 3 13:14:30 PST 2008
Hi there,
I have a RAID 1 mirror implemented with gmirror, and we recently had some
power issues at our data centre which caused fsck to fail mysteriously.
The server lost power unexpectedly, came back up for about a minute,
then lost power again. Shortly after the next boot the following
appeared in my /var/log/messages:
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: INCORRECT BLOCK COUNT I=777684 (8 should be 0) (CORRECTED)
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: CANNOT READ BLK: 12417184
Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: UNEXPECTED SOFT UPDATE INCONSISTENCY; RUN fsck MANUALLY.
gm0s1f is my /usr partition. This was followed by countless errors like
these:
Feb 2 05:20:38 myserver ad6: TIMEOUT - READ_DMA retrying (1 retry left) LBA=29096879
Feb 2 05:20:43 myserver ad6: TIMEOUT - READ_DMA retrying (0 retries left) LBA=29096879
Feb 2 05:20:48 myserver ad6: FAILURE - READ_DMA timed out LBA=29096879
Feb 2 05:20:48 myserver g_vfs_done():mirror/gm0s1f[READ(offset=6357598208, length=16384)]error = 5
With that, all remote access to the box was lost. We had to get
physical access, run fsck -y and reboot to put the machine back into
service.
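
For reference, the numbers in the two log excerpts above can be
cross-checked against each other. The following is only a minimal
sketch: it assumes that fsck counts blocks in 512-byte units, which the
logs themselves do not state, and the names SECTOR_SIZE, fsck_block and
vfs_request are made up for this illustration.

```python
import re

SECTOR_SIZE = 512  # assumption: fsck reports blocks in 512-byte units

# Log lines copied from the excerpts above (wrapped lines rejoined).
FSCK_LINE = ("Feb 2 05:20:19 myserver fsck: /dev/mirror/gm0s1f: "
             "CANNOT READ BLK: 12417184")
VFS_LINE = ("g_vfs_done():mirror/gm0s1f"
            "[READ(offset=6357598208, length=16384)]error = 5")


def fsck_block(line):
    """Return the block number from a 'CANNOT READ BLK' line."""
    match = re.search(r"CANNOT READ BLK:\s*(\d+)", line)
    if match is None:
        raise ValueError("no block number found")
    return int(match.group(1))


def vfs_request(line):
    """Return (offset, length) in bytes from a g_vfs_done() line."""
    match = re.search(r"offset=(\d+),\s*length=(\d+)", line)
    if match is None:
        raise ValueError("no offset/length found")
    return int(match.group(1)), int(match.group(2))


block = fsck_block(FSCK_LINE)
offset, length = vfs_request(VFS_LINE)

# Convert the byte offset into 512-byte sectors; a non-zero remainder
# would mean the offset is not sector-aligned.
first_sector, remainder = divmod(offset, SECTOR_SIZE)

print("fsck block:       ", block)
print("offset in sectors:", first_sector, "remainder", remainder)
print("request length:   ", length // SECTOR_SIZE, "sectors")
print("same location:    ", block == first_sector)
```

Under that assumption the two numbers agree, which suggests that the
fsck read error and the later g_vfs_done() error refer to the same
region of /usr. The LBA=29096879 in the ad6 lines is a whole-disk
address, so it cannot be compared directly without the slice and
partition offsets, which are not shown here.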
Now my question is: why did fsck die on me? I thought that in this day
and age file system corruption caused by a power failure is repaired
automatically on reboot. Or is it possible that interrupting fsck
itself caused the problem, when the system went down again after the
very brief uptime in between?
I am really concerned because this caused a lot of unnecessary
downtime, and I really don't want it to ever happen again. I know that
solving the power issues is the real solution, but I want my several
layers of peace of mind.
Oh, I run FreeBSD 6.2-RELEASE.
Gunther
More information about the freebsd-questions mailing list