Weird problem with gmirror - cannot add the Good disk when previously failed SATA disk is online
Ivan Voras
ivoras at freebsd.org
Mon May 18 12:39:39 UTC 2009
panix panix wrote:
> Hello,
> Sorry in advance for the cross-posting; freebsd-geom just didn't seem that populated.
> I run 7.1-PRERELEASE; it's a home server.
> This morning, after a power failure, the rebuild of my root mirror gm0 failed on disk ad4.
> The messages were:
>
> May 18 08:02:02 panix kernel: ad4: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=268091264
> May 18 08:02:08 panix kernel: drm0: <Intel i865G GMCH> on vgapci0
> May 18 08:02:08 panix kernel: info: [drm] AGP at 0xf0000000 128MB
> May 18 08:02:08 panix kernel: info: [drm] Initialized i915 1.5.0 20060119
> May 18 08:02:08 panix kernel: drm0: [ITHREAD]
> May 18 08:02:08 panix kernel: ad4: FAILURE - device detached
> May 18 08:02:08 panix kernel: subdisk4: detached
> May 18 08:02:08 panix kernel: ad4: detached
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: provider ad4 disconnected.
> May 18 08:02:08 panix kernel: GEOM_MIRROR: Device gm0: rebuilding provider ad4 stopped.
>
> I read http://www.eztiger.org/2008/08/removing-and-re-adding-a-disk-in-gmirror/
> hoping that the rebuild failure was temporary,
> and so I tried to just run
> # gmirror forget gm0
> # gmirror insert gm0 ad4
>
> But the system responded (if I remember correctly)
> Unknown provider ad4.
> The system could no longer see ad4 as online.
Yes, as the "device detached" message told you: after that point ad4
was removed from /dev.
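
A quick way to confirm this is to check whether the device node still
exists. The following is only a minimal sketch; the helper name
provider_present is made up for illustration, and ad4 is simply the disk
name from the log above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given device node exists.
provider_present() {
    [ -e "$1" ]
}

# Illustrative usage; /dev/ad4 is the disk from the kernel messages.
if provider_present /dev/ad4; then
    echo "ad4 is still visible to the system"
else
    echo "ad4 is gone from /dev; gmirror has nothing to insert"
fi
```

If the node is missing, "gmirror insert gm0 ad4" has no provider to work
with, which matches the "Unknown provider" response.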
> So I rebooted the system many times, with these results:
> - With ad4 offline (physically disconnected), the system booted OK.
> - With both disks online, the system consistently responded
> with:
> "GEOM_MIRROR: Cannot add disk ad6 to gm0 (error=22)."
Which means that gm0 was somehow created before - maybe from the "stale"
ad4 copy? If so, you are attempting to add a newer generation of data
(from ad6) to a gm0 instantiated from an older generation (from ad4).
This could explain the error code (22=invalid argument).
OTOH if you only have ad6 in the system this means you are trying to
insert ad6 into a mirror which is already instantiated by ad6 - which is
trivially wrong.
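
One way to test the "stale copy" theory is to compare the on-disk
metadata of both components with "gmirror dump". The sketch below assumes
the dump output contains lines of the form "genid: N" and "syncid: N"
(that is how I remember it, but check on your system); the helper name
meta_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `gmirror dump <provider>` output on stdin and
# print "<genid> <syncid>". Assumes "genid: N" and "syncid: N" lines,
# possibly indented.
meta_ids() {
    awk -F': *' '
        { gsub(/^[ \t]+/, "", $1) }          # strip leading indentation from the key
        $1 == "genid"  { genid  = $2 + 0 }   # "+ 0" forces a clean number
        $1 == "syncid" { syncid = $2 + 0 }
        END { print genid, syncid }
    '
}

# Usage on the FreeBSD machine (as root), one disk at a time:
#   gmirror dump ad4 | meta_ids
#   gmirror dump ad6 | meta_ids
```

Lower numbers on ad4 than on ad6 would be consistent with ad4 carrying
the older generation of the mirror.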
> Which IMO is not OK, since gm0 should add ad6 without a problem,
> whether ad4 is online or not.
You cannot really expect the system to behave correctly with broken
hardware.
> - With only ad4 online, it simply cannot find gm0 at all (kind of reasonable).
Relatively. Is ad4 recognized by the system? You didn't actually clear
the metadata on ad4, so it should be recognized, but as a stale version
(hopefully). If it isn't recognized at all, then it's broken.
> So my only option is to have only ad6 online, with a current gmirror status:
> panix# gmirror status
>       Name    Status  Components
> mirror/gm0  COMPLETE  ad6
This is ok.
> Does anyone have an idea of how I should proceed (besides buying a UPS unit)?
> Is it worthwhile to get a new disk to replace the current ad4?
Yes. Then proceed with gmirror insert.
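
As a rough sketch of that sequence, using the same commands you already
tried: the script below only prints the commands unless DRY_RUN=0 is set.
The helper names run and replace_mirror_disk are made up here, and it
assumes the replacement disk shows up as ad4 again.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (as root, on the FreeBSD machine).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: $1 is the mirror name, $2 the new disk.
replace_mirror_disk() {
    mirror=$1
    disk=$2
    run gmirror forget "$mirror"          # drop components that are no longer connected
    run gmirror insert "$mirror" "$disk"  # add the new disk to the mirror
    run gmirror status "$mirror"          # check the state of the mirror afterwards
}

# Assumption: the replacement disk appears under the old name, ad4.
replace_mirror_disk gm0 ad4
```

Review the printed commands first, and only then rerun with DRY_RUN=0.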
> Why does the presence of the supposedly bad disk ad4 affect gm0,
> when I have already told gm0 to forget about ad4?
It's relatively common (it was more common in the days of PATA cables)
to have a bad drive interfering with the rest of the system.
More information about the freebsd-questions
mailing list