ZFS weird device tasting loop since MFC
Ulrich Spörlein
uqs at spoerlein.net
Fri Jun 5 08:44:26 UTC 2009
On Tue, 02.06.2009 at 11:24:08 +0200, Ulrich Spörlein wrote:
> On Tue, 02.06.2009 at 11:16:10 +0200, Ulrich Spörlein wrote:
> > Hi all,
> >
> > so I went ahead and updated my ~7.2 file server to the new ZFS goodness,
> > and before running any further tests, I already discovered something
> > weird and annoying.
> >
> > I'm using a mirror on GELI where one disk is usually *not* attached, as
> > a poor man's backup. (I had to go that route because send/recv of
> > snapshots frequently deadlocked the system, whereas a mirror scrub did
> > not.)
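For reference, the attach/detach cycle looks roughly like this; the GELI key
file path is only a placeholder (a passphrase-only setup would simply prompt
instead):

    # attach the GELI layer of the backup disk and bring it back into the mirror
    geli attach -k /path/to/da0.key /dev/da0
    zpool online tank da0.eli
    # let it resilver, run a scrub, wait for it to finish,
    # then take the disk out of the pool again
    zpool scrub tank
    zpool offline tank da0.eli
    geli detach da0.eli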
> >
> > root at coyote:~# zpool status
> > pool: tank
> > state: DEGRADED
> > status: The pool is formatted using an older on-disk format. The pool can
> > still be used, but some features are unavailable.
> > action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
> > pool will no longer be accessible on older software versions.
> > scrub: none requested
> > config:
> >
> >         NAME                      STATE     READ WRITE CKSUM
> >         tank                      DEGRADED     0     0     0
> >           mirror                  DEGRADED     0     0     0
> >             ad4.eli               ONLINE       0     0     0
> >             12333765091756463941  REMOVED      0     0     0  was /dev/da0.eli
> >
> > errors: No known data errors
> >
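The on-disk format warning above is just the version bump that came with the
MFC; listing the supported versions and upgrading would presumably be:

    # show the available on-disk format versions, then upgrade the pool
    zpool upgrade -v
    zpool upgrade tank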
> > When the pool is imported, there is constant "tasting" of every device in
> > the system, which also keeps the floppy drive spinning and is really
> > annoying. It did not do this with the old ZFS code; are there any remedies?
> >
> > gstat(8) is displaying the following every other second, together with a
> > spinning fd0 drive.
> >
> > dT: 1.010s  w: 1.000s  filter: ^...$
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
> >     0      0      0      0    0.0      0      0    0.0     0.0| fd0
> >     0      8      8   1014    0.1      0      0    0.0     0.1| md0
> >     0     32     32   4055    9.2      0      0    0.0    29.2| ad0
> >     0     77     10   1267    7.1     63   1125    2.3    31.8| ad4
> >
> > There is no activity going on (md0 in particular just backs /tmp), yet ZFS
> > constantly tries to read from everything. I will now insert the second
> > drive and see if it shuts up then ...
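(The gstat view above comes from filtering on the bare three-character device
names, so partitions and .eli providers are hidden; something like:)

    # show only the top-level disk providers: fd0, md0, ad0, ad4, ...
    gstat -f '^...$'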
>
> It does, but it did not start resilvering the second disk either:
>
> root at coyote:~# zpool status
> pool: tank
> state: ONLINE
> status: One or more devices has experienced an unrecoverable error. An
> attempt was made to correct the error. Applications are unaffected.
> action: Determine if the device needs to be replaced, and clear the errors
> using 'zpool clear' or replace the device with 'zpool replace'.
> see: http://www.sun.com/msg/ZFS-8000-9P
> scrub: none requested
> config:
>
>         NAME         STATE     READ WRITE CKSUM
>         tank         ONLINE       0     0     0
>           mirror     ONLINE       0     0     0
>             ad4.eli  ONLINE       0     0     0
>             da0.eli  ONLINE       0     0    16
>
> errors: No known data errors
>
> Will now run the scrub and report back in 6-9h.
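(The scrub is simply kicked off by hand and polled for progress, i.e.:)

    # start a full scrub of the mirror and watch it run
    zpool scrub tank
    zpool status tank    # reports "scrub in progress" with a percentage done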
Another data point: while the floppy tasting has stopped now that the mirror
sees all its devices again, there is another problem here:
root at coyote:/# zpool online tank da0.eli
root at coyote:/# zpool status
pool: tank
state: ONLINE
scrub: resilver completed after 0h0m with 0 errors on Fri Jun 5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  ONLINE       0     0     0  2.20M resilvered

errors: No known data errors
root at coyote:/# zpool offline tank da0.eli
root at coyote:/# zpool status
pool: tank
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: resilver completed after 0h0m with 0 errors on Fri Jun 5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors
root at coyote:/# zpool status
pool: tank
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: resilver completed after 0h0m with 0 errors on Fri Jun 5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0   339     0  2.20M resilvered

errors: No known data errors
root at coyote:/# zpool status
pool: tank
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scrub: resilver completed after 0h0m with 0 errors on Fri Jun 5 10:21:36 2009
config:

        NAME         STATE     READ WRITE CKSUM
        tank         DEGRADED     0     0     0
          mirror     DEGRADED     0     0     0
            ad4.eli  ONLINE       0     0     0  684K resilvered
            da0.eli  OFFLINE      0     0     0  2.20M resilvered

errors: No known data errors
So I ran 'zpool status' three times after the offline, and the second run
reports write errors on the OFFLINE device (WTF?). Running 'zpool status' in a
loop, these errors keep showing up and then vanishing again.
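(If these are just stale error counters, clearing them should be a matter of
the following, though that still would not explain where they come from:)

    # reset the error counters for the offlined device
    # (omit the device name to clear the whole pool)
    zpool clear tank da0.eli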
I also get constant write requests to the remaining device, even though no
applications are accessing it. What the hell is ZFS trying to do here?
root at coyote:/# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
tank         883G  48.4G      8    246  56.8K  1.53M
tank         883G  48.4G      8    249  55.9K  1.55M
tank         883G  48.4G      8    250  55.0K  1.54M
tank         883G  48.4G      8    252  54.1K  1.56M
tank         883G  48.4G      8    254  53.3K  1.57M
tank         883G  48.4G      8    253  52.5K  1.56M
tank         883G  48.4G      7    255  51.7K  1.57M
^C
Again, WTF? Can someone please enlighten me here?
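(A per-vdev breakdown might at least show where those writes end up:)

    # break the I/O down per vdev instead of per pool
    zpool iostat -v tank 1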
Cheers,
Ulrich Spörlein
--
http://www.dubistterrorist.de/