Re: ZFS checksum error on 2 disks of mirror
- In reply to: freebsd@vanderzwan.org: "Re: ZFS checksum error on 2 disks of mirror"
Date: Sat, 14 Jan 2023 15:42:48 UTC
> Scrub is finding no errors so I think the pool and data should be healthy.

Yes, that's what I assumed as well, only to later discover it wasn't OK.

> Scrubbing all pools roughly every 4 weeks so I'll notice if that changes.

I would probably do it sooner, and run a couple of scrubs across a couple of reboots, just to be doubly sure. I hope nothing bad comes of it and you have your peace of mind later.

PS: Sorry if it feels like I'm insisting, but I had a bad experience with this bug.

On Sat, Jan 14, 2023, 19:36 <freebsd@vanderzwan.org> wrote:

> Hi
>
> On 14 Jan 2023, at 16:29, milky india <milkyindia@gmail.com> wrote:
>
> No panics on my system, it just kept running. And there is no way that I know of to reproduce it.
>
> Yes, not being able to reproduce issues is a huge problem.
> When the scrub was producing the error, do you remember the exact error message, or do you have it recorded?
>
> Scrub did not give any errors. Zpool status -v showed one file with an error, but that was also gone after the scrub.
> So no evidence of any error remains except for what was logged in /var/log/messages.
>
> In this case it was a metadata-level corruption error that led to https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A/, which seemed like a dead end; in your case, at least ensure things are backed up should the issue arise later.
>
> Scrub is finding no errors so I think the pool and data should be healthy.
>
> Scrubbing all pools roughly every 4 weeks so I'll notice if that changes.
>
> Paul
>
> Ultimately if it's zfs
>
> On Sat, Jan 14, 2023, 19:13 <freebsd@vanderzwan.org> wrote:
>
>> On 14 Jan 2023, at 15:57, milky india <milkyindia@gmail.com> wrote:
>>
>> > Output of zpool status -v gives no read/write/cksum errors but lists one file with an error.
>>
>> I had faced a similar issue: when I tried to delete the file, the error still persisted, although I only realised it after a few shutdown cycles.
>>
>> For me, after a scrub there was no more mention of a file with an error, so I assume the error was transient.
>>
>> > After running a scrub on the pool all seems to be well, no more files with errors.
>>
>> Please monitor whether the error shows up again sometime soon. While I don't know what the issue is, ZFS error no. 97 seems like a serious bug.
>>
>> Definitely keeping a close watch for this.
>>
>> Is this similar to the issue for which this PR is open?
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268333
>>
>> No panics on my system, it just kept running. And there is no way that I know of to reproduce it.
>>
>> At the moment I suspect it was the power grid issue we had the night that error was logged.
>> A large part of the city where I live had an outage after a fire in a substation.
>> I only had a dip of about 1 s when it happened, but this server did need a reboot as it was unresponsive.
>>
>> The time of the error roughly matches the time they started restoring power to the affected parts of the city.
>> Maybe that created another event on the grid.
>>
>> The server is not behind a UPS, as the power grid is usually very reliable here in the Netherlands.
>>
>> Paul
>>
>> On Fri, Jan 13, 2023, 19:35 <freebsd@vanderzwan.org> wrote:
>>
>>> Hi,
>>> I noticed zpool status gave an error for one of my pools.
>>> Looking back in the logs I found this:
>>>
>>> Dec 24 00:58:39 freebsd ZFS[40537]: pool I/O failure, zpool=backuppool error=97
>>> Dec 24 00:58:39 freebsd ZFS[40541]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=1634427084800 size=53248
>>> Dec 24 00:58:39 freebsd ZFS[40545]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=1634427084800 size=53248
>>>
>>> These are 2 WD Red Plus 8TB drives (same age, same firmware, attached to the same controller).
>>>
>>> Looking further back in the logs I found this had occurred earlier without me noticing:
>>>
>>> Aug  8 03:17:56 freebsd ZFS[12328]: pool I/O failure, zpool=backuppool error=97
>>> Aug  8 03:17:56 freebsd ZFS[12332]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>>> Aug  8 03:17:56 freebsd ZFS[12336]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>>> Aug  8 13:37:26 freebsd ZFS[22317]: pool I/O failure, zpool=backuppool error=97
>>> Aug  8 13:37:26 freebsd ZFS[22321]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>>> Aug  8 13:37:26 freebsd ZFS[22325]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>>> Aug  8 15:37:44 freebsd ZFS[24704]: pool I/O failure, zpool=backuppool error=97
>>> Aug  8 15:37:44 freebsd ZFS[24708]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>>> Aug  8 15:37:44 freebsd ZFS[24712]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>>>
>>> Output of zpool status -v gives no read/write/cksum errors but lists one file with an error.
>>>
>>> After running a scrub on the pool all seems to be well, no more files with errors.
>>>
>>> The system is homebuilt, with an ASRock Rack C2550 board and 16 GB of ECC RAM.
>>> Any idea how I could get checksum errors on the identical block of 2 disks in a mirror?
>>>
>>> Regards,
>>> Paul
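
A note for readers who want the "scrub roughly every 4 weeks" cadence and the status monitoring discussed above without remembering to run either by hand: FreeBSD's periodic(8) already ships knobs for both. A minimal sketch for /etc/periodic.conf, assuming a stock FreeBSD install (the knob names below come from /etc/defaults/periodic.conf; the 28-day threshold is just this thread's four-week schedule, not a recommendation from the participants):

    # /etc/periodic.conf -- minimal sketch

    # Include "zpool status -x" output in the daily periodic mail, so a
    # degraded pool or a listed file-level error is flagged the next
    # morning instead of weeks later.
    daily_status_zfs_enable="YES"

    # Scrub any pool whose last scrub is older than the threshold (in
    # days). An empty daily_scrub_zfs_pools means all imported pools.
    daily_scrub_zfs_enable="YES"
    daily_scrub_zfs_pools=""
    daily_scrub_zfs_default_threshold="28"

With daily_status_zfs_enable set, the periodic mail stays quiet ("all pools are healthy") until something changes, which addresses the "noticed it months later" problem in the August log entries above.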
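
When chasing a transient error like this one, the syslog lines above are only summaries. A sketch of standard zpool subcommands (OpenZFS as shipped with FreeBSD 13) for pulling the underlying detail before it disappears:

    # One-line health summary; prints "all pools are healthy" when clean.
    zpool status -x

    # Per-vdev read/write/cksum counters plus the list of affected
    # files, if any, for the pool named in the logs above.
    zpool status -v backuppool

    # Full ereports behind the "checksum mismatch" syslog lines,
    # including vdev, offset, and size. The event buffer does not
    # survive a reboot, so capture it before power-cycling.
    zpool events -v

For what it's worth, errno 97 on FreeBSD 13 is EINTEGRITY, the kernel's integrity-check failure code, which is consistent with both mirror legs returning data that failed the block checksum at the same offset.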