Re: ZFS checksum error on 2 disks of mirror

From: <freebsd_at_vanderzwan.org>
Date: Sat, 14 Jan 2023 15:12:42 UTC

> On 14 Jan 2023, at 15:57, milky india <milkyindia@gmail.com> wrote:
> 
> > Output of zpool status -v gives no read/write/cksum errors but lists one file with an error.
> Had faced a similar issue, when I tried to delete the file the error still persisted, although realised it after a few shutdown cycles

For me, after a scrub there was no more mention of a file with an error, so I assume the error was transient.

> 
> > After running a scrub on the pool all seems to be well, no more files with errors.
> Please monitor whether the error shows up again sometime soon. I don't know what the issue is, but zfs error no 97 seems like a serious bug.
> 
I will definitely keep a close eye on this.
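
To watch for a repeat I could scan the logs with something like the sketch below. It is only a rough idea, assuming the messages keep the exact format shown in the quoted log lines further down; the function names are my own invention and the sample lines are copied from that log. It groups checksum mismatches by pool, offset and size, and reports the blocks that failed on more than one device.

```python
import re
from collections import defaultdict

# Pattern assumed from the log lines quoted in this thread.
MISMATCH = re.compile(
    r"ZFS\[\d+\]: checksum mismatch, zpool=(?P<pool>\S+) "
    r"path=(?P<path>\S+) offset=(?P<offset>\d+) size=(?P<size>\d+)"
)


def group_mismatches(lines):
    """Map (pool, offset, size) to the set of device paths reporting a mismatch."""
    groups = defaultdict(set)
    for line in lines:
        m = MISMATCH.search(line)
        if m:
            key = (m["pool"], int(m["offset"]), int(m["size"]))
            groups[key].add(m["path"])
    return groups


def shared_offsets(lines):
    """Keep only the blocks that failed on more than one device."""
    return {
        key: sorted(paths)
        for key, paths in group_mismatches(lines).items()
        if len(paths) > 1
    }


# Sample input: the Dec 24 entries from the log quoted below.
SAMPLE = [
    "Dec 24 00:58:39 freebsd ZFS[40537]: pool I/O failure, zpool=backuppool error=97",
    "Dec 24 00:58:39 freebsd ZFS[40541]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=1634427084800 size=53248",
    "Dec 24 00:58:39 freebsd ZFS[40545]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=1634427084800 size=53248",
]

if __name__ == "__main__":
    for (pool, offset, size), paths in shared_offsets(SAMPLE).items():
        print(pool, offset, size, paths)
```

In practice the lines would come from the system log file instead of SAMPLE (for example /var/log/messages, assuming that is where these messages end up).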

> Is this a similar issue for which PR is open? https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268333 
> 

No panics on my system, it just kept running. And there is no way that I know of to reproduce it.

At the moment I suspect it was the power grid issue we had the night the error was logged.
A large part of the city where I live had an outage after a fire in a substation.
I only had a dip of about 1 s when it happened, but this server did need a reboot as it was unresponsive.

The time of the error roughly matches the time they started restoring power to the affected parts of the city.
Maybe that created another event on the grid.

The server is not behind a UPS, as the power grid is usually very reliable here in the Netherlands.

	Paul

 
> On Fri, Jan 13, 2023, 19:35 <freebsd@vanderzwan.org> wrote:
>> Hi,
>> I noticed zpool status gave an error for one of my pools.
>> Looking back in the logs I found this:
>> 
>> Dec 24 00:58:39 freebsd ZFS[40537]: pool I/O failure, zpool=backuppool error=97
>> Dec 24 00:58:39 freebsd ZFS[40541]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=1634427084800 size=53248
>> Dec 24 00:58:39 freebsd ZFS[40545]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=1634427084800 size=53248
>> 
>> These are 2 WD Red Plus 8TB drives (same age, same firmware, attached to same controller).
>> 
>> Looking back in the logs I found this occurred earlier without me noticing:
>> 
>> Aug  8 03:17:56 freebsd ZFS[12328]: pool I/O failure, zpool=backuppool error=97
>> Aug  8 03:17:56 freebsd ZFS[12332]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>> Aug  8 03:17:56 freebsd ZFS[12336]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>> Aug  8 13:37:26 freebsd ZFS[22317]: pool I/O failure, zpool=backuppool error=97
>> Aug  8 13:37:26 freebsd ZFS[22321]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>> Aug  8 13:37:26 freebsd ZFS[22325]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>> Aug  8 15:37:44 freebsd ZFS[24704]: pool I/O failure, zpool=backuppool error=97
>> Aug  8 15:37:44 freebsd ZFS[24708]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJL4JYGp2 offset=4056214130688 size=131072
>> Aug  8 15:37:44 freebsd ZFS[24712]: checksum mismatch, zpool=backuppool path=/dev/gpt/VGJKNA9Gp2 offset=4056214130688 size=131072
>> 
>> Output of zpool status -v gives no read/write/cksum errors but lists one file with an error.
>> 
>> After running a scrub on the pool all seems to be well, no more files with errors.
>> 
>> System is homebuilt, with an ASRock Rack C2550 board and 16 GB of ECC RAM.
>> Any idea how I could get checksum errors on the identical block of 2 disks in a mirror?
>> 
>> Regards,
>> 	Paul