Re: unusual ZFS issue
- In reply to: Miroslav Lachman : "Re: unusual ZFS issue"
Date: Fri, 15 Dec 2023 06:41:22 UTC
Native encryption decryption errors won't show up as r/w/c errors, but they will show up as "things with errors" in the status output. The scrub wouldn't be what triggered them, though, since scrub doesn't decrypt anything. It's just the only thing I know of offhand where ZFS will decide there are errors while the counters stay at zero...
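
If you want to rule that in or out, a quick check (using the pool name "data" from the status output quoted below) is to look for any encrypted dataset in the pool, something like:

  # zfs get -r -t filesystem,volume encryption data | grep -vw off

If nothing but the header line comes back, every dataset has encryption off and decryption errors can't be the explanation.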
- Rich

On Thu, Dec 14, 2023 at 7:05 PM Miroslav Lachman <000.fbsd@quip.cz> wrote:
> On 14/12/2023 22:17, Lexi Winter wrote:
> > hi list,
> >
> > i’ve just hit this ZFS error:
> >
> > # zfs list -rt snapshot data/vm/media/disk1
> > cannot iterate filesystems: I/O error
> > NAME                                                       USED  AVAIL  REFER  MOUNTPOINT
> > data/vm/media/disk1@autosnap_2023-12-13_12:00:00_hourly      0B      -  6.42G  -
> > data/vm/media/disk1@autosnap_2023-12-14_10:16:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_11:17:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_12:04:00_monthly     0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_12:15:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_13:14:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_14:38:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_15:11:00_hourly      0B      -  6.46G  -
> > data/vm/media/disk1@autosnap_2023-12-14_17:12:00_hourly    316K      -  6.47G  -
> > data/vm/media/disk1@autosnap_2023-12-14_17:29:00_daily    2.70M      -  6.47G  -
> >
> > the pool itself also reports an error:
> >
> > # zpool status -v
> >   pool: data
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption.  Applications may be affected.
> > action: Restore the file in question if possible.  Otherwise restore the
> >         entire pool from backup.
> >    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
> >   scan: scrub in progress since Thu Dec 14 18:58:21 2023
> >         11.5T / 18.8T scanned at 1.46G/s, 6.25T / 18.8T issued at 809M/s
> >         0B repaired, 33.29% done, 04:30:20 to go
> > config:
> >
> >         NAME          STATE     READ WRITE CKSUM
> >         data          ONLINE       0     0     0
> >           raidz2-0    ONLINE       0     0     0
> >             da4p1     ONLINE       0     0     0
> >             da6p1     ONLINE       0     0     0
> >             da5p1     ONLINE       0     0     0
> >             da7p1     ONLINE       0     0     0
> >             da1p1     ONLINE       0     0     0
> >             da0p1     ONLINE       0     0     0
> >             da3p1     ONLINE       0     0     0
> >             da2p1     ONLINE       0     0     0
> >         logs
> >           mirror-2    ONLINE       0     0     0
> >             ada0p4    ONLINE       0     0     0
> >             ada1p4    ONLINE       0     0     0
> >         cache
> >           ada1p5      ONLINE       0     0     0
> >           ada0p5      ONLINE       0     0     0
> >
> > errors: Permanent errors have been detected in the following files:
> >
> > (it doesn’t list any files, the output ends there.)
> >
> > my assumption is that this indicates some sort of metadata corruption
> > issue, but i can’t find anything that might have caused it. none of the
> > disks report any errors, and while all the disks are on the same SAS
> > controller, i would have expected controller errors to be flagged as
> > CKSUM errors.
> >
> > my best guess is that this might be caused by a CPU or memory issue, but
> > the system has ECC memory and hasn’t reported any issues.
> >
> > - has anyone else encountered anything like this?
>
> I've never seen "cannot iterate filesystems: I/O error". Could it be
> that the system has too many snapshots / not enough memory to list them?
>
> But I have seen a pool report an error in an unknown file without
> showing any READ / WRITE / CKSUM errors. This is from my notes taken
> 10 years ago:
>
> =============================
> # zpool status -v
>   pool: tank
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption.  Applications may be affected.
> action: Restore the file in question if possible.  Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         tank        ONLINE       0     0     0
>           raidz1    ONLINE       0     0     0
>             ad0     ONLINE       0     0     0
>             ad1     ONLINE       0     0     0
>             ad2     ONLINE       0     0     0
>             ad3     ONLINE       0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         <0x2da>:<0x258ab13>
> =============================
>
> As you can see, there are no CKSUM errors. There is something that
> should be a path to a filename: <0x2da>:<0x258ab13>
> Maybe it was an error in a snapshot which was already deleted? Just my
> guess. I ran a scrub on that pool; it finished without any errors and
> then the status of the pool was OK.
> A similar error reappeared after a month, and then again after about
> 6 months. The machine had ECC RAM. After these 3 incidents, I never saw
> it again. I still have this machine in working condition, just the disk
> drives were replaced from 4x 1TB to 4x 4TB and then 4x 8TB :)
>
> Kind regards
> Miroslav Lachman
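
The recovery sequence Miroslav describes is roughly the following ("tank" is the pool name from his notes; substitute your own):

  # zpool scrub tank
  # zpool status -v tank    (re-check once the scrub has finished)
  # zpool clear tank        (reset the per-device error counters)

Entries like <0x2da>:<0x258ab13> usually point at a dataset or snapshot that has since been destroyed, so there is no path left to print; the record can take up to two clean scrubs to drop out of the error list.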