ZFS pool permanent error question -- errors: Permanent errors have been detected in the following files: storage: <0x0>
Anders Jensen-Waud
anders at jensenwaud.com
Mon Jun 16 02:49:55 UTC 2014
On Sun, Jun 15, 2014 at 05:10:52PM -0400, kpneal at pobox.com wrote:
> On Sun, Jun 15, 2014 at 03:04:16PM +1000, Anders Jensen-Waud wrote:
> > Hi all,
> >
> > My main zfs storage pool (named ``storage'') has recently started
> > displaying a very odd error:
> >
> > root at beastie> zpool status -v
> >
> >   pool: backup
> >  state: ONLINE
> >   scan: none requested
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         backup      ONLINE       0     0     0
> >           da1       ONLINE       0     0     0
> >
> > errors: No known data errors
> >   pool: storage
> >  state: ONLINE
> > status: One or more devices has experienced an error resulting in data
> >         corruption. Applications may be affected.
> > action: Restore the file in question if possible. Otherwise restore the
> >         entire pool from backup.
> >    see: http://illumos.org/msg/ZFS-8000-8A
> >   scan: scrub in progress since Sun Jun 15 14:18:45 2014
> >         34.3G scanned out of 839G at 19.3M/s, 11h50m to go
> >         72K repaired, 4.08% done
> > config:
> >
> >         NAME        STATE     READ WRITE CKSUM
> >         storage     ONLINE       0     0     0
> >           da0       ONLINE       0     0     0  (repairing)
> >
> > errors: Permanent errors have been detected in the following files:
> >
> >         storage:<0x0>
>
> I'm not sure what causes ZFS to lose the filename like this. I'll let
> someone else comment. I want to say you have a corrupt file in a snapshot,
> but don't hold me to that.
>
> It looks like you are running ZFS with pools consisting of a single disk.
> In cases like this, if ZFS detects that a file has been corrupted, it is
> unable to do anything to fix it. Set the property "copies=2" to keep
> two copies of every file if you want ZFS to be able to repair broken files.
> Of course, this doubles the amount of space you will use, so you have to
> think about how important your data is to you.
Thank you for the tip. I didn't know about copies=2, so I will
definitely consider that option.
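If I read zfs(8) correctly, enabling it would be a one-liner, e.g.
(assuming I set it on the pool's root dataset so all children inherit
it):

    # store two copies of every block; only applies to data written
    # after the property is set
    zfs set copies=2 storage

with the caveat that existing files stay at copies=1 until they are
rewritten, and that new data will use roughly twice the space.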
I am running ZFS on a single disk -- a 1 TB USB drive -- attached to my
"server" at home. It is not exactly an enterprise server, but it fits
well for my home purposes, namely file backup from my different
computers. On a nightly basis I then copy and compress the data sets
from storage to another USB drive to have a second copy. In this
instance, the nightly backup script (zfs send/recv based) hadn't run
properly, so I had no backup to recover from.
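For reference, the script boils down to roughly the following sketch
(snapshot names are illustrative, and I have elided the error
handling):

    #!/bin/sh
    # Nightly incremental replication from ``storage'' to ``backup''.
    TODAY=$(date +%Y%m%d)
    YESTERDAY=$(date -v-1d +%Y%m%d)   # FreeBSD date(1) syntax

    zfs snapshot -r storage@${TODAY}
    # Send only the delta since yesterday's snapshot; compression on
    # the receiving side comes from the backup pool's compression
    # property.
    zfs send -R -i storage@${YESTERDAY} storage@${TODAY} | \
        zfs receive -Fdu backup

so a single failed run is enough to leave me without a current copy,
which is what happened here.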
Given that my machine only has 3 GB of RAM, I was wondering if the
issue might be memory-related and whether I would be better off
converting the volume back to UFS. I am keen to stay on ZFS to benefit
from snapshots, compression, security, etc. Any thoughts?
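If memory pressure is the concern, one knob I have seen recommended
(rather than giving up on ZFS) is capping the ARC via /boot/loader.conf;
the 1G figure below is just a guess for a 3 GB machine:

    # /boot/loader.conf -- limit the ZFS ARC so the rest of the system
    # keeps some breathing room (value is a guess, not a recommendation)
    vfs.zfs.arc_max="1G"

but I would still like to know whether 3 GB is considered workable at
all.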
>
> I don't know what caused the corrupt file. It could be random chance, or
> it could be that you accidentally did something to damage the pool. I say
> that because:
>
> > da1 at umass-sim1 bus 1 scbus4 target 0 lun 0
> > da1: <Seagate FreeAgent Go 102D> Fixed Direct Access SCSI-4 device
> > da1: Serial Number 2GE1GTVM
> > da1: 40.000MB/s transfers
> > da1: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
> > da1: quirks=0x2<NO_6_BYTE>
> > GEOM: da1: the primary GPT table is corrupt or invalid.
> > GEOM: da1: using the secondary instead -- recovery strongly advised.
> > GEOM: diskid/DISK-2GE1GTVM: the primary GPT table is corrupt or invalid.
> > GEOM: diskid/DISK-2GE1GTVM: using the secondary instead -- recovery strongly advised.
>
> You've got something going on here. Did you GPT partition the disk? The
> zpool status you posted says you built your pools on the entire disk and
> not inside a partition. But GEOM is saying the disk has been partitioned.
> GPT stores data at both the beginning and end of the disk. ZFS may have
> trashed the beginning of the disk but not gotten to the end yet.
This disk is not the ``storage'' zpool -- it is my ``backup'' pool,
which is on a different drive:
NAME      SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
backup    464G   235G   229G    50%  1.00x  ONLINE  -
storage   928G   841G  87.1G    90%  1.00x  ONLINE  -
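If it helps with the diagnosis, I can post the output of, e.g.:

    gpart show da1        # what GEOM thinks the partitioning looks like
    zdb -l /dev/da1       # the ZFS vdev labels on the raw device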
Running 'gpt recover /dev/da1' fixes the error above, but after a
reboot it reappears. Would it be better to wipe the disk completely and
reinitialise it with ZFS?
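If wiping is the right approach, my plan would be roughly the
following (assuming everything on ``backup'' is regenerated from
``storage'' afterwards, so nothing is lost):

    zpool destroy backup      # discard the pool
    gpart destroy -F da1      # remove the stale GPT completely
    zpool create backup da1   # recreate on the whole raw disk

    # ...or, to be consistent with a partitioned layout instead:
    # gpart create -s gpt da1
    # gpart add -t freebsd-zfs -l backup0 da1
    # zpool create backup gpt/backup0

Does that look sane, or is there a gentler fix?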
Miraculously, the overnight 'zpool scrub storage' has cleared the
errors from yesterday, and I am puzzled as to why. As per the original
zpool status above, ZFS warned that I needed to restore the affected
file from backup, yet everything now reports clean. (My only guess is
that <0x0> refers to pool metadata rather than a regular file, and that
the scrub repaired it from one of the redundant metadata copies ZFS
keeps even on a single-disk pool -- but I would welcome a proper
explanation.)
aj at beastie> zpool status
  pool: backup
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          da1       ONLINE       0     0     0

errors: No known data errors

  pool: storage
 state: ONLINE
  scan: scrub repaired 984K in 11h37m with 0 errors on Mon Jun 16 01:55:48 2014
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          da0       ONLINE       0     0     0

errors: No known data errors
> Running ZFS in a partition or on the entire disk is fine either way. But
> you have to be consistent. Partitioning a disk and then writing outside
> of the partition creates errors like the above GEOM one.
Agreed. In this instance it wasn't da0/storage that had the GPT,
however.
> --
> Kevin P. Neal http://www.pobox.com/~kpn/
> "Not even the dumbest terrorist would choose an encryption program that
> allowed the U.S. government to hold the key." -- (Fortune magazine
> is smarter than the US government, Oct 29 2001, page 196.)
--
Anders Jensen-Waud
E: anders at jensenwaud.com