Strange ZFS filesystem corruption
Paul Mather
paul at gromit.dlib.vt.edu
Mon Oct 3 19:19:32 UTC 2011
I wasn't sure whether to post this here or on stable at freebsd.org. The system now runs RELENG_9, but the ZFS pool exhibiting problems was created, IIRC, under 9-CURRENT. I believe RELENG_9 is sufficiently close to HEAD at this stage that this list is probably the correct place for this message.
I have a raidz2 ZFS pool on a system that I have recently been using as a mirror for about 6.5 TiB of data. The data are mirrored nightly using rsync, and during these nightly runs I noticed errors like this:
=====
file has vanished: "/backups/storage/san/DLA/DLA_Records/05DLAAdmin"
rsync: stat "/backups/storage/san/DLA/DLA_Records/05DLAAdmin" failed: No such file or directory (2)
rsync: recv_generator: mkdir "/backups/storage/san/DLA/DLA_Records/05DLAAdmin/05DI_business copy" failed: No such file or directory (2)
*** Skipping any contents from this failed directory ***
=====
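Both kinds of message above quote the affected path, so a saved rsync log can be mined for candidate directories to examine by hand. A minimal Python sketch, assuming rsync's output has been captured to a file and that only the "file has vanished" and stat/mkdir "failed" messages seen here are of interest (the function name `suspect_paths` is hypothetical):

```python
import re

# "file has vanished: "<path>""
VANISHED = re.compile(r'^file has vanished: "(?P<path>.*)"$')
# "rsync: stat "<path>" failed: ..." or
# "rsync: recv_generator: mkdir "<path>" failed: ..."
FAILED = re.compile(r'^rsync: (?:\w+: )?(?:stat|mkdir) "(?P<path>.*)" failed: ')


def suspect_paths(lines):
    """Return the distinct paths named in rsync 'vanished' or stat/mkdir
    'failed' messages, in the order in which they first appear."""
    seen = []
    for line in lines:
        line = line.rstrip("\n")  # tolerate lines read straight from a file
        m = VANISHED.match(line) or FAILED.match(line)
        if m and m.group("path") not in seen:
            seen.append(m.group("path"))
    return seen
```

Fed the three rsync lines above, it returns the two distinct paths `/backups/storage/san/DLA/DLA_Records/05DLAAdmin` and `/backups/storage/san/DLA/DLA_Records/05DLAAdmin/05DI_business copy`; the `***` line matches neither pattern and is ignored.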
It appears that 05DLAAdmin is a corrupted directory. It shows up in a plain "ls", but any attempt to descend into it or read its attributes fails with "No such file or directory", and some entries under 07DLAAdmin behave the same way. Furthermore, I cannot delete it, even with "rm -rf". E.g.:
=====
tape# pwd
/backups/storage/san/DLA
tape# whoami
root
tape# rm -rf DLA_Records
rm: DLA_Records/07DLAAdmin/07Digital_Imaging_Work: Directory not empty
rm: DLA_Records/07DLAAdmin/FY07IAWAprep: Directory not empty
rm: DLA_Records/07DLAAdmin: Directory not empty
rm: DLA_Records: Directory not empty
tape# cd DLA_Records
tape# ls
05DLAAdmin 07DLAAdmin
tape# ls -l
ls: 05DLAAdmin: No such file or directory
total 3
drwxrws--- 4 500 501 4 Oct 3 11:53 07DLAAdmin
tape# file 05DLAAdmin
05DLAAdmin: cannot open `05DLAAdmin' (No such file or directory)
tape# ls -R 07DLAAdmin
07Digital_Imaging_Work FY07IAWAprep

07DLAAdmin/07Digital_Imaging_Work:
ls: 07Proposals: No such file or directory

07DLAAdmin/FY07IAWAprep:
ls: Budget: No such file or directory
tape# ls 07DLAAdmin
07Digital_Imaging_Work FY07IAWAprep
tape# ls 07DLAAdmin/07Digital_Imaging_Work
07Proposals
tape# ls -l 07DLAAdmin/07Digital_Imaging_Work/07Proposals
ls: 07DLAAdmin/07Digital_Imaging_Work/07Proposals: No such file or directory
tape# ls 07DLAAdmin/FY07IAWAprep
Budget
tape# ls 07DLAAdmin/FY07IAWAprep/Budget
ls: 07DLAAdmin/FY07IAWAprep/Budget: No such file or directory
tape# file 07DLAAdmin/FY07IAWAprep/Budget
07DLAAdmin/FY07IAWAprep/Budget: cannot open `07DLAAdmin/FY07IAWAprep/Budget' (No such file or directory)
tape# cd 05DLAAdmin
05DLAAdmin: No such file or directory.
=====
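In the transcript, a plain `ls` can list 05DLAAdmin, 07Proposals and Budget, but `ls -l`, `file` and `cd` cannot, which suggests the directory entries still exist while the objects they name cannot be looked up. A minimal Python sketch for finding every such entry under a tree, assuming Python 3 is available on the host: it reads each directory with `os.scandir()` (names and inode numbers only, no stat) and then calls `lstat()` on each entry, reporting those that fail with ENOENT. The function name `find_unstatable` and its `lstat` parameter are hypothetical; the parameter exists only so the function can be tested.

```python
import os
import stat
import sys


def find_unstatable(root, lstat=os.lstat):
    """Return (path, inode) pairs for entries that a directory read lists
    but that lstat() rejects with ENOENT."""
    bad = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            # scandir() returns names and d_ino straight from the directory,
            # roughly what a plain `ls` needs, without stat()ing anything.
            with os.scandir(current) as it:
                entries = list(it)
        except OSError as exc:
            print(f"cannot read {current}: {exc}", file=sys.stderr)
            continue
        for entry in entries:
            try:
                # `ls -l`, `file` and `cd` all need the entry's metadata.
                st = lstat(entry.path)
            except FileNotFoundError:
                # entry.inode() comes from the directory entry itself, so it
                # is available even when the object cannot be looked up.
                bad.append((entry.path, entry.inode()))
                continue
            if stat.S_ISDIR(st.st_mode):
                stack.append(entry.path)  # lstat: symlinks are not followed
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    for path, ino in find_unstatable(sys.argv[1]):
        print(f"{ino}\t{path}")
```

It could be run as, e.g., `python3 find_unstatable.py /backups/storage/san/DLA` (the script name is hypothetical). On ZFS a file's inode number is normally its object number within the dataset, so the numbers printed could be passed to `zdb -dddd backups/storage <object>` to inspect, or confirm the absence of, the object a dangling entry refers to.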
The pool itself reports no errors. I have scrubbed the pool, yet this bizarre filesystem corruption persists:
=====
tape# zpool status backups
  pool: backups
 state: ONLINE
  scan: scrub repaired 15K in 7h33m with 0 errors on Sat Oct 1 19:22:35 2011
config:

        NAME            STATE     READ WRITE CKSUM
        backups         ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            gpt/disk02  ONLINE       0     0     0
            gpt/disk03  ONLINE       0     0     0
            gpt/disk04  ONLINE       0     0     0
            gpt/disk05  ONLINE       0     0     0
            gpt/disk06  ONLINE       0     0     0
            gpt/disk07  ONLINE       0     0     0

errors: No known data errors
tape# uname -a
FreeBSD tape.private.lib.vt.edu 9.0-BETA3 FreeBSD 9.0-BETA3 #0: Wed Sep 28 15:18:59 EDT 2011 pmather at tape.private.lib.vt.edu:/usr/obj/usr/src/sys/TAPE amd64
tape# zpool get all backups
NAME     PROPERTY       VALUE                SOURCE
backups  size           10.9T                -
backups  capacity       62%                  -
backups  altroot        -                    default
backups  health         ONLINE               -
backups  guid           1352318175125790395  default
backups  version        28                   default
backups  bootfs         -                    default
backups  delegation     on                   default
backups  autoreplace    off                  default
backups  cachefile      -                    default
backups  failmode       wait                 default
backups  listsnapshots  off                  default
backups  autoexpand     off                  default
backups  dedupditto     0                    default
backups  dedupratio     1.00x                -
backups  free           4.07T                -
backups  allocated      6.80T                -
backups  readonly       off                  -
tape# zfs get all backups/storage
NAME             PROPERTY              VALUE                 SOURCE
backups/storage  type                  filesystem            -
backups/storage  creation              Fri Sep 2 14:43 2011  -
backups/storage  used                  4.26T                 -
backups/storage  available             2.60T                 -
backups/storage  referenced            4.26T                 -
backups/storage  compressratio         1.51x                 -
backups/storage  mounted               yes                   -
backups/storage  quota                 none                  default
backups/storage  reservation           none                  default
backups/storage  recordsize            128K                  default
backups/storage  mountpoint            /backups/storage      default
backups/storage  sharenfs              off                   default
backups/storage  checksum              fletcher4             local
backups/storage  compression           gzip-9                local
backups/storage  atime                 on                    default
backups/storage  devices               on                    default
backups/storage  exec                  off                   local
backups/storage  setuid                on                    default
backups/storage  readonly              off                   default
backups/storage  jailed                off                   default
backups/storage  snapdir               hidden                default
backups/storage  aclmode               discard               default
backups/storage  aclinherit            restricted            default
backups/storage  canmount              on                    default
backups/storage  xattr                 off                   temporary
backups/storage  copies                1                     default
backups/storage  version               5                     -
backups/storage  utf8only              off                   -
backups/storage  normalization         none                  -
backups/storage  casesensitivity       sensitive             -
backups/storage  vscan                 off                   default
backups/storage  nbmand                off                   default
backups/storage  sharesmb              off                   default
backups/storage  refquota              none                  default
backups/storage  refreservation        none                  default
backups/storage  primarycache          all                   default
backups/storage  secondarycache        all                   default
backups/storage  usedbysnapshots       0                     -
backups/storage  usedbydataset         4.26T                 -
backups/storage  usedbychildren        0                     -
backups/storage  usedbyrefreservation  0                     -
backups/storage  logbias               latency               default
backups/storage  dedup                 off                   default
backups/storage  mlslabel              -
backups/storage  sync                  standard              default
backups/storage  refcompressratio      1.51x                 -
=====
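The "no errors" reading rests on the READ/WRITE/CKSUM counters in the `config:` table and on the final `errors:` line. A minimal Python sketch that flags any row whose counters are not all zero, assuming the column layout shown above (the function name `nonzero_counters` is hypothetical). Here the counters are clean even though the directory tree is not, so a check like this only covers what ZFS itself has detected:

```python
def nonzero_counters(status_text):
    """Return (name, read, write, cksum) tuples for every row of the
    `config:` table in `zpool status` output whose counters are not all 0.

    Assumes the NAME/STATE/READ/WRITE/CKSUM column layout.
    """
    flagged = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row: vdev rows follow
            continue
        if not in_table:
            continue
        if fields and fields[0] == "errors:":
            break  # the summary line ends the table
        if len(fields) >= 5:
            name, read, write, cksum = fields[0], fields[2], fields[3], fields[4]
            # Counters may be abbreviated (e.g. "1.2K"), so compare as text.
            if (read, write, cksum) != ("0", "0", "0"):
                flagged.append((name, read, write, cksum))
    return flagged
```

For the status output above it should return an empty list; live output could be fed in with, for example, `subprocess.run(["zpool", "status", "backups"], capture_output=True, text=True).stdout` (Python 3.7 or later).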
I know ZFS does not have an fsck utility ("because it doesn't need one" :), but does anyone know of a way to fix this corruption short of destroying the pool, creating a new one, and restoring from backup? Is there some way of exporting and re-importing the pool that has the side effect of an fsck-like repair of subtle corruption like this?
Cheers,
Paul.
More information about the freebsd-current mailing list