Strange ZFS object with corrupted sa_magic leading to kernel panic.
Pierre Fèvre
pfevre at secuserve.com
Fri Nov 24 17:20:19 UTC 2017
Hi all,
I'm having an unpredictable kernel panic on a FreeBSD 10.4 server with a
2.6 TB dataset.
I managed to get a screenshot of the kernel panic; basically I got this:
panic: solaris assert: sa.sa_magic == SA_MAGIC (0x584530ab == 0x2f505a), file zfs_vfsops.c, line 610
KDB: stack backtrace:
assfail3+0x2f
zfs_space_delta_cb+0x107
dmu_objset_userquota_get_ids+0x36f
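As far as I understand the assert, the check that fires lives in the per-dataset space accounting path: dmu_objset_userquota_get_ids() invokes zfs_space_delta_cb() for every dirtied file object, and when the object carries an SA-style bonus buffer that callback verifies the SA header magic before trusting the uid/gid stored there. Here is a simplified paraphrase of my reading of that code (not the verbatim FreeBSD 10.4 source; SA_MAGIC is just the 0x2f505a value from the panic message):

/*
 * The SA (system attribute) header sits at the start of the dnode's
 * bonus buffer and is expected to start with the SA magic number.
 */
typedef struct sa_hdr_phys {
        uint32_t sa_magic;        /* should be SA_MAGIC (0x2F505A) */
        uint16_t sa_layout_info;  /* encoded layout number + header size */
        uint16_t sa_lengths[1];   /* lengths of variable-size attributes */
} sa_hdr_phys_t;

#define SA_MAGIC        0x2F505A

/*
 * Called from dmu_objset_userquota_get_ids() so that per-user/group
 * space accounting can be updated for a dirtied object.
 */
static int
zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
    uint64_t *userp, uint64_t *groupp)
{
        if (bonustype == DMU_OT_ZNODE) {
                /* old-style znode_phys_t bonus: uid/gid are plain fields */
                znode_phys_t *znp = data;
                *userp = znp->zp_uid;
                *groupp = znp->zp_gid;
        } else {
                /* SA bonus: the header magic is verified before use */
                sa_hdr_phys_t sa = *(sa_hdr_phys_t *)data;
                VERIFY3U(sa.sa_magic, ==, SA_MAGIC);  /* the assert at zfs_vfsops.c:610 */
                /* ... uid/gid are then extracted from the SA layout ... */
        }
        return (0);
}

So the 0x584530ab in the panic is whatever value the kernel found where a valid SA header magic (0x2f505a) was expected.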
First of all, here are the dataset properties:
PROPERTY             VALUE                          SOURCE
type                 filesystem                     -
creation             Tue Sep 26 17:30 2017          -
used                 4.25T                          -
available            2.14T                          -
referenced           3.02T                          -
compressratio        1.17x                          -
mounted              yes                            -
quota                none                           default
reservation          none                           default
recordsize           128K                           default
mountpoint           /mnt                           local
sharenfs             off                            default
checksum             on                             default
compression          on                             inherited from zdata
atime                off                            inherited from zdata
devices              on                             default
exec                 on                             default
setuid               on                             default
readonly             off                            default
jailed               off                            default
snapdir              hidden                         default
aclmode              discard                        default
aclinherit           restricted                     default
canmount             on                             default
xattr                off                            temporary
copies               1                              default
version              5                              -
utf8only             off                            -
normalization        none                           -
casesensitivity      sensitive                      -
vscan                off                            default
nbmand               off                            default
sharesmb             off                            default
refquota             none                           default
refreservation       none                           default
primarycache         all                            default
secondarycache       all                            default
usedbysnapshots      1.23T                          -
usedbydataset        3.02T                          -
usedbychildren       1.60G                          -
usedbyrefreservation 0                              -
logbias              latency                        default
dedup                off                            default
mlslabel                                             -
sync                 standard                       default
refcompressratio     1.15x                          -
written              0                              -
logicalused          4.91T                          -
logicalreferenced    3.49T                          -
volmode              default                        default
filesystem_limit     none                           default
snapshot_limit       none                           default
filesystem_count     none                           default
snapshot_count       none                           default
redundant_metadata   most                           local
I've started some investigation with zdb and I found some specific
corrupted objects:
   1°) First test: *zdb -vvbcc -e -AAA -LG -d NAS/Backup/dataset*
After a long listing it ends with zdb spinning in a loop at 100% CPU:
...
37236412 1 128K 3.00K 8K 3.00K 100.00 ZFS plain file
37236422 1 128K 5.00K 8K 5.00K 100.00 ZFS plain file
37236552 1 128K 37.0K 16K 37.0K 100.00 ZFS plain file
37236656 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file
37236660 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file
37236668 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file
ZFS_DBGMSG(zdb):
spa=NAS async request task2
^C
   2°) Then I dumped that specific object with *zdb -vvbcc -LG -AAA -e -dddd
NAS/Backup/dataset 37236668*:
Dataset NAS/Backup/dataset [ZPL], ID 385164, cr_txg 182709, 2.81T, 9596843 objects, rootbp DVA[0]=<1:190a69bf0000:3000> DVA[1]=<0:6f571c41000:3000> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size0L/200P birth=3375667L/3375667P fill=9596843 cksum=d9ece7b8:69145d8bcf8:1447eb6a46063:2b83132cd77e1a
Object lvl iblk dblk dsize lsize %full type
37236668 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file (K=inherit) (Z=inherit)
264 bonus ZFS znode
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /mailbox/account/INBOX.mdir/101610-S_______-20170930142954-288
uid 1002
gid 1002
atime Sat Sep 30 16:29:54 2017
mtime Sat Sep 30 16:29:54 2017
ctime Sun Oct 1 10:33:16 2017
crtime Sat Sep 30 16:29:54 2017
gen 12273136
mode 100660
size 12708
parent 19189512
links 1
pflags 40800000004
xattr 0
rdev 0x0000000000000000
Indirect blocks:
0 L0 1:1c49fe739000:6000 3200L/1a00P F=1 B=2159816/2159816
segment [0000000000000000, 0000000000003200) size 12.5K
ZFS_DBGMSG(zdb):
spa=NAS async request task2
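My reading of that single L0 block pointer: zdb prints each DVA as vdev:offset:asize and the block sizes as lsizeL/psizeP, all in hexadecimal. So 1:1c49fe739000:6000 means vdev 1, offset 0x1c49fe739000, 0x6000 (24K) allocated on disk, and 3200L/1a00P means 12800 bytes logical (the 12.5K dblk above, holding the 12708-byte file) stored as 6656 bytes after compression. A quick throwaway decoder to double-check that arithmetic (the hard-coded strings are just the two fields copied from the dump above):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
        uint64_t vdev, offset, asize, lsize, psize;

        /* DVA field as printed by zdb: vdev:offset:asize (hex) */
        if (sscanf("1:1c49fe739000:6000", "%" SCNx64 ":%" SCNx64 ":%" SCNx64,
            &vdev, &offset, &asize) != 3)
                return (1);
        /* size field as printed by zdb: lsizeL/psizeP (hex) */
        if (sscanf("3200L/1a00P", "%" SCNx64 "L/%" SCNx64 "P",
            &lsize, &psize) != 2)
                return (1);

        printf("vdev %" PRIu64 ", offset 0x%" PRIx64 ", asize %" PRIu64 " bytes\n",
            vdev, offset, asize);
        printf("lsize %" PRIu64 " bytes, psize %" PRIu64 " bytes\n", lsize, psize);
        return (0);
}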
   3°) I finally removed the bogus object 37236668, but I still get
the same error on *zdb -vvbcc -e -AAA -LG -d NAS/Backup/dataset*:
37236660 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file
37236668 1 128K 12.5K 16K 12.5K 100.00 ZFS plain file
ZFS_DBGMSG(zdb):
spa=NAS async request task2
So my final thought is that the problem is coming from the free space
map. I'm currently investigating with *zdb -vvvmm -AAA -LG zdata*; it
also ends in an endless zdb loop at 100% CPU with:
...
[ 53] F range: 027ae20000-027be20000 size: 1000000
[ 54] F range: 027be20000-027bfff000 size: 1df000
[ 55] FREE: txg 12262024, pass 1
[ 56] F range: 0278e00000-0278e20000 size: 020000
[ 57] FREE: txg 12262025, pass 1
[ 58] F range: 0278340000-0278360000 size: 020000
ZFS_DBGMSG(zdb):
^C
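To be clear about what I mean by a broken free space map: as far as I understand it, each metaslab's space map is just a log of ALLOC/FREE records that gets replayed into an in-memory tree, and it is inconsistent when a range gets freed that is not currently allocated (or allocated on top of live space). A toy illustration of that invariant, nothing to do with the real space map code, just to show the kind of double free I suspect (the records below are made up apart from reusing a range from the output above):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* a decoded space map record: 'A' = allocation, 'F' = free */
struct rec {
        char     type;
        uint64_t start;
        uint64_t size;
};

int
main(void)
{
        /* hypothetical replay order; the second free of the same range
         * is the kind of inconsistency that would make the map bogus */
        struct rec recs[] = {
                { 'A', 0x027ae20000ULL, 0x1000000ULL },
                { 'F', 0x027ae20000ULL, 0x1000000ULL },
                { 'F', 0x027ae20000ULL, 0x1000000ULL },
        };
        size_t n = sizeof (recs) / sizeof (recs[0]);
        bool live[3] = { false };   /* live[i]: recs[i] is an outstanding allocation */

        for (size_t i = 0; i < n; i++) {
                if (recs[i].type == 'A') {
                        live[i] = true;
                        continue;
                }
                /* a free must match an outstanding allocation */
                bool found = false;
                for (size_t j = 0; j < i; j++) {
                        if (live[j] && recs[j].start == recs[i].start &&
                            recs[j].size == recs[i].size) {
                                live[j] = false;
                                found = true;
                                break;
                        }
                }
                if (!found)
                        printf("inconsistent: free of %#jx size %#jx "
                            "has no matching allocation\n",
                            (uintmax_t)recs[i].start, (uintmax_t)recs[i].size);
        }
        return (0);
}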
I'm starting to run out of ideas, and I don't even know if my
investigations are pointing at anything...
This is an old dataset, created at ZFS version 3 and upgraded to version 5,
and it has been sent and received several times for testing purposes.
Thanks for any lead on a solution. Otherwise I'm in for a 2.6 TB rsync
and a huge service downtime.
Regards,
Pierre
PIERRE FèVRE
SECUSERVE LYON
Find our latest news on our Messaging as a Service BLOG