Strange ZFS object sa_magic corrupted leading to kernel panic.

pierre Fevre pfevre at secuserve.com
Fri Nov 24 17:20:19 UTC 2017


Hi all,

I'm having unpredictable kernel panics on a FreeBSD 10.4 server with a 
2.6 TB dataset.
I managed to get a screenshot of the kernel panic; basically it shows:

    panic: solaris assert: sa.sa_magic == (0x584530ab == 0x2f505a), file zfs_vfsops.c, line 610
    KDB: stack backtrace:
    assfail3+0x2f
    zfs_space_delta_cb+0x107
    dmu_objset_userquota_get_ids+0x36f
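If I read zfs_space_delta_cb right, the 0x2f505a in the message is the expected SA_MAGIC and 0x584530ab is what was actually found in the object's bonus buffer. A quick check (constants copied straight from the panic line; the byte-swap mirrors the foreign-endian case the code also accepts) shows the found value is not even a byte-swapped magic, so this doesn't look like an endianness issue:

```python
SA_MAGIC = 0x2F505A  # expected system-attribute magic (from the panic message)

def bswap32(x):
    """Byte-swap a 32-bit value, as done for foreign-endian pools."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

found = 0x584530AB  # value the assert reports from the bonus buffer

print(found == SA_MAGIC)           # False: the mismatch that panics
print(found == bswap32(SA_MAGIC))  # False: not an endianness artifact either
```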

First of all, here are the dataset properties:

PROPERTY              VALUE                           SOURCE

type                  filesystem                      -
creation              Tue Sep 26 17:30 2017           -
used                  4.25T                           -
available             2.14T                           -
referenced            3.02T                           -
compressratio         1.17x                           -
mounted               yes                             -
quota                 none                            default
reservation           none                            default
recordsize            128K                            default
mountpoint            /mnt                            local
sharenfs              off                             default
checksum              on                              default
compression           on                              inherited from zdata
atime                 off                             inherited from zdata
devices               on                              default
exec                  on                              default
setuid                on                              default
readonly              off                             default
jailed                off                             default
snapdir               hidden                          default
aclmode               discard                         default
aclinherit            restricted                      default
canmount              on                              default
xattr                 off                             temporary
copies                1                               default
version               5                               -
utf8only              off                             -
normalization         none                            -
casesensitivity       sensitive                       -
vscan                 off                             default
nbmand                off                             default
sharesmb              off                             default
refquota              none                            default
refreservation        none                            default
primarycache          all                             default
secondarycache        all                             default
usedbysnapshots       1.23T                           -
usedbydataset         3.02T                           -
usedbychildren        1.60G                           -
usedbyrefreservation  0                               -
logbias               latency                         default
dedup                 off                             default
mlslabel                                              -
sync                  standard                        default
refcompressratio      1.15x                           -
written               0                               -
logicalused           4.91T                           -
logicalreferenced     3.49T                           -
volmode               default                         default
filesystem_limit      none                            default
snapshot_limit        none                            default
filesystem_count      none                            default
snapshot_count        none                            default
redundant_metadata    most                            local

I've started some investigation with ZDB and I found some specific 
object corrupted:

     1) First test: *zdb -vvbcc -e -AAA -LG -d NAS/Backup/dataset*; 
after a long listing it ends with zdb spinning in a 100% CPU loop, with:

...
   37236412    1   128K  3.00K     8K  3.00K  100.00  ZFS plain file
   37236422    1   128K  5.00K     8K  5.00K  100.00  ZFS plain file
   37236552    1   128K  37.0K    16K  37.0K  100.00  ZFS plain file
   37236656    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file
   37236660    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file
   37236668    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file


ZFS_DBGMSG(zdb):
spa=NAS async request task2
^C
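For reference, the columns in that listing are object number, indirect levels, iblk, dblk, dsize, lsize, %full and type; a throwaway sketch to pull them out of one of the lines above:

```python
# a line copied from the zdb -d listing; the last column (type) may
# contain spaces, so cap the number of splits at 7
line = "37236668    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file"
fields = line.split(maxsplit=7)
obj, lvl, iblk, dblk, dsize, lsize, pct, objtype = fields
print(obj, objtype)  # 37236668 ZFS plain file
```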

     2) Then, on that specific object: *zdb -vvbcc -LG -AAA -e -dddd 
NAS/Backup/dataset 37236668*

Dataset NAS/Backup/dataset [ZPL], ID 385164, cr_txg 182709, 2.81T, 9596843 objects, rootbp DVA[0]=<1:190a69bf0000:3000> DVA[1]=<0:6f571c41000:3000> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=3375667L/3375667P fill=9596843 cksum=d9ece7b8:69145d8bcf8:1447eb6a46063:2b83132cd77e1a

     Object  lvl   iblk   dblk  dsize  lsize   %full  type
   37236668    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                         264   bonus  ZFS znode
         dnode flags: USED_BYTES USERUSED_ACCOUNTED
         dnode maxblkid: 0
         path    /mailbox/account/INBOX.mdir/101610-S_______-20170930142954-288
         uid     1002
         gid     1002
         atime   Sat Sep 30 16:29:54 2017
         mtime   Sat Sep 30 16:29:54 2017
         ctime   Sun Oct  1 10:33:16 2017
         crtime  Sat Sep 30 16:29:54 2017
         gen     12273136
         mode    100660
         size    12708
         parent  19189512
         links   1
         pflags  40800000004
         xattr   0
         rdev    0x0000000000000000
Indirect blocks:
                0 L0 1:1c49fe739000:6000 3200L/1a00P F=1 B=2159816/2159816

                 segment [0000000000000000, 0000000000003200) size 12.5K


ZFS_DBGMSG(zdb):
spa=NAS async request task2
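Two quick sanity checks on that dump (values copied from the zdb output above; the stat helpers are Python's): the octal mode decodes to an ordinary file, and the segment length matches the 12.5K lsize, so the object itself looks internally consistent:

```python
import stat

mode = 0o100660              # "mode 100660" from the dump (octal)
print(stat.S_ISREG(mode))    # True: a regular file
print(stat.filemode(mode))   # -rw-rw----

seg_len = 0x3200 - 0x0       # segment [0, 3200) from the indirect-block dump
print(seg_len / 1024)        # 12.5 (KiB), matching lsize; size 12708 fits inside
```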

     3) I finally removed the bogus object 37236668, but I still get 
the same error from *zdb -vvbcc -e -AAA -LG -d NAS/Backup/dataset*:

   37236660    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file
   37236668    1   128K  12.5K    16K  12.5K  100.00  ZFS plain file


ZFS_DBGMSG(zdb):
spa=NAS async request task2

So my final thought is that the problem comes from the free-space map. 
I'm now investigating with *zdb -vvvmm -AAA -LG zdata*; it also ends in 
an endless zdb loop at 100% CPU, with:

...
             [    53]    F  range: 027ae20000-027be20000  size: 1000000
             [    54]    F  range: 027be20000-027bfff000  size: 1df000
             [    55] FREE: txg 12262024, pass 1
             [    56]    F  range: 0278e00000-0278e20000  size: 020000
             [    57] FREE: txg 12262025, pass 1
             [    58]    F  range: 0278340000-0278360000  size: 020000


ZFS_DBGMSG(zdb):
^C
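As a sanity check on that dump (values copied from record [54] above), each record's size column should simply equal the end of the range minus its start, which holds here:

```python
# "range: 027be20000-027bfff000  size: 1df000" from record [54] (hex)
start, end = 0x027BE20000, 0x027BFFF000
size = end - start
print(f"{size:x}")  # 1df000, matching the size column
```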

I'm starting to run out of ideas, and I don't even know whether my 
investigations are pointing at anything....

This is an old dataset, created with ZFS version 3 and upgraded to 
version 5, and it has been sent and received several times for testing 
purposes.


Thanks for any lead on a solution. Otherwise I'm in for a 2.6 TB rsync 
and a long service downtime.


Regards,

Pierre

PIERRE FèVRE

SECUSERVE LYON



More information about the freebsd-fs mailing list