ZFS Kernel Panic on 10.0-RELEASE

Steven Hartland killing at multiplay.co.uk
Wed Jun 4 19:23:18 UTC 2014


You mention mfi and 9.1, which rings alarm bells.

They shouldn't be, but if your drives are > 2^32 sectors you'll
have corruption:
http://svnweb.freebsd.org/base?view=revision&revision=242497
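
As a rough way to check whether a drive crosses that limit, here is a small
Python sketch. It assumes 512-byte sectors and uses the nominal decimal
capacities mentioned elsewhere in this thread (900GB and 2TB). The function
name and the 3TB comparison value are made up for illustration and are not
part of any FreeBSD tool.

```python
# Sketch: does a drive of a given size have more than 2^32 sectors?
SECTOR_SIZE = 512          # assumption: 512-byte logical sectors
LIMIT_SECTORS = 2 ** 32    # the 2^32-sector threshold mentioned above


def exceeds_32bit_sectors(capacity_bytes, sector_size=SECTOR_SIZE):
    """Return True if the drive has more than 2^32 sectors."""
    sectors = capacity_bytes // sector_size
    return sectors > LIMIT_SECTORS


# Nominal capacities; the 3TB entry is a hypothetical comparison point.
for label, size in [("900GB", 900 * 10**9), ("2TB", 2 * 10**12), ("3TB", 3 * 10**12)]:
    print(label, size // SECTOR_SIZE, exceeds_32bit_sectors(size))
```

With 512-byte sectors the limit works out to 2 TiB (2^32 * 512 bytes), so by
this estimate nominal 900GB and 2TB drives stay under it while a 3TB drive
does not.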

In addition to this I made a large number of fixes to mfi after
this point; the bugs they address could cause all sorts of issues,
but that doesn't explain the issues with mps.

Upgrading shouldn't have removed the cache file so I'm guessing
that your initial install was already missing this.

zdb is picky about having a cache file, which is something we
should fix at some point as IIRC the changes avg or mav made,
I can't remember which, mean that FreeBSD doesn't rely on
the cache file being present as much as it did.

Back to the corruption, unfortunately this could be any number
of things so it's almost impossible to tell at which point the
issue originally occurred :(

It might well be worth emailing a summary of the issue to the
openzfs mailing list to see if someone on there has any ideas
where the DVA corruption could have occurred.
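
For anyone following up on the DVAs, here is a small Python sketch of how the
ones in the zdb output quoted below map onto metaslabs. It assumes zdb's
usual <vdev:offset:asize> notation with offset and asize in hex, and takes
metaslab_shift (36) and asize from the quoted MOS configuration. The function
names are made up for illustration.

```python
# Sketch: locate zdb-style DVAs within a vdev's metaslabs.
METASLAB_SHIFT = 36              # metaslab_shift from the quoted MOS configuration
VDEV_ASIZE = 9894744555520       # raidz vdev asize from the same output


def parse_dva(text):
    """Split '<vdev:offset:asize>' into integers (offset and asize are hex)."""
    vdev, offset, asize = text.strip("<>").split(":")
    return int(vdev), int(offset, 16), int(asize, 16)


def metaslab_index(offset, shift=METASLAB_SHIFT):
    """Each metaslab covers 2**shift bytes, so the index is offset >> shift."""
    return offset >> shift


def metaslab_count(asize, shift=METASLAB_SHIFT):
    """Number of whole metaslabs that fit in the vdev."""
    return asize >> shift


# The three rootbp DVAs from the first quoted uberblock.
for dva in ["<0:3f08b7fd000:600>", "<0:5500f66f000:600>", "<0:86001fb9c00:600>"]:
    vdev, offset, asize = parse_dva(dva)
    print(dva, "-> vdev", vdev, "metaslab", metaslab_index(offset))

print("metaslabs:", metaslab_count(VDEV_ASIZE))
```

By this arithmetic the three rootbp copies land in metaslabs 63, 85 and 134,
and the metaslab count comes out at 143, which matches the table zdb printed.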

    Regards
    Steve

----- Original Message ----- 
From: "Mike Carlson" <mike at bayphoto.com>
To: <freebsd-fs at freebsd.org>
Sent: Wednesday, June 04, 2014 7:46 PM
Subject: Re: ZFS Kernel Panic on 10.0-RELEASE


Top-posting... sorry

I'm going to have to roll this particular server back into production,
so I'll be rebuilding it from scratch.

That is okay with this particular system, but the other server that
exhibited the same issue will have to have all 19TB of its usable data 
streamed off to temp storage (if we can get it) and rebuilt as well.

Thank you Steve for being so helpful, and patient with me stumbling 
through kgdb :)


I have some lingering questions about the entire situation:

First, these servers perform regular zpool scrubs (once a month), and
have ECC memory. According to the additional logging information I was
able to get from Steve's patch, it seems that even with these safeguards
data was still corrupted. A scrub after the initial panic did not report
any errors.

Second, these two servers had an extra anomaly, and that was the missing
zpool.cache. I say missing because zdb was unable to access the zpool
until I ran "zpool set cachefile=/boot/zfs/zpool.cache <pool>".
This was previously not an issue.

The two servers were upgraded from 9.1 to 10 on the same morning, within
minutes of each other. That is about it as far as commonalities. Both 
have different drive types (900GB SAS vs 2TB SATA), different 
controllers (Dell PERC (mfi) vs LSI (mps)), Dell vs SuperMicro boards...

We do use the aio kernel module, as well as some sysctl and
loader.conf tuning. I've backed all of those out, so we're just running 
a stock OS.

Ideally, I would like to never run into this situation again. However, I 
don't have any evidence to point to an upgrade misstep or some 
catastrophic configuration error (kernel parameters, zpool create).


Thanks everyone,
Mike C

On 6/3/2014 8:57 AM, Mike Carlson wrote:
> On 6/3/2014 2:24 AM, Steven Hartland wrote:
>>
>> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
>>
>>> Scratch that last one, the cachefile had to be reset on the pool to 
>>> /boot/zfs/zpool.cache
>>>
>>> So I'm running it now, and it's taking so long to traverse all
>>> blocks that it is telling me it's going to take around 5400 HOURS
>>>
>>> I guess I'll report back 90 days?
>>
>> Try with just the following, which should be quicker:
>> zdb -uuuC zroot
>>
>>    Regards
>>    Steve
>>
>
> zdb -uuumcD eventually segfaulted:
>
>    Uberblock:
>             magic = 0000000000bab10c
>             version = 5000
>             txg = 3378596
>             guid_sum = 1996697515446579069
>             timestamp = 1401756315 UTC = Mon Jun  2 17:45:15 2014
>             rootbp = DVA[0]=<0:3f08b7fd000:600>
>    DVA[1]=<0:5500f66f000:600> DVA[2]=<0:86001fb9c00:600> [L0 DMU
>    objset] fletcher4 lzjb LE contiguous unique triple size=800L/200P
>    birth=3378596L/3378596P fill=326
>    cksum=10553d553d:65de1705c49:1445be46ea217:2bed6cb4bc5e02
>
>    All DDTs are empty
>
>    Metaslabs:
>             vdev          0
>             metaslabs   143   offset spacemap          free
>             ---------------   ------------------- ---------------    -------------
>             metaslab      0   offset            0 spacemap     34    free    12.8G
>             metaslab      1   offset   1000000000 spacemap    162    free    21.0G
>             metaslab      2   offset   2000000000 spacemap    170    free    4.20G
>             metaslab      3   offset   3000000000 spacemap    182    free    26.8G
>             metaslab      4   offset   4000000000 spacemap    183    free    18.7G
>             metaslab      5   offset   5000000000 spacemap    184    free    27.9G
>             metaslab      6   offset   6000000000 spacemap    185    free    19.9G
>             metaslab      7   offset   7000000000 spacemap    187    free    30.8G
>             metaslab      8   offset   8000000000 spacemap    188    free    24.4G
>             metaslab      9   offset   9000000000 spacemap    189    free    2.73G
>             metaslab     10   offset   a000000000 spacemap    190    free    17.4G
>             metaslab     11   offset   b000000000 spacemap    193    free    20.5G
>             metaslab     12   offset   c000000000 spacemap    194    free    10.0G
>             metaslab     13   offset   d000000000 spacemap    195    free    15.0G
>             metaslab     14   offset   e000000000 spacemap    196    free    19.8G
>             metaslab     15   offset   f000000000 spacemap    197    free    22.6G
>             metaslab     16   offset  10000000000 spacemap    198    free    11.8G
>             metaslab     17   offset  11000000000 spacemap    199    free    18.3G
>             metaslab     18   offset  12000000000 spacemap    200    free    3.35G
>             metaslab     19   offset  13000000000 spacemap    201    free    24.2G
>             metaslab     20   offset  14000000000 spacemap    202    free     9.8G
>             metaslab     21   offset  15000000000 spacemap    205    free    16.1G
>             metaslab     22   offset  16000000000 spacemap    206    free    31.4G
>             metaslab     23   offset  17000000000 spacemap    207    free    10.6G
>             metaslab     24   offset  18000000000 spacemap    208    free    29.9G
>             metaslab     25   offset  19000000000 spacemap    209    free    13.0G
>             metaslab     26   offset  1a000000000 spacemap    210    free    15.2G
>             metaslab     27   offset  1b000000000 spacemap     33    free    35.3G
>             metaslab     28   offset  1c000000000 spacemap    186    free    3.40G
>             metaslab     29   offset  1d000000000 spacemap    211    free    17.9G
>             metaslab     30   offset  1e000000000 spacemap    212    free    11.2G
>             metaslab     31   offset  1f000000000 spacemap    213    free    7.69G
>             metaslab     32   offset  20000000000 spacemap    214    free    21.2G
>             metaslab     33   offset  21000000000 spacemap    215    free    7.66G
>             metaslab     34   offset  22000000000 spacemap    216    free    15.6G
>             metaslab     35   offset  23000000000 spacemap    217    free    28.2G
>             metaslab     36   offset  24000000000 spacemap    218    free    20.8G
>             metaslab     37   offset  25000000000 spacemap    221    free    14.5G
>             metaslab     38   offset  26000000000 spacemap    192    free    14.1G
>             metaslab     39   offset  27000000000 spacemap    222    free    23.5G
>             metaslab     40   offset  28000000000 spacemap    223    free    22.8G
>             metaslab     41   offset  29000000000 spacemap    224    free    16.2G
>             metaslab     42   offset  2a000000000 spacemap    225    free    16.7G
>             metaslab     43   offset  2b000000000 spacemap    226    free    18.3G
>             metaslab     44   offset  2c000000000 spacemap    227    free    3.63G
>             metaslab     45   offset  2d000000000 spacemap    228    free    6.13G
>             metaslab     46   offset  2e000000000 spacemap    229    free    22.8G
>             metaslab     47   offset  2f000000000 spacemap    230    free    31.2G
>             metaslab     48   offset  30000000000 spacemap    204    free    5.64G
>             metaslab     49   offset  31000000000 spacemap    232    free    4.14G
>             metaslab     50   offset  32000000000 spacemap    233    free    22.0G
>             metaslab     51   offset  33000000000 spacemap    234    free    21.1G
>             metaslab     52   offset  34000000000 spacemap    235    free    10.9G
>             metaslab     53   offset  35000000000 spacemap    236    free    28.6G
>             metaslab     54   offset  36000000000 spacemap     32    free    24.2G
>             metaslab     55   offset  37000000000 spacemap    237    free    6.30G
>             metaslab     56   offset  38000000000 spacemap    238    free    22.6G
>             metaslab     57   offset  39000000000 spacemap    239    free    12.9G
>             metaslab     58   offset  3a000000000 spacemap    242    free    22.8G
>             metaslab     59   offset  3b000000000 spacemap    243    free    22.0G
>             metaslab     60   offset  3c000000000 spacemap    244    free    26.4G
>             metaslab     61   offset  3d000000000 spacemap    245    free     9.6G
>             metaslab     62   offset  3e000000000 spacemap    246    free    22.1G
>             metaslab     63   offset  3f000000000 spacemap    247    free    59.1G
>             metaslab     64   offset  40000000000 spacemap    220    free    61.8G
>             metaslab     65   offset  41000000000 spacemap    191    free    17.7G
>             metaslab     66   offset  42000000000 spacemap    248    free    13.1G
>             metaslab     67   offset  43000000000 spacemap    249    free    22.5G
>             metaslab     68   offset  44000000000 spacemap    250    free    4.39G
>             metaslab     69   offset  45000000000 spacemap    251    free    16.2G
>             metaslab     70   offset  46000000000 spacemap    252    free    3.88G
>             metaslab     71   offset  47000000000 spacemap    253    free    8.96G
>             metaslab     72   offset  48000000000 spacemap    254    free    25.2G
>             metaslab     73   offset  49000000000 spacemap    255    free    15.2G
>             metaslab     74   offset  4a000000000 spacemap    257    free    26.1G
>             metaslab     75   offset  4b000000000 spacemap    203    free    5.36G
>             metaslab     76   offset  4c000000000 spacemap    258    free    59.4G
>             metaslab     77   offset  4d000000000 spacemap    259    free    15.9G
>             metaslab     78   offset  4e000000000 spacemap    260    free    62.1G
>             metaslab     79   offset  4f000000000 spacemap    261    free    19.4G
>             metaslab     80   offset  50000000000 spacemap    262    free    4.07G
>             metaslab     81   offset  51000000000 spacemap    263    free    31.0G
>             metaslab     82   offset  52000000000 spacemap    264    free    32.1G
>             metaslab     83   offset  53000000000 spacemap    265    free    21.9G
>             metaslab     84   offset  54000000000 spacemap    266    free    26.2G
>             metaslab     85   offset  55000000000 spacemap    241    free    58.9G
>             metaslab     86   offset  56000000000 spacemap    267    free    22.3G
>             metaslab     87   offset  57000000000 spacemap    268    free    8.49G
>             metaslab     88   offset  58000000000 spacemap    269    free    17.5G
>             metaslab     89   offset  59000000000 spacemap    270    free    24.2G
>             metaslab     90   offset  5a000000000 spacemap    271    free    6.78G
>             metaslab     91   offset  5b000000000 spacemap    219    free    12.7G
>             metaslab     92   offset  5c000000000 spacemap    274    free    27.4G
>             metaslab     93   offset  5d000000000 spacemap    275    free    21.5G
>             metaslab     94   offset  5e000000000 spacemap    276    free    25.2G
>             metaslab     95   offset  5f000000000 spacemap    277    free    27.8G
>             metaslab     96   offset  60000000000 spacemap    278    free    6.67G
>             metaslab     97   offset  61000000000 spacemap    279    free    26.3G
>             metaslab     98   offset  62000000000 spacemap    280    free    12.0G
>             metaslab     99   offset  63000000000 spacemap    281    free    18.1G
>             metaslab    100   offset  64000000000 spacemap    282    free    23.3G
>             metaslab    101   offset  65000000000 spacemap    256    free    25.0G
>             metaslab    102   offset  66000000000 spacemap    231    free    16.8G
>             metaslab    103   offset  67000000000 spacemap    284    free    16.2G
>             metaslab    104   offset  68000000000 spacemap    285    free    20.0G
>             metaslab    105   offset  69000000000 spacemap    286    free    30.6G
>             metaslab    106   offset  6a000000000 spacemap    287    free    24.5G
>             metaslab    107   offset  6b000000000 spacemap    288    free    19.6G
>             metaslab    108   offset  6c000000000 spacemap    289    free    16.8G
>             metaslab    109   offset  6d000000000 spacemap    290    free    22.7G
>             metaslab    110   offset  6e000000000 spacemap    291    free    22.0G
>             metaslab    111   offset  6f000000000 spacemap    292    free    16.6G
>             metaslab    112   offset  70000000000 spacemap    240    free    14.8G
>             metaslab    113   offset  71000000000 spacemap    293    free    20.9G
>             metaslab    114   offset  72000000000 spacemap    294    free    53.7G
>             metaslab    115   offset  73000000000 spacemap    295    free    17.9G
>             metaslab    116   offset  74000000000 spacemap    296    free    19.1G
>             metaslab    117   offset  75000000000 spacemap    297    free    32.7G
>             metaslab    118   offset  76000000000 spacemap    298    free    17.8G
>             metaslab    119   offset  77000000000 spacemap    273    free    55.0G
>             metaslab    120   offset  78000000000 spacemap    299    free    20.7G
>             metaslab    121   offset  79000000000 spacemap    300    free    16.8G
>             metaslab    122   offset  7a000000000 spacemap    301    free    16.8G
>             metaslab    123   offset  7b000000000 spacemap    302    free    22.7G
>             metaslab    124   offset  7c000000000 spacemap    303    free    14.8G
>             metaslab    125   offset  7d000000000 spacemap    304    free    22.1G
>             metaslab    126   offset  7e000000000 spacemap    305    free    15.3G
>             metaslab    127   offset  7f000000000 spacemap    306    free    17.1G
>             metaslab    128   offset  80000000000 spacemap    307    free    20.2G
>             metaslab    129   offset  81000000000 spacemap    283    free    58.2G
>             metaslab    130   offset  82000000000 spacemap    308    free    24.5G
>             metaslab    131   offset  83000000000 spacemap    309    free    4.19G
>             metaslab    132   offset  84000000000 spacemap    310    free    15.0G
>             metaslab    133   offset  85000000000 spacemap    311    free    19.9G
>             metaslab    134   offset  86000000000 spacemap    312    free    60.9G
>             metaslab    135   offset  87000000000 spacemap    313    free    60.6G
>             metaslab    136   offset  88000000000 spacemap    314    free    60.9G
>             metaslab    137   offset  89000000000 spacemap    315    free    59.8G
>             metaslab    138   offset  8a000000000 spacemap    316    free    60.9G
>             metaslab    139   offset  8b000000000 spacemap    272    free    61.6G
>             metaslab    140   offset  8c000000000 spacemap    317    free    62.4G
>             metaslab    141   offset  8d000000000 spacemap    318    free    61.2G
>             metaslab    142   offset  8e000000000 spacemap    319    free    61.5G
>
>
>    Traversing all blocks to verify metadata checksums and verify
>    nothing leaked ...
>
>    load: 1.59  cmd: zdb 54160 [physrd] 31.13r 3.05u 1.15s 4% 142544k
>    load: 0.45  cmd: zdb 54160 [physrd] 105.37r 6.69u 2.33s 4% 263428k
>    5.64T completed ( 119MB/s) estimated time remaining: 0hr 12min 55sec
>    Assertion failed: (bp->blk_pad[0] == 0), file
>    /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c,
>    line 2978.
>    Abort (core dumped)
>
> The second command you suggested returned:
>
>    # zdb -uuuC zroot
>
>    MOS Configuration:
>             version: 5000
>             name: 'zroot'
>             state: 0
>             txg: 3377279
>             pool_guid: 9132288035431788388
>             hostid: 2783470193
>             hostname: 'working-1.discdrive.bayphoto.com'
>             vdev_children: 1
>             vdev_tree:
>                 type: 'root'
>                 id: 0
>                 guid: 9132288035431788388
>                 children[0]:
>                     type: 'raidz'
>                     id: 0
>                     guid: 15520162542638044402
>                     nparity: 2
>                     metaslab_array: 31
>                     metaslab_shift: 36
>                     ashift: 9
>                     asize: 9894744555520
>                     is_log: 0
>                     create_txg: 4
>                     children[0]:
>                         type: 'disk'
>                         id: 0
>                         guid: 4289437176706222104
>                         path: '/dev/mfid0p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02310703f4b791/b'
>                         phys_path: '/dev/mfid0p2'
>                         whole_disk: 1
>                         DTL: 181
>                         create_txg: 4
>                     children[1]:
>                         type: 'disk'
>                         id: 1
>                         guid: 5369387862706621015
>                         path: '/dev/mfid1p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02311604ce1965/b'
>                         phys_path: '/dev/mfid1p2'
>                         whole_disk: 1
>                         DTL: 180
>                         create_txg: 4
>                     children[2]:
>                         type: 'disk'
>                         id: 2
>                         guid: 456749962069636782
>                         path: '/dev/mfid2p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02312105778eef/b'
>                         phys_path: '/dev/mfid2p2'
>                         whole_disk: 1
>                         DTL: 179
>                         create_txg: 4
>                     children[3]:
>                         type: 'disk'
>                         id: 3
>                         guid: 3809413300177228462
>                         path: '/dev/mfid3p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02312905f430b5/b'
>                         phys_path: '/dev/mfid3p2'
>                         whole_disk: 1
>                         DTL: 178
>                         create_txg: 4
>                     children[4]:
>                         type: 'disk'
>                         id: 4
>                         guid: 4978694931676882497
>                         path: '/dev/mfid4p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02313606b73c4a/b'
>                         phys_path: '/dev/mfid4p2'
>                         whole_disk: 1
>                         DTL: 177
>                         create_txg: 4
>                     children[5]:
>                         type: 'disk'
>                         id: 5
>                         guid: 17831739822150458220
>                         path: '/dev/mfid5p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a023142077914f5/b'
>                         phys_path: '/dev/mfid5p2'
>                         whole_disk: 1
>                         DTL: 176
>                         create_txg: 4
>                     children[6]:
>                         type: 'disk'
>                         id: 6
>                         guid: 1286918567594965543
>                         path: '/dev/mfid6p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02314c080cb066/b'
>                         phys_path: '/dev/mfid6p2'
>                         whole_disk: 1
>                         DTL: 175
>                         create_txg: 4
>                     children[7]:
>                         type: 'disk'
>                         id: 7
>                         guid: 7958718879588658810
>                         path: '/dev/mfid7p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02315608a7f0a2/b'
>                         phys_path: '/dev/mfid7p2'
>                         whole_disk: 1
>                         DTL: 174
>                         create_txg: 4
>                     children[8]:
>                         type: 'disk'
>                         id: 8
>                         guid: 18392960683862755998
>                         path: '/dev/mfid8p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a023160093a9190/b'
>                         phys_path: '/dev/mfid8p2'
>                         whole_disk: 1
>                         DTL: 173
>                         create_txg: 4
>                     children[9]:
>                         type: 'disk'
>                         id: 9
>                         guid: 13046629036569375198
>                         path: '/dev/mfid9p2'
>                         devid: 'id1,sd@n6b8ca3a0f13870001a02316909c8894c/b'
>                         phys_path: '/dev/mfid9p2'
>                         whole_disk: 1
>                         DTL: 172
>                         create_txg: 4
>                     children[10]:
>                         type: 'disk'
>                         id: 10
>                         guid: 10604061156531251346
>                         path: '/dev/mfid11p2'
>                         devid: 'id1,sd@n6b8ca3a0ef7a7a0019cc18e30bbfa11e/b'
>                         phys_path: '/dev/mfid11p2'
>                         whole_disk: 1
>                         DTL: 171
>                         create_txg: 4
>             features_for_read:
>
>    Uberblock:
>             magic = 0000000000bab10c
>             version = 5000
>             txg = 3389469
>             guid_sum = 1996697515446579069
>             timestamp = 1401810802 UTC = Tue Jun  3 08:53:22 2014
>             rootbp = DVA[0]=<0:3f0bf445c00:c00>
>    DVA[1]=<0:55027e77200:c00> DVA[2]=<0:86003a4c400:c00> [L0 DMU
>    objset] fletcher4 uncompressed LE contiguous unique triple
>    size=800L/800P birth=3389469L/3389469P fill=326
>    cksum=389487e40:6aa058451f9:64bbaf298ba16:3f9bfc58017be5d
>
> Any reason why I would have to manually re-import the cache file? I 
> had performed that task during the initial install (this was before 
> bsdinstall had a zfs on root option, so it was done manually, where 
> you have to export the cachefile, then at the end of the install cp it 
> to /boot/zfs/zpool.cache and re-import it)
>
> Mike C

