ZFS Kernel Panic on 10.0-RELEASE
Mike Carlson
mike at bayphoto.com
Mon Jun 2 15:49:09 UTC 2014
On 6/2/2014 2:12 AM, Steven Hartland wrote:
> ----- Original Message ----- From: "Mike Carlson" <mike at bayphoto.com>
>
>> On 5/30/2014 1:10 PM, Mike Carlson wrote:
>> > On 5/30/2014 12:48 PM, Jordan Hubbard wrote:
>> >> On May 30, 2014, at 12:04 PM, Mike Carlson <mike at bayphoto.com> wrote:
>> >>
>> >>> Over the weekend, we had upgraded one of our servers from
>> >>> 9.1-RELEASE to 10.0-RELEASE, and then the zpool was upgraded (from
>> >>> 28 to 5000)
>> >>>
>> >>> Tuesday afternoon, the server suddenly rebooted (kernel panic),
>> >>> and as soon as it tried to remount all of its ZFS volumes, it
>> >>> panic'd again.
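
(For context, the upgrade path on both machines was essentially the
following; the pool name "tank" below is a placeholder, not the real one:)

    # binary upgrade from 9.1-RELEASE to 10.0-RELEASE via freebsd-update
    freebsd-update -r 10.0-RELEASE upgrade
    freebsd-update install
    shutdown -r now
    # after rebooting into the new kernel, install the new userland
    freebsd-update install
    # move the pool from version 28 to feature flags (reported as "5000")
    zpool upgrade tank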
>> >> What’s the panic text? That’s pretty crucial in figuring out
>> >> whether this is recoverable (e.g. if it’s spacemap corruption
>> >> related, probably not).
>> >>
>> >> - Jordan
>> >>
>> >>
>> >>
>> > I had linked the pictures I took of the console, but here is my
>> > manual reproduction:
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 7; apic id = 07
>> > fault virtual address = 0x4a0
>> > fault code = supervisor read data, page not present
>> > instruction pointer = 0x20:0xffffffff81a7f39f
>> > stack pointer = 0x28:0xfffffe1834789570
>> > frame pointer = 0x28:0xfffffe18347895b0
>> > code segment = base 0x0, limit 0xfffff, type 0x1b
>> > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags = interrupt enabled, resume, IOPL = 0
>> > current process = 1849 (txg_thread_enter)
>> > trap number = 12
>> > panic: page fault
>> > cpuid = 7
>> > KDB: stack backtrace:
>> > #0 0xffffffff808e7dd0 at kdb_backtrace+0x60
>> > #1 0xffffffff808af8b5 at panic+0x155
>> > #2 0xffffffff80c8e629 at trap_fatal+0x3a2
>> > #3 0xffffffff80c8e969 at trap_pfault+0x2c9
>> > #4 0xffffffff80c8e0f6 at trap+0x5e6
>> > #5 0xffffffff80c75392 at calltrap+0x8
>> > #6 0xffffffff81a53b5a at dsl_dataset_block_kill+0x3a
>> > #7 0xffffffff81a50967 at dnode_sync+0x237
>> > #8 0xffffffff81a48fcb at dmu_objset_sync_dnodes+0x2b
>> > #9 0xffffffff81a48e4d at dmu_objset_sync+0x1ed
>> > #10 0xffffffff81a5d29a at dsl_pool_sync+0xca
>> > #11 0xffffffff81a78a4e at spa_sync+0x52e
>> > #12 0xffffffff81a81925 at txg_sync_thread+0x375
>> > #13 0xffffffff8088198a at fork_exit+0x9a
>> > #14 0xffffffff80c758ce at fork_trampoline+0xe
>> > uptime: 46s
>> > Automatic reboot in 15 seconds - press a key on the console to
>> > abort
>> >
>> This just happened again on another server. We upgraded two servers
>> on the same morning, and now both of them exhibit the same corrupted
>> ZFS volume and panic behavior.
>>
>> Out of all the volumes, just one is triggering the panic, and the
>> panic message is nearly identical to the one above.
>>
>> I have 4 snapshots from the last 24 hours, so hopefully the snapshot
>> from noon today can be sent to a new volume (zfs send | zfs recv).
>>
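
(Roughly what I have in mind; the dataset and snapshot names below are
placeholders, not the real ones:)

    # replicate the last-known-good snapshot into a fresh dataset
    zfs send tank/data@noon | zfs receive tank/data.recovered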
>> I guess I can now rule out a hardware issue; this is clearly a
>> problem related to the upgrade (freebsd-update was used). I first
>> thought the first system had a bad upgrade, perhaps a mix and match
>> of 9.2 binaries running on a 10.0 kernel, but I used the
>> 'freebsd-update IDS' command to verify the integrity of the install,
>> and it looked good; the only differences were config files in /etc/
>> that we manage.
>>
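
(That check is just freebsd-update's built-in IDS mode, which compares
every installed file against the release's signed checksums:)

    # verify installed files against the 10.0-RELEASE manifests;
    # locally modified files (our /etc/ configs) show up as differences
    freebsd-update IDS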
>
> Do you have a kernel crash dump from this?
>
> Also, can you confirm whether you're on amd64 or i386?
>
> Regards
> Steve
>
>
I don't have a crash dump, and this is on amd64.
I might be able to get a crash dump on one of them; the other is back up
and running. It is a little challenging because the system I can do this
on has ZFS on root, but I have a spare drive I can use as the swap volume.
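
The plan, roughly; the device name ada3 below is a placeholder for the
spare drive:

    # point the kernel at the spare disk's swap partition for dumps,
    # since dumping to a ZFS-backed device isn't an option here
    dumpon /dev/ada3p1
    echo 'dumpdev="/dev/ada3p1"' >> /etc/rc.conf
    # after the next panic, savecore(8) saves the dump under /var/crash;
    # it can then be opened with the kernel symbols:
    kgdb /boot/kernel/kernel /var/crash/vmcore.0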
Mike C