Re: kernel panic while copying files

From: Mateusz Guzik <mjguzik_at_gmail.com>
Date: Tue, 08 Jun 2021 09:04:33 UTC
Given how easy it is to reproduce perhaps you can spend a little bit
of time narrowing it down to a specific commit. You can do it with
git-bisect.

On 6/8/21, Gary Jennejohn <gljennjohn@gmail.com> wrote:
> On Mon, 7 Jun 2021 16:54:11 -0400
> Mark Johnston <markj@freebsd.org> wrote:
>
>> On Mon, Jun 07, 2021 at 11:01:09AM +0200, Gary Jennejohn wrote:
>> > I've seen this panic three times in the last two days:
>> >
>> > [first panic]
>> > Unread portion of the kernel message buffer:
>> >
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 3; apic id = 03
>> > fault virtual address   = 0x801118000
>> > fault code              = supervisor write data, page not present
>> > instruction pointer     = 0x20:0xffffffff808d2212
>> > stack pointer           = 0x28:0xfffffe00dbc8c760
>> > frame pointer           = 0x28:0xfffffe00dbc8c7a0
>> > code segment            = base 0x0, limit 0xfffff, type 0x1b
>> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags        = interrupt enabled, resume, IOPL = 0
>> > current process         = 28 (dom0)
>> > trap number             = 12
>> > panic: page fault
>> > cpuid = 3
>> > time = 1622963058
>> > KDB: stack backtrace:
>> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>> > 0xfffffe00dbc8c410
>> > vpanic() at vpanic+0x181/frame 0xfffffe00dbc8c460
>> > panic() at panic+0x43/frame 0xfffffe00dbc8c4c0
>> > trap_fatal() at trap_fatal+0x387/frame 0xfffffe00dbc8c520
>> > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00dbc8c580
>> > trap() at trap+0x253/frame 0xfffffe00dbc8c690
>> > calltrap() at calltrap+0x8/frame 0xfffffe00dbc8c690
>> > --- trap 0xc, rip = 0xffffffff808d2212, rsp = 0xfffffe00dbc8c760, rbp =
>> > 0xfffffe00dbc8c7a0 ---
>> > zone_release() at zone_release+0x1f2/frame 0xfffffe00dbc8c7a0
>> > bucket_drain() at bucket_drain+0xda/frame 0xfffffe00dbc8c7d0
>> > bucket_cache_reclaim_domain() at bucket_cache_reclaim_domain+0x30a/frame
>> > 0xfffffe00dbc8c830
>> > zone_reclaim() at zone_reclaim+0x162/frame 0xfffffe00dbc8c880
>> > uma_reclaim_domain() at uma_reclaim_domain+0xa2/frame
>> > 0xfffffe00dbc8c8b0
>> > vm_pageout_worker() at vm_pageout_worker+0x41e/frame 0xfffffe00dbc8cb70
>> > vm_pageout() at vm_pageout+0x21e/frame 0xfffffe00dbc8cbb0
>> > fork_exit() at fork_exit+0x7e/frame 0xfffffe00dbc8cbf0
>> > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00dbc8cbf0
>> > --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>> > KDB: enter: panic
>> >
>> > __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
>> > 55       __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
>> > pcpu,
>> > 	   pc_curthread)));
>> >
>> > One difference was that in the second and third panics the fault
>> > virtual
>> > address was 0x0.  But the backtrace was the same.
>> >
>> > Relevant info from the info.x files:
>> > Architecture: amd64
>> > Architecture Version: 2
>> > Version String: FreeBSD 14.0-CURRENT #33 main-n247184-1970d693039: Sat
>> > Jun
>> > 5 09:58:55 CEST 2021
>> >
>> > CPU: AMD Ryzen 5 1600 Six-Core Processor             (3194.09-MHz
>> > K8-class CPU)
>> >   Origin="AuthenticAMD"  Id=0x800f11  Family=0x17  Model=0x1
>> > Stepping=1
>> >   AMD Features=0x2e500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM>
>> >   AMD
>> > Features2=0x35c233ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,SKINIT,WDT,TCE,Topology,PCXC,PNXC,DBE,PL2I,MWAITX>
>> >   AMD Extended Feature Extensions ID
>> > EBX=0x1007<CLZERO,IRPerf,XSaveErPtr,IBPB>
>> >
>> > I have 16GiB of memory in the box.
>> >
>> > The panic occurred while copying files from an internal SATA SSD to a
>> > SATA 8TB disk in an external USB3 docking station.  The panic seems to
>> > occur quite quickly, after only a few files have been copied.
>> >
>> > swap is on a different internal disk.
>> >
>> > I can poke around in the crash dumps with kgdb if anyone wants more
>> > information.
>>
>> Are you running with invariants configured in the kernel?  If not,
>> please try to reproduce this in a kernel with
>>
>> options INVARIANT_SUPPORT
>> options INVARIANTS
>>
>> configured.
>>
>> A stack trace with line numbers would also be helpful.
>
> Thanks for the hint.  After enabling INVARIANTS the kernel panics as
> soon I turn on the external USB3 disk.  No user disk access required.
>
> Version String: FreeBSD 14.0-CURRENT #34 main-n247239-f570a6723e1: Tue Jun
> 8 09:34:32 CEST 2021
>
> Here the kgdb backtrace:
>
> Unread portion of the kernel message buffer:
> panic: Duplicate free of 0xfffff800356b9000 from zone
> 0xfffffe00dcbdd800(da_ccb) slab 0xfffff800356b9fd8(0)
> cpuid = 8
> time = 1623140519
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe00c5f398c0
> vpanic() at vpanic+0x181/frame 0xfffffe00c5f39910
> panic() at panic+0x43/frame 0xfffffe00c5f39970
> uma_dbg_free() at uma_dbg_free+0x1e1/frame 0xfffffe00c5f399b0
> uma_zfree_arg() at uma_zfree_arg+0x147/frame 0xfffffe00c5f39a00
> camperiphdone() at camperiphdone+0x1b7/frame 0xfffffe00c5f39b20
> xpt_done_process() at xpt_done_process+0x3dd/frame 0xfffffe00c5f39b60
> xpt_done_td() at xpt_done_td+0xf5/frame 0xfffffe00c5f39bb0
> fork_exit() at fork_exit+0x80/frame 0xfffffe00c5f39bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c5f39bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
> __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> 55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct
> pcpu,
> (kgdb) bt
> #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
> #1  doadump (textdump=textdump@entry=0)
>     at /usr/src/sys/kern/kern_shutdown.c:399
> #2  0xffffffff8040c39a in db_dump (dummy=<optimized out>,
>     dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
>     at /usr/src/sys/ddb/db_command.c:575
> #3  0xffffffff8040c192 in db_command (last_cmdp=<optimized out>,
>     cmd_table=<optimized out>, dopager=dopager@entry=1)
>     at /usr/src/sys/ddb/db_command.c:482
> #4  0xffffffff8040beed in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:535
> #5  0xffffffff8040f616 in db_trap (type=<optimized out>, code=<optimized
> out>)
>     at /usr/src/sys/ddb/db_main.c:270
> #6  0xffffffff8066b1c4 in kdb_trap (type=type@entry=3, code=code@entry=0,
>     tf=<optimized out>, tf@entry=0xfffffe00c5f397f0)
>     at /usr/src/sys/kern/subr_kdb.c:727
> #7  0xffffffff809a4e96 in trap (frame=0xfffffe00c5f397f0)
>     at /usr/src/sys/amd64/amd64/trap.c:604
> #8  <signal handler called>
> #9  kdb_enter (why=0xffffffff80a61a23 "panic", msg=<optimized out>)
>     at /usr/src/sys/kern/subr_kdb.c:506
> #10 0xffffffff806207a2 in vpanic (fmt=<optimized out>, ap=<optimized out>,
>     ap@entry=0xfffffe00c5f39950) at /usr/src/sys/kern/kern_shutdown.c:907
> #11 0xffffffff80620533 in panic (
>     fmt=0xffffffff80d635c8 <cnputs_mtx> ".\024\244\200\377\377\377\377")
>     at /usr/src/sys/kern/kern_shutdown.c:843
> #12 0xffffffff808e12b1 in uma_dbg_free (zone=0xfffffe00dcbdd800,
>     slab=0xfffff800356b9fd8, item=0xfffff800356b9000)
>     at /usr/src/sys/vm/uma_core.c:5664
> #13 0xffffffff808d9de7 in item_dtor (zone=0xfffffe00dcbdd800,
>     item=0xfffff800356b9000, size=544, udata=0x0, skip=SKIP_NONE)
>     at /usr/src/sys/vm/uma_core.c:3418
> #14 uma_zfree_arg (zone=0xfffffe00dcbdd800, item=0xfffff800356b9000,
>     udata=udata@entry=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #15 0xffffffff802da503 in uma_zfree (zone=0xffffffff80d635c8 <cnputs_mtx>,
>     item=0x200) at /usr/src/sys/vm/uma.h:404
> #16 0xffffffff802d9117 in camperiphdone (periph=0xfffff800061e2c00,
>     done_ccb=0xfffff800355d6cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #17 0xffffffff802dfebd in xpt_done_process (ccb_h=0xfffff800355d6cc0)
>     at /usr/src/sys/cam/cam_xpt.c:5491
> #18 0xffffffff802e1ec5 in xpt_done_td (
>     arg=arg@entry=0xffffffff80d33d80 <cam_doneqs>)
>     at /usr/src/sys/cam/cam_xpt.c:5546
> #19 0xffffffff805dad80 in fork_exit (callout=0xffffffff802e1dd0
> <xpt_done_td>,
>     arg=0xffffffff80d33d80 <cam_doneqs>, frame=0xfffffe00c5f39c00)
>     at /usr/src/sys/kern/kern_fork.c:1083
> #20 <signal handler called>
>
> Apparently caused by recent changes to CAM.
>
> Let me know if you want more information.
>
> --
> Gary Jennejohn
>
>


-- 
Mateusz Guzik <mjguzik gmail.com>