Re: panic: Unaligned free (was: kernel panic while copying files)

From: Warner Losh <imp_at_bsdimp.com>
Date: Wed, 30 Jun 2021 16:35:14 UTC
On Wed, Jun 30, 2021 at 6:58 AM Gary Jennejohn <gljennjohn@gmail.com> wrote:

> On Wed, 30 Jun 2021 06:02:59 +0100
> Graham Perrin <grahamperrin@gmail.com> wrote:
>
> > On 29/06/2021 10:42, Gary Jennejohn wrote:
> > > ___ panic is now the result of an unaligned free.
> > >
> > > panic: Unaligned free of 0xfffff800259e2800 from zone
> > >      0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3)
> > >
> > > I have the crash dump and a debug kernel in case anyone wants more
> info.
> > Can you post the backtrace etc. here? Thanks
> >
>
> Sure.  As can be seen from the uma zone being da_ccb, the panic
> resulted from setting kern.cam.da.enable_uma_ccbs=1.
>
> Unread portion of the kernel message buffer:
> panic: Unaligned free of 0xfffff800259e2800 from zone
> 0xfffffe00dc9d2000(da_ccb) slab 0xfffff800259e2fd8(3)
> cpuid = 2
> time = 1624958650
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame
> 0xfffffe00c62687a0
> kdb_backtrace() at kdb_backtrace+0x46/frame 0xfffffe00c6268850
> vpanic() at vpanic+0x227/frame 0xfffffe00c62688f0
> panic() at panic+0x4e/frame 0xfffffe00c6268950
> uma_dbg_free() at uma_dbg_free+0xfc/frame 0xfffffe00c62689a0
> item_dtor() at item_dtor+0x7c/frame 0xfffffe00c62689e0
> uma_zfree_arg() at uma_zfree_arg+0xf0/frame 0xfffffe00c6268a50
> uma_zfree() at uma_zfree+0x23/frame 0xfffffe00c6268a70
> xpt_free_ccb() at xpt_free_ccb+0x43/frame 0xfffffe00c6268a90
> camperiphdone() at camperiphdone+0x211/frame 0xfffffe00c6268ae0
> xpt_done_process() at xpt_done_process+0x550/frame 0xfffffe00c6268b40
> xpt_done_td() at xpt_done_td+0x1c0/frame 0xfffffe00c6268b80
> fork_exit() at fork_exit+0x117/frame 0xfffffe00c6268bf0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c6268bf0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
> KDB: enter: panic
>
> doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> 399             dumptid = curthread->td_tid;
> (kgdb) bt
> #0  doadump (textdump=0) at /usr/src/sys/kern/kern_shutdown.c:399
> #1  0xffffffff804d5dd7 in db_dump (dummy=-2138843371, dummy2=false,
> dummy3=-1,
>     dummy4=0xfffffe00c6268320 "") at /usr/src/sys/ddb/db_command.c:575
> #2  0xffffffff804d5bf4 in db_command (
>     last_cmdp=0xffffffff8114ce80 <db_last_command>, cmd_table=0x0,
> dopager=1)
>     at /usr/src/sys/ddb/db_command.c:482
> #3  0xffffffff804d583c in db_command_loop ()
>     at /usr/src/sys/ddb/db_command.c:535
> #4  0xffffffff804da27c in db_trap (type=3, code=0)
>     at /usr/src/sys/ddb/db_main.c:270
> #5  0xffffffff8083df9d in kdb_trap (type=3, code=0, tf=0xfffffe00c6268770)
>     at /usr/src/sys/kern/subr_kdb.c:727
> #6  0xffffffff80d31494 in trap (frame=0xfffffe00c6268770)
>     at /usr/src/sys/amd64/amd64/trap.c:604
> #7  0xffffffff80d32628 in trap_check (frame=0xfffffe00c6268770)
>     at /usr/src/sys/amd64/amd64/trap.c:664
> #8  <signal handler called>
> #9  breakpoint () at /usr/src/sys/amd64/include/cpufunc.h:66
> #10 0xffffffff8083d3d0 in kdb_enter (why=0xffffffff80e0355b "panic",
>     msg=0xffffffff80e0355b "panic") at /usr/src/sys/kern/subr_kdb.c:505
> #11 0xffffffff807d1725 in vpanic (
>     fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)", ap=0xfffffe00c6268930) at /usr/src/sys/kern/kern_shutdown.c:906
> #12 0xffffffff807d120e in panic (
>     fmt=0xffffffff80dbca46 "Unaligned free of %p from zone %p(%s) slab
> %p(%d)")
>     at /usr/src/sys/kern/kern_shutdown.c:843
> #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000,
>     slab=0xfffff800259e2fd8, item=0xfffff800259e2800)
>     at /usr/src/sys/vm/uma_core.c:5659
> #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000,
>     item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
>     at /usr/src/sys/vm/uma_core.c:3418
> #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000,
>     item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> #16 0xffffffff802e45d3 in uma_zfree (zone=0xfffffe00dc9d2000,
>     item=0xfffff800259e2800) at /usr/src/sys/vm/uma.h:404
> #17 0xffffffff802dc3c3 in xpt_free_ccb (free_ccb=0xfffff800259e2800)
>     at /usr/src/sys/cam/cam_xpt.c:4676
> #18 0xffffffff802dacf1 in camperiphdone (periph=0xfffff80025329b00,
>     done_ccb=0xfffff80025a24cc0) at /usr/src/sys/cam/cam_periph.c:1427
> #19 0xffffffff802e4520 in xpt_done_process (ccb_h=0xfffff80025a24cc0)
>     at /usr/src/sys/cam/cam_xpt.c:5493
> #20 0xffffffff802e68e0 in xpt_done_td (arg=0xffffffff81143700 <cam_doneqs>)
>     at /usr/src/sys/cam/cam_xpt.c:5548
> #21 0xffffffff807673c7 in fork_exit (callout=0xffffffff802e6720
> <xpt_done_td>,
>     arg=0xffffffff81143700 <cam_doneqs>, frame=0xfffffe00c6268c00)
>     at /usr/src/sys/kern/kern_fork.c:1083
> #22 <signal handler called>
>
> [kgdb stuff removed]
>
> (kgdb) down
> #15 0xffffffff80c0ba60 in uma_zfree_arg (zone=0xfffffe00dc9d2000,
>     item=0xfffff800259e2800, udata=0x0) at /usr/src/sys/vm/uma_core.c:4374
> 4374                    item_dtor(zone, item, cache_uz_size(cache), udata,
> SKIP_NONE);
> (kgdb) down
> #14 0xffffffff80c0c5dc in item_dtor (zone=0xfffffe00dc9d2000,
>     item=0xfffff800259e2800, size=544, udata=0x0, skip=SKIP_NONE)
>     at /usr/src/sys/vm/uma_core.c:3418
> 3418                            uma_dbg_free(zone, NULL, item);
> (kgdb) p/x skipdbg
> $26 = 0x0
> (kgdb) p/x zone->uz_flags
> $27 = 0x41000000 (UMA_ZFLAG_TRASH|UMA_ZFLAG_CTORDTOR)
> (kgdb) down
> #13 0xffffffff80c16a8c in uma_dbg_free (zone=0xfffffe00dc9d2000,
>     slab=0xfffff800259e2fd8, item=0xfffff800259e2800)
>     at /usr/src/sys/vm/uma_core.c:5659
> 5659                    panic("Unaligned free of %p from zone %p(%s) slab
> %p(%d)",
>
> Note that item is the same as saved_ccb.
>
> (kgdb) p/x *zone->uz_keg
> $28 = {uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0x0, uh_hashmask =
> 0x0},
>   uk_zones = {lh_first = 0xfffffe00dc9d2000}, uk_dr = {
>     dr_policy = 0xffffffff810010e0, dr_iter = 0x0}, uk_align = 0x7,
>   uk_reserve = 0x0, uk_size = 0x220, uk_rsize = 0x220,
>   uk_init = 0xffffffff80c17d50, uk_fini = 0xffffffff80c17d80,
>   uk_allocf = 0xffffffff80d342f0, uk_freef = 0xffffffff80d346a0,
>   uk_offset = 0x0, uk_kva = 0x0, uk_pgoff = 0xfd8, uk_ppera = 0x1,
>   uk_ipers = 0x7, uk_flags = 0x0, uk_name = 0xffffffff80debbac, uk_link = {
>     le_next = 0xfffff80005968000, le_prev = 0xfffff80005968380},
>   uk_domain = 0xfffff80005968240}
> (kgdb) p/x *slab
> $29 = {us_link = {le_next = 0xdeadc0dedeadc0de, le_prev =
> 0xdeadc0dedeadc0de},
>   us_freecount = 0xc0de, us_flags = 0xad, us_domain = 0xde, us_free = {
>     __bits = 0xfffff800259e2ff0}}
> (kgdb) p/x *0xfffff800259e2ff0
> $30 = 0xdeadc0de
> Don't know whether this matters, but slab seems to be unitialized.
> (kgdb) p/x freei
> $31 = 0x3
>
> In any case, saved_ccb has an address which lies outside the range
> covered by slab, i.e. freei is bigger than the number of entries in
> slab.
>
> I suspect that the only way to really figure out what's going on is to
> run the kernel in kgbd and set lots of breakpoints,
>

What's happening is this, I think.

(1) We send a request.
(2) It fails, so we send a start unit
BUT we do weird things to copy the CCBs around, and the request from (1)
and (2) have different allocations. Error recovery overwrites the original
request with a new request after saving it off....
(3) start unit succeeds, we go to free one of the CCBs and it's marked
incorrectly, triggering either this panic or the prior one we saw.

These actions were fine when there was one allocator, but now that there
are two more care must be taken, and that more care hasn't been taken yet,
so kern.cam.da.enable_uma_ccbs=1 is unsafe for now and should not be used.
kern.cam.ada.enable_uma_ccbs=1 is fine as its error recovery doesn't do
these things.

Warner