Re: git: 95f773e59482 - main - i386 copyout_fast: improve detection of a fault on accessing userspace
Date: Thu, 08 Sep 2022 06:00:13 UTC
On Wed, Sep 07, 2022 at 11:18:39PM +0300, Konstantin Belousov wrote: > On Wed, Sep 07, 2022 at 06:38:04PM +0200, Tijl Coosemans wrote: > > On Tue, 6 Sep 2022 23:17:45 +0200 Tijl Coosemans <tijl@FreeBSD.org> wrote: > > > On Tue, 6 Sep 2022 18:30:01 +0300 Konstantin Belousov > > > <kostikbel@gmail.com> wrote: > > >> I suspect you see that leftover panics, which I am working on right now. > > > > > > Yes, it looks like this: > > > > > > panic: vm_fault_lookup: fault on nofault entry, addr: 0 > > > > > > GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD] > > > Copyright (C) 2022 Free Software Foundation, Inc. > > > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> > > > This is free software: you are free to change and redistribute it. > > > There is NO WARRANTY, to the extent permitted by law. > > > Type "show copying" and "show warranty" for details. > > > This GDB was configured as "i386-portbld-freebsd14.0". > > > Type "show configuration" for configuration details. > > > For bug reporting instructions, please see: > > > <https://www.gnu.org/software/gdb/bugs/>. > > > Find the GDB manual and other documentation resources online at: > > > <http://www.gnu.org/software/gdb/documentation/>. > > > > > > For help, type "help". > > > Type "apropos word" to search for commands related to "word"... > > > Reading symbols from /boot/kernel/kernel... > > > Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug... > > > > > > Unread portion of the kernel message buffer: > > > panic: vm_fault_lookup: fault on nofault entry, addr: 0 > > > time = 1662487582 > > > KDB: stack backtrace: > > > db_trace_self_wrapper(5,ffffffff,0,0,0,...) at db_trace_self_wrapper+0x28/frame 0xffc09cf4 > > > kdb_backtrace(1a716ae0,100,e340e0,ffc09db4,ffc09e00,...) at kdb_backtrace+0x2b/frame 0xffc09d4c > > > vpanic(bef316,ffc09d88,ffc09d88,ffc09dac,b3181a,...) at vpanic+0xe9/frame 0xffc09d68 > > > panic(bef316,be68d1,0,ffc09dc4,480f308,...) at panic+0x14/frame 0xffc09d7c > > > vm_fault_lookup(0,0,1a710701,0,0,...) at vm_fault_lookup+0x13a/frame 0xffc09dac > > > vm_fault(e340e0,0,1,0,0) at vm_fault+0x79/frame 0xffc09e30 > > > vm_fault_trap(e340e0,3b,1,0,0,0) at vm_fault_trap+0x44/frame 0xffc09e58 > > > trap_pfault(3b,0,0) at trap_pfault+0x119/frame 0xffc09ea0 > > > trap(ffc09f6c,8,28,28,e247000,...) at trap+0x2d4/frame 0xffc09f60 > > > calltrap() at 0xffc031ff/frame 0xffc09f60 > > > --- trap 0xc, eip = 0x3b, esp = 0xffc09fac, ebp = 0xffc033dc --- > > > (null)() at 0x3b/frame 0xffc033dc > > > KDB: enter: panic > > > > > > 0x009a1dfd in dump_savectx () > > > at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404 > > > (kgdb) #0 0x009a1dfd in dump_savectx () > > > at /home/tijl/freebsd/base/main/sys/kern/kern_shutdown.c:404 > > > Backtrace stopped: Cannot access memory at address 0xffc09af4 > > > > The eip = 0x3b here looks like the size of a copy so I added a test on > > top of your patch to see if copyout_fast or copyin_fast was called a > > second time (see attached patch) because that would push the size where > > eip is in the trapframe. This triggered after running for a while with > > this backtrace: > > > > Unread portion of the kernel message buffer: > > panic: > > time = 1662560141 > > KDB: stack backtrace: > > db_trace_self_wrapper(5,ffffffff,0,0,0,...) at db_trace_self_wrapper+0x28/frame 0x1b0bca0c > > kdb_backtrace(1c1f93a0,100,1b0bcb1a,ffc06ff0,141e000,...) at kdb_backtrace+0x2b/frame 0x1b0bca64 > > vpanic(b8da7d,1b0bcaa0,1b0bcaa0,1b0bcaac,ffc042af,...) at vpanic+0xe9/frame 0x1b0bca80 > > panic(b8da7d,fb028f96,2,0,1b0bcadc,...) at panic+0x14/frame 0x1b0bca94 > > __stop_set_sysinit_set(1b0bcb1a,fb028f96,2,141e000) at 0xffc042af/frame 0x1b0bcaac > > copyout(1b0bcb1a,fb028f96,2) at copyout+0x58/frame 0x1b0bcadc > > pollout(fb028f90,2) at pollout+0x3c/frame 0x1b0bcb04 > > kern_poll(1c1f93a0,fb028f90,2,1b0bcc3c,0) at kern_poll+0x83/frame 0x1b0bcc20 > > sys_poll(1c1f93a0,1c1f9644) at sys_poll+0x54/frame 0x1b0bcc48 > > syscallenter(1,0,1c1f93a0,1b0bcd30,480f318,...) at syscallenter+0xc0/frame 0x1b0bcc78 > > syscall(1b0bcce8,3b,3b,3b,3dfb1300,...) at syscall+0x28/frame 0x1b0bccdc > > Xint0x80_syscall() at 0xffc03419/frame 0x1b0bccdc > > --- syscall (4, FreeBSD ELF32, sys_write), eip = 0x2058c67b, esp = 0xffbfc524, ebp = 0x1 --- > > KDB: enter: panic > > > > > > So interrupts must have been reenabled somehow, probably by the page > > fault handler, and this allows context switches and then another process > > can call copyout or copyin and corrupt the trampoline stack and > > copyout_buf. > I do not see where the interrupts could be reenabled in copyout_fast path, > without or with page fault on the userspace access. Also I am not sure > what should your patch demonstrate. > > Also, pmap_trm_alloc() puts guard page between consequent allocations, > so trampoline stack overflow must be very careful to not really overflow, > but just touch enough bytes to give the effect. > > But lets check the hypothesis. If interrupts are enabled somehow, then > processor would execute interrupt/fault handler on the same stack, which > is trampoline stack and not the kstack. I added an INVARIANTS check that > verifies that both trap() and syscall() use kstack. Could you, please, > fetch my ast branch and see what it the outcome? I realized that interrupt handlers (as opposed to exception handlers) were missed. I updated the commit with handling for apic/atpic interrupts and TLB shutdown handlers.