[Bug 261338] [PATCH] kernel panic "bad pte" on heavy CPU load on 12.2 and 12.3 (i386)
Date: Wed, 19 Jan 2022 16:15:23 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261338

            Bug ID: 261338
           Summary: [PATCH] kernel panic "bad pte" on heavy CPU load on 12.2 and 12.3 (i386)
           Product: Base System
           Version: 12.3-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: threads
          Assignee: threads@FreeBSD.org
          Reporter: thedix@yandex.ru

Created attachment 231160
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=231160&action=edit
Panic screenshot

After updating to 12.2p12 and 12.3p1 I noticed a kernel panic under heavy
multi-core CPU load. One example of such load is building the kernel with
multiple parallel jobs.

Affected systems:
- 12.2p12 i386
- 12.3p1 i386

12.X amd64 is not affected, and 13.0 is not affected at all.

Tested hardware:
- Virtual machine, 8 vCPU, 4 GB vRAM, under VMware ESXi 6.7
- HP MicroServer Gen8, Intel Xeon E3-1265Lv2, 16 GB RAM
- PC, Intel Core i5-4690, 16 GB RAM

Steps to reproduce:

    # cd /usr/src
    # make -s -j`sysctl -n hw.ncpu` KERNCONF=GENERIC buildkernel

After some time the system hangs with a panic like this:

    TPTE at 0x2857f14 IS ZERO @ VA 247c5000
    panic: bad pte
    cpuid = 7
    time = 1642334372
    KDB: stack backtrace:
    #0 0x10438ee at kdb_backtrace+0x4e
    #1 0xffdb68 at vpanic+0x118
    #2 0xffda44 at panic+0x14
    #3 0x155b6d5 at pmap_remove_pages+0x5a5
    #4 0x12fceb4 at vmspace_exit+0x94
    #5 0xfbe0f3 at exit1+0x593
    #6 0xfbdb52 at sys_sys_exit+0x12
    #7 0x1561b79 at syscall+0x3e9
    #8 0xffc033e7 at PTDpde+0x43ef

Additional stack info:

    #0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
    370             savectx(&dumppcb);
    (kgdb) #0  0x00ffd9f6 in doadump () at /usr/src/sys/kern/kern_shutdown.c:370
    #1  0x00ffd831 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:452
    #2  0x00ffdbbf in vpanic (fmt=0x15d448a "bad pte", ap=0x1ff80a10 "") at /usr/src/sys/kern/kern_shutdown.c:881
    #3  0x00ffda44 in panic (fmt=0x15d448a "bad pte") at /usr/src/sys/kern/kern_shutdown.c:808
    #4  0x0155b6d5 in pmap_remove_pages (pmap=0x22a0354c) at /usr/src/sys/i386/i386/pmap.c:4845
    #5  0x012fceb4 in vmspace_exit (td=0x1bb57380) at /usr/src/sys/vm/vm_map.c:411
    #6  0x00fbe0f3 in exit1 (td=0x1bb57380, rval=0, signo=0) at /usr/src/sys/kern/kern_exit.c:399
    #7  0x00fbdb52 in sys_sys_exit (td=0x1bb57380, uap=0x1bb57604) at /usr/src/sys/kern/kern_exit.c:176
    #8  0x01561b79 in syscall (frame=0x1ff80ba8) at src/sys/i386/i386/../../kern/subr_syscall.c:144
    #9  0xffc033e7 in ?? ()
    #10 0x00000033 in ?? ()

I did some research on the kernel code and found that the problem was
introduced by the recent changes to SMP processing in mp_x86.c:

https://github.com/freebsd/freebsd-src/commit/1820ca2154611d6f27ce5a5fdd561a16ac54fdd8#diff-b34ee41e14f87fb2b18fdf77337237f336830ae88aac2a02e1c32aa45e43b4de
https://reviews.freebsd.org/D33413

The problem is in the function smp_targeted_tlb_shootdown():

    -       sched_pin();
    +       KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));

Under some circumstances the calling thread is not pinned when this function
runs, which later causes the "bad pte" panic. I recompiled the GENERIC kernel
with the INVARIANTS option, added the function name to the assertion text for
additional information, and got an immediate panic during boot (see attached
image panic_not_pinned.png).

So the fix is to revert this line:

    -       KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
    +       sched_pin();

I attached the patch mp_x86.c.patch to fix the problem. After rebuilding the
kernel with this patch, I no longer see panics on either 12.2 or 12.3 during
further kernel builds.
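
To illustrate why swapping sched_pin() for a KASSERT matters, here is a minimal user-space sketch of the pin counter. It is not kernel code: struct thread_model, model_sched_pin(), model_sched_unpin(), and the two shootdown_* functions are hypothetical names invented for this example. Where the real kernel unpins the thread is not modeled. The only point shown is that an assertion checks a precondition, while sched_pin() establishes it.

```c
/*
 * User-space model only. All names below are hypothetical stand-ins;
 * the bodies are simplified assumptions, not FreeBSD source.
 */
#include <assert.h>
#include <stdbool.h>

struct thread_model {
    int td_pinned;              /* pin nesting count for the thread */
};

/* Model of pinning: bump the nesting count. */
static void
model_sched_pin(struct thread_model *td)
{
    td->td_pinned++;
}

/* Model of unpinning: drop the nesting count, which must be positive. */
static void
model_sched_unpin(struct thread_model *td)
{
    assert(td->td_pinned > 0);
    td->td_pinned--;
}

/*
 * Variant A: the function pins the thread itself, so the work always
 * runs pinned no matter what the caller did. Unpinning inside the
 * same function is a simplification made for this model.
 */
static bool
shootdown_self_pinning(struct thread_model *td)
{
    bool pinned_during_work;

    model_sched_pin(td);
    pinned_during_work = (td->td_pinned > 0);
    model_sched_unpin(td);
    return (pinned_during_work);
}

/*
 * Variant B: the function relies on the caller having pinned the
 * thread. In a kernel built without INVARIANTS a KASSERT compiles to
 * nothing, so this model just reports whether the precondition held
 * instead of aborting.
 */
static bool
shootdown_caller_pinned(struct thread_model *td)
{
    return (td->td_pinned > 0);
}
```

With a thread whose td_pinned is 0, shootdown_caller_pinned() reports that the work would run unpinned, while shootdown_self_pinning() still runs it pinned. That matches the report above: some i386 call path reaches the function without a prior pin, which only an INVARIANTS kernel detects.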