Re: git: 6926e2699ae5 - main - arm: Add support for using VFP in kernel [added new: Called fill_fpregs while the kernel is using the VFP]
- Reply: Mark Millard : "Re: git: 6926e2699ae5 - main - arm: Add support for using VFP in kernel [new: bad floating point data with multi-threading, not a crash]"
- In reply to: Mark Millard : "Re: git: 6926e2699ae5 - main - arm: Add support for using VFP in kernel [td == curthread failed form of panic for bt in gdb]"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Thu, 16 Feb 2023 18:42:38 UTC
[Adding: Small c++ program that leads to a FreeBSD crash when I force a core-dump (using control-\) during runs .] On Feb 16, 2023, at 00:09, Mark Millard <marklmi@yahoo.com> wrote: > [A very simple program gets the failure under gdb > or lldb of example breakpoints.] > > On Feb 15, 2023, at 20:29, Mark Millard <marklmi@yahoo.com> wrote: > >> On Feb 15, 2023, at 16:08, Mark Millard <marklmi@yahoo.com> wrote: >> >>> Kornel Dulęba <kd_at_FreeBSD.org> wrote on >>> Date: Sat, 04 Feb 2023 19:22:23 UTC : >>> >>>> The branch main has been updated by kd: >>>> >>>> URL: https://cgit.FreeBSD.org/src/commit/?id=6926e2699ae55080f860488895a2a9aa6e6d9b4d >>>> >>>> commit 6926e2699ae55080f860488895a2a9aa6e6d9b4d >>>> Author: Kornel Dulęba <kd@FreeBSD.org> >>>> AuthorDate: 2023-02-04 12:59:30 +0000 >>>> Commit: Kornel Dulęba <kd@FreeBSD.org> >>>> CommitDate: 2023-02-04 19:21:43 +0000 >>>> >>>> arm: Add support for using VFP in kernel >>>> >>>> Add missing logic to allow in-kernel VFP usage for ARMv7 NEON. >>>> The implementation is strongly based on arm64 code. >>>> It introduces a family of fpu_kern_* functions to enable the usage >>>> of VFP instructions in kernel. >>>> Apart from that the existing armv7 VFP logic was modified, >>>> taking into account that the state of the VFP registers can now >>>> be modified in the kernel. >>>> >>>> Co-developed by: Wojciech Macek <wma@FreeBSD.org> >>>> Sponsored by: Stormshield >>>> Obtained from: Semihalf >>>> Reviewed by: andrew >>>> Differential Revision: https://reviews.freebsd.org/D37419 >>>> --- >>>> lib/libthread_db/arch/arm/libpthread_md.c | 21 ++-- >>>> sys/arm/arm/exec_machdep.c | 49 ++++---- >>>> sys/arm/arm/machdep.c | 1 + >>>> sys/arm/arm/machdep_kdb.c | 31 ++++- >>>> sys/arm/arm/swtch-v6.S | 8 +- >>>> sys/arm/arm/swtch.S | 8 +- >>>> sys/arm/arm/vfp.c | 182 +++++++++++++++++++++++++++++- >>>> sys/arm/arm/vm_machdep.c | 6 +- >>>> sys/arm/include/fpu.h | 7 ++ >>>> sys/arm/include/pcb.h | 5 + >>>> sys/arm/include/reg.h | 12 +- >>>> sys/arm/include/vfp.h | 17 +++ >>>> 12 files changed, 293 insertions(+), 54 deletions(-) >>> >>> [This is a somewhat adjusted version of a note replying >>> to a Warner note about a panic someone got during a >>> process coredump that was happening.] >>> >>> Just a possible point, given recent kernel floating >>> point work: >>> >>> I tried to do a typical build and test of some >>> benchmark programs that I sometimes use that involve >>> floating point in some of the programs, some use with >>> multithreading involved. (As FreeBSD and g++ progress >>> I tend to do this once and a while, not as often on >>> armv7 as on aarch64.) >>> >>> On armv7, I now usually get a message about a failure >>> of an internal cross-check, which also leads to the >>> program being stopped early. The messaging from run >>> to run varies what the failure is, but the runs should >>> not vary and should not fail the cross-checks --and >>> previously did not, including when I last tried armv7. >>> (Not recently.) >>> >>> For the specific example failures, the initial serial >>> (single thread) test with float involved works but the >>> following multi-thread test in the same program fails >>> and causes the program to stop when it notices there >>> is a problem. (On occasion the cross-check does does >>> not detect a problem.) >>> >>> The programs that do not test floating point do not >>> fail. (Same algorithm on integral types.) These can >>> involve floating point outside the algorithm >>> benchmarked, but with no multi-threading involved for >>> such and no floating point based cross-checks involved. >>> >>> At this point it is far from obvious to me how I >>> would trackdown the specifics of what leads to the >>> failed cross-checks. But the above is suggestive of >>> there being problems for armv7 handling of saving >>> and restoring floating point context for >>> multi-threading in a process, at least. I've no clue >>> if such are strictly limited to the floating point >>> values that show up vs. if there might be wider >>> memory handling problems that result in the process. >>> >> >> Further runs of the benchmark program show that I also >> get cross-check failures for single-threaded (the first >> way it tests). >> >> But it turns out that, even for single treaded execution >> of the algorithm benchmarked, it is not run on the >> process's initial thread but instead on a created thread. >> >> Turns out that for a debug armv7 kernel (debug is not >> what I normally run) attempting a bt in gdb can lead to >> a kernel panic (td == curthread failed) related to >> floating point handling: >> >> . . . >> (gdb) br serial_kernel_runner >> Breakpoint 1 at 0x1db34: serial_kernel_runner. (6 locations) >> (gdb) br parallel_kernel_runner >> Breakpoint 2 at 0x1b43c: parallel_kernel_runner. (6 locations) >> (gdb) run >> Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++-cpulockdown >> . . . >> >> Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...) at acpphint_kernelrunners.cpp:69 >> 69 static auto serial_kernel_runner >> (gdb) bt >> #0 serial_panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103 >> cpuid = 3 >> time = 1676519530 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self >> pc = 0xc05f04a0 lr = 0xc007ab0c (db_trace_self_wrapper+0x30) >> sp = 0xe28ea960 fp = 0xe28eaa78 >> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >> pc = 0xc007ab0c lr = 0xc02ddc44 (vpanic+0x140) >> sp = 0xe28eaa80 fp = 0xe28eaaa0 >> r4 = 0x00000100 r5 = 0x00000000 >> r6 = 0xc0790bb4 r7 = 0xc0b1b930 >> vpanic() at vpanic+0x140 >> pc = 0xc02ddc44 lr = 0xc02dda28 (dump_savectx) >> sp = 0xe28eaaa8 fp = 0xe28eaaac >> r4 = 0xe28eaad0 r5 = 0xbfbfe150 >> r6 = 0xe28eaad0 r7 = 0xc076a096 >> r8 = 0xdb8a47f4 r9 = 0x00000016 >> r10 = 0x00000040 >> dump_savectx() at dump_savectx >> pc = 0xc02dda28 lr = 0xc05f3354 (get_vfpcontext+0xb8) >> sp = 0xe28eaab4 fp = 0xe28eaac8 >> get_vfpcontext() at get_vfpcontext+0xb8 >> pc = 0xc05f3354 lr = 0xc0611148 (cpu_ptrace+0x38) >> sp = 0xe28eaad0 fp = 0xe28eabe8 >> r4 = 0xdb75cba0 r5 = 0xbfbfe150 >> cpu_ptrace() at cpu_ptrace+0x38 >> pc = 0xc0611148 lr = 0xc0360f4c (kern_ptrace+0x810) >> sp = 0xe28eabf0 fp = 0xe28eac70 >> r4 = 0xe583dba0 r5 = 0x00000000 >> r6 = 0xdb8a47a8 r10 = 0x00000040 >> kern_ptrace() at kern_ptrace+0x810 >> pc = 0xc0360f4c lr = 0xc0360550 (sys_ptrace+0x1cc) >> sp = 0xe28eac78 fp = 0xe28eadc0 >> r4 = 0xe583de5c r5 = 0xe583dba0 >> r6 = 0xbfbfe150 r7 = 0x00000000 >> r8 = 0x00000000 r9 = 0xe583de50 >> r10 = 0xdb756730 >> sys_ptrace() at sys_ptrace+0x1cc >> pc = 0xc0360550 lr = 0xc0613b48 (swi_handler+0x170) >> sp = 0xe28eadc8 fp = 0xe28eae38 >> r4 = 0xe583dba0 r5 = 0x00000001 >> r6 = 0xc090b220 r7 = 0x00000000 >> r8 = 0x00000000 r9 = 0xe583de50 >> swi_handler() at swi_handler+0x170 >> pc = 0xc0613b48 lr = 0xc05f2d90 (swi_exit) >> sp = 0xe28eae40 fp = 0xbfbfe128 >> r4 = 0x00000042 r5 = 0x22e61c20 >> r6 = 0xbfbfe150 r7 = 0x0000001a >> r8 = 0x00424124 r9 = 0x00000108 >> r10 = 0x00000040 >> swi_exit() at swi_exit >> pc = 0xc05f2d90 lr = 0xc05f2d90 (swi_exit) >> sp = 0xe28eae40 fp = 0xbfbfe128 >> KDB: enter: panic >> [ thread pid 5438 tid 106943 ] >> Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]! >> >> Note: the code was built via g++12 but using libc++, >> not libstdc++. >> >> So I tried the b=program variant that does not tryin to >> lock down which CPUs are used by the threads (a completely >> C++20 standard program variant, not FreeBSD specific for >> its used source code). Failure again . . . >> >> (gdb) br serial_kernel_runner >> Breakpoint 1 at 0x1c1bc: serial_kernel_runner. (6 locations) >> (gdb) br parallel_kernel_runner >> Breakpoint 2 at 0x19ac8: parallel_kernel_runner. (6 locations) >> (gdb) run >> Starting program: /root/acpphint/acpphint_kernelsamplers_main-OPi+2E-2048MiB-threads_4-ILP32-FreeBSD_main_n260797_dc1b8c9a846e_32bit-g++_12_O3lto-libc++ >> . . . >> Breakpoint 1, serial_kernel_runner<float, unsigned short> (clock_info=..., laps=3, memry=2, ki=...) at acpphint_kernelrunners.cpp:69 >> 69 static auto serial_kernel_runner >> (gdb) bt >> #0 serial_kernel_runner<float, unsigned short> (clock_info=...,panic: Assertion td == curthread failed at /usr/main-src/sys/arm/arm/exec_machdep.c:103 >> cpuid = 0 >> time = 1676520400 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self >> pc = 0xc05f04a0 lr = 0xc007ab0c (db_trace_self_wrapper+0x30) >> sp = 0xe2964960 fp = 0xe2964a78 >> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >> pc = 0xc007ab0c lr = 0xc02ddc44 (vpanic+0x140) >> sp = 0xe2964a80 fp = 0xe2964aa0 >> r4 = 0x00000100 r5 = 0x00000000 >> r6 = 0xc0790bb4 r7 = 0xc0b1b930 >> vpanic() at vpanic+0x140 >> pc = 0xc02ddc44 lr = 0xc02dda28 (dump_savectx) >> sp = 0xe2964aa8 fp = 0xe2964aac >> r4 = 0xe2964ad0 r5 = 0xbfbfe158 >> r6 = 0xe2964ad0 r7 = 0xc076a096 >> r8 = 0xdb7a511c r9 = 0x00000016 >> r10 = 0x00000040 >> dump_savectx() at dump_savectx >> pc = 0xc02dda28 lr = 0xc05f3354 (get_vfpcontext+0xb8) >> sp = 0xe2964ab4 fp = 0xe2964ac8 >> get_vfpcontext() at get_vfpcontext+0xb8 >> pc = 0xc05f3354 lr = 0xc0611148 (cpu_ptrace+0x38) >> sp = 0xe2964ad0 fp = 0xe2964be8 >> r4 = 0xdb7ca3e0 r5 = 0xbfbfe158 >> cpu_ptrace() at cpu_ptrace+0x38 >> pc = 0xc0611148 lr = 0xc0360f4c (kern_ptrace+0x810) >> sp = 0xe2964bf0 fp = 0xe2964c70 >> r4 = 0xdb76fba0 r5 = 0x00000000 >> r6 = 0xdb7a50d0 r10 = 0x00000040 >> kern_ptrace() at kern_ptrace+0x810 >> pc = 0xc0360f4c lr = 0xc0360550 (sys_ptrace+0x1cc) >> sp = 0xe2964c78 fp = 0xe2964dc0 >> r4 = 0xdb76fe5c r5 = 0xdb76fba0 >> r6 = 0xbfbfe158 r7 = 0x00000000 >> r8 = 0x00000000 r9 = 0xdb76fe50 >> r10 = 0xdb754000 >> sys_ptrace() at sys_ptrace+0x1cc >> pc = 0xc0360550 lr = 0xc0613b48 (swi_handler+0x170) >> sp = 0xe2964dc8 fp = 0xe2964e38 >> r4 = 0xdb76fba0 r5 = 0x00000001 >> r6 = 0xc090b220 r7 = 0x00000000 >> r8 = 0x00000000 r9 = 0xdb76fe50 >> swi_handler() at swi_handler+0x170 >> pc = 0xc0613b48 lr = 0xc05f2d90 (swi_exit) >> sp = 0xe2964e40 fp = 0xbfbfe130 >> r4 = 0x00000042 r5 = 0x22e61c20 >> r6 = 0xbfbfe158 r7 = 0x0000001a >> r8 = 0x00424124 r9 = 0x00000108 >> r10 = 0x00000040 >> swi_exit() at swi_exit >> pc = 0xc05f2d90 lr = 0xc05f2d90 (swi_exit) >> sp = 0xe2964e40 fp = 0xbfbfe130 >> KDB: enter: panic >> [ thread pid 1107 tid 100140 ] >> Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]! >> >> For reference (whitespace may not have >> been preserved): >> >> void >> get_vfpcontext(struct thread *td, mcontext_vfp_t *vfp) >> { >> struct pcb *pcb; >> >> MPASS(td == curthread); >> >> pcb = td->td_pcb; >> if ((pcb->pcb_fpflags & PCB_FP_STARTED) != 0) { >> critical_enter(); >> vfp_store(&pcb->pcb_vfpstate, false); >> critical_exit(); >> } >> KASSERT(pcb->pcb_vfpsaved == &pcb->pcb_vfpstate, >> ("Called get_vfpcontext while the kernel is using the VFP")); >> memcpy(vfp->mcv_reg, pcb->pcb_vfpstate.reg, >> sizeof(vfp->mcv_reg)); >> vfp->mcv_fpscr = pcb->pcb_vfpstate.fpscr; >> } >> >> Unfortunately the benchmark program is far from being a >> minimalist/simple example. >> >> I'm not sure what FreeBSD might have around that would >> have floating point in use but be simple, and possibly >> standardly available, to see if a simpler context is >> available for analogous testing. >> > > The program, an example way to build it such that > it can lead to crashes, and 2 ways to get the > FreeBSD crash with it (native armv7 context): > > // # cc -std=c17 -pedantic -g -O3 simple_dbl.c > // > // # gdb a.out > // (gdb) br test > // (gdb) run > // FreeBSD CRASHES > // > // # lldb a.out > // (lldb) br set -F test > // FreeBSD CRASHES > > #include <stdlib.h> > > _Bool test(double v) { > return v<0.5; > } > > int main(void) { > return test(drand48()); > } Generating a FreeBSD crash during a core dump for this program looks like: # ./a.out ^\panic: Called fill_fpregs while the kernel is using the VFP cpuid = 3 time = 1676570748 KDB: stack backtrace: db_trace_self() at db_trace_self pc = 0xc05f04a0 lr = 0xc007ab0c (db_trace_self_wrapper+0x30) sp = 0xe340b790 fp = 0xe340b8a8 db_trace_self_wrapper() at db_trace_self_wrapper+0x30 pc = 0xc007ab0c lr = 0xc02ddc44 (vpanic+0x140) sp = 0xe340b8b0 fp = 0xe340b8d0 r4 = 0x00000100 r5 = 0x00000000 r6 = 0xc078f79c r7 = 0xc0b1b930 vpanic() at vpanic+0x140 pc = 0xc02ddc44 lr = 0xc02dda28 (dump_savectx) sp = 0xe340b8d8 fp = 0xe340b8dc r4 = 0xe5a68a80 r5 = 0xe29fbe90 r6 = 0xe2a132f0 r7 = 0xc5763140 r8 = 0xe2a132e0 r9 = 0xe5a68a80 r10 = 0xe340b960 dump_savectx() at dump_savectx pc = 0xc02dda28 lr = 0xc05fd5a4 (set_regs) sp = 0xe340b8e4 fp = 0xe340b8f8 set_regs() at set_regs pc = 0xc05fd5a4 lr = 0xc02617a4 (elf32_get_fpregset+0x2c) sp = 0xe340b900 fp = 0xe340b908 r4 = 0xe2a132f0 r5 = 0xc0261778 elf32_get_fpregset() at elf32_get_fpregset+0x2c pc = 0xc02617a4 lr = 0xc025f6b0 (elf32_coredump+0x308) sp = 0xe340b910 fp = 0xe340b988 r4 = 0xc090a65c r10 = 0xe340b960 elf32_coredump() at elf32_coredump+0x308 pc = 0xc025f6b0 lr = 0xc02e2af8 (sigexit+0xd18) sp = 0xe340b990 fp = 0xe340bcf8 r4 = 0x0000004e r5 = 0xe3e37300 r6 = 0x00000000 r7 = 0xc025f3a8 r8 = 0xd79c79ec r9 = 0xe3e37274 r10 = 0x00000000 sigexit() at sigexit+0xd18 pc = 0xc02e2af8 lr = 0xc02e3420 (postsig+0x12c) sp = 0xe340bd00 fp = 0xe340bd88 r4 = 0x00000003 r5 = 0xe32c23e0 r6 = 0xe340bd20 r7 = 0xe340bd18 r8 = 0xd79c7928 r9 = 0xe32cfab8 r10 = 0x00000002 postsig() at postsig+0x12c pc = 0xc02e3420 lr = 0xc02e73ec (ast_sig+0x11c) sp = 0xe340bd90 fp = 0xe340be08 r4 = 0xe32c23e0 r5 = 0xd79c79ec r6 = 0xc077b982 r7 = 0x00000000 r8 = 0xd79c7928 r9 = 0x00000ab8 r10 = 0x00000000 ast_sig() at ast_sig+0x11c pc = 0xc02e73ec lr = 0xc0348950 (ast_handler+0xe0) sp = 0xe340be10 fp = 0xe340be28 r4 = 0xe340be40 r5 = 0x0000000e r6 = 0x00004000 r7 = 0xc0973e3c r8 = 0xe32c23e0 r9 = 0x00000001 ast_handler() at ast_handler+0xe0 pc = 0xc0348950 lr = 0xc0348860 (ast+0x20) sp = 0xe340be30 fp = 0xe340be38 r4 = 0xe340be40 r5 = 0xe32c23e0 r6 = 0x00000000 r7 = 0x000001c6 r8 = 0x20032030 r9 = 0x204ae084 ast() at ast+0x20 pc = 0xc0348860 lr = 0xc05f2dcc (swi_exit+0x3c) sp = 0xe340be40 fp = 0xbfbfec60 r4 = 0x60000013 r5 = 0xe32c23e0 swi_exit() at swi_exit+0x3c pc = 0xc05f2dcc lr = 0xc05f2dcc (swi_exit+0x3c) sp = 0xe340be40 fp = 0xbfbfec60 KDB: enter: panic [ thread pid 3640 tid 100207 ] Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]! Note: Having done a dump before reboot after this lead to following boot to fail while processing the dump: . . . Writing crash summary to /var/crash/core.txt.7. panic: Called fill_fpregs while the kernel is using the VFP . . . (Rebooting again worked.) The C++ program, with comments for the usage is: // # c++ -std=c++20 -pedantic -g -O3 -pthread dbl_and_ull_via_async.cpp // or: // # g++12 -std=c++20 -stdlib=libc++ -pedantic -g -O3 -pthread -Wl,-rpath=/usr/local/lib/gcc12 dbl_and_ull_via_async.cpp // then: // ./a.out // control-\ # to force the process core dump. // FreeBSD CRASHES #include <limits> // std::numeric_limits #include <future> // std::future, std::async, std::launch::async #include <cstdlib> // std::abort int main(void) { static_assert(std::numeric_limits<double>::radix==2,"double's radix is not 2 and is unhandled"); constexpr unsigned int ull_width = std::numeric_limits<unsigned long long>::digits; constexpr unsigned int dbl_width = std::numeric_limits<double>::digits; constexpr unsigned int use_width = (dbl_width<ull_width) ? dbl_width : ull_width; constexpr unsigned long long bound = (1ull<<use_width)-1ull; unsigned long long n = 0ull; double n_as_dbl = n; auto sequential= std::async ( std::launch::async , [&n,&n_as_dbl]() { while (n < bound) { if (n_as_dbl != (double)n) break; n++; n_as_dbl+= 1.0; } } ); sequential.wait(); if (n_as_dbl != (double)n) std::abort(); // testing same n_as_dbl value? return 0; } Note: The program has never generated an example of n_as_dbl != (double)n so far. === Mark Millard marklmi at yahoo.com