Re: Armv7 panic on -current, rpi2 buildworld

From: Mark Millard <marklmi_at_yahoo.com>
Date: Mon, 20 Feb 2023 17:08:48 UTC

On Feb 20, 2023, at 04:32, Andrew Turner <andrew@fubar.geek.nz> wrote:

> Can you try with 24abb6b82102eec577eff9bd8dd7726e8cab89f4? There were conditional branch instructions that may mean the function to save the VFP state was not being run.
> 
> Andrew

I had eventually produced 3 programs showing different failed
results, 2 KASSERT panics and one example of floating point
data from the wrong thread eventually showing up (but no
KASSERT for the test sequence).

I've tested the latter one via an armv7 kernel that
has:

c0681a2c <savectx>:
c0681a2c: e92d4000      stmdb   sp!, {lr}
c0681a30: e24dd004      sub     sp, sp, #4
c0681a34: e2803000      add     r3, r0, #0
c0681a38: e883fff0      stm     r3, {r4, r5, r6, r7, r8, r9, r10, r11, r12, sp, lr, pc}
c0681a3c: e1a01000      mov     r1, r0
c0681a40: e3a00000      mov     r0, #0
c0681a44: eb000b10      bl      0xc068468c <vfp_save_state> @ imm = #11328
c0681a48: e28dd004      add     sp, sp, #4
c0681a4c: e8bd8000      ldm     sp!, {pc}

and it still fails:

# g++12 -std=c++20 -pedantic -g -O3 -pthread -Wl,-rpath=/usr/local/lib/gcc12 dbl_and_ull_multithread.cpp
# ./a.out
Thread 1: 23618687.000000 != 4503599659991211
^C

The left hand side for Thread 1 should have had the huge value
too. Thread 0 is the one with the smaller floating point/unsigned
long long values. (Within a thread, the two values should be
mathematically equal at the point that they are tested.) The two
threads are independent of each other but are doing the same
type of loop, just over different numeric ranges.

So the change looks to be "necessary but not sufficient" for
that test. (I'll leave the code change in place, as I doubt that
it is wrong.)

Given Kornel D.'s already existing notes, I did not expect
either KASSERT failure to be fixed by just this "fixed to be bl"
change.

(This test was done as part of my already-started multi-system
environment upgrade sequence, going from 1400079-based to
1400081-based after a tmpfs fix. So the patch was applied to the
kernel source that my source tree was already synchronized to,
which is somewhat older: from yesterday.)



FYI: my current source for dbl_and_ull_multithread.cpp
looks like the below (whitespace details need not be
preserved). llvm15 is still missing std::osyncstream so
it is libstdc++ based (and, so, g++ based).

// # g++12 -std=c++20 -pedantic -g -O3 -pthread -Wl,-rpath=/usr/local/lib/gcc12 dbl_and_ull_multithread.cpp
// # ./a.out
// Thread [01]: double_value != unsigned_long_long_value 
// Use control-C to stop it.

#include <limits>     // std::numeric_limits
#include <future>     // std::future, std::async, std::launch::async
#include <string>     // std::to_string
#include <syncstream> // std::osyncstream
#include <iostream>   // std::cout

int main(void) {

    static_assert(std::numeric_limits<double>::radix==2,"double's radix is not 2 and is unhandled");

    constexpr unsigned int ull_width { std::numeric_limits<unsigned long long>::digits };
    constexpr unsigned int dbl_width { std::numeric_limits<double>::digits };
    constexpr unsigned int use_width { (dbl_width<ull_width) ? dbl_width : ull_width };

    constexpr unsigned long long bound { (1ull<<use_width)-1ull };

    auto the_job {
        [](unsigned int which_thr, unsigned long long n_init)
            {
                unsigned long long n       { n_init };
                double             n_as_dbl= n;

                while (n < bound) {
                    if (n_as_dbl != (double)n) {
                        std::osyncstream output{std::cout};
                        output << "Thread "
                               << std::to_string(which_thr)
                               << ": "
                               << std::to_string(n_as_dbl) // questionable if still same?
                               << " != "
                               << std::to_string(n)
                               << "\n";
                        break;
                    }

                    n++;
                    n_as_dbl+= 1.0;
                }
            }
    };

    auto thread_0 {
        std::async( std::launch::async
                  , the_job
                  , 0u
                  , 0ull
                  )
    };
    auto thread_1 {
        std::async( std::launch::async
                  , the_job
                  , 1u
                  , bound/2u
                  )
    };
    thread_0.wait();
    thread_1.wait();

    return 0;
}

I have previously demonstrated the problem with libc++,
via both system clang (15) and g++12, using a variant
from before I'd added the explicit output.

===
Mark Millard
marklmi at yahoo.com