Is CPUTYPE=cortex-A7 supposed to work?
Mark Millard
markmi at dsl-only.net
Thu Mar 16 18:44:31 UTC 2017
On 2017-Mar-16, at 10:27 AM, Andrew Gierth <andrew at tao11.riddles.org.uk> wrote:
>
>>>>>> "Sylvain" == Sylvain Garrigues <sylvain at sylvaingarrigues.com> writes:
>
> Sylvain> Thank you so much for your quick feedback Michal. Good to know
> Sylvain> this matter is in good hands. I'm afraid I'm still worried
> Sylvain> about `basic' programs like git still being potentially broken
> Sylvain> when kernel+world+ports have CPUTYPE=cortex-a7 in make.conf -
> Sylvain> Andrew said a simple `git clone' could fail, more precisely
> Sylvain> (quoting him):
>
>>> I have determined that the sha1 failures occur only if the
>>> NEON-enabled SHA1 block function is interrupted by a signal. This
>>> explains why it fails in git (which is using SIGALRM to set a
>>> "display progress" flag) but not in standalone SHA1 tests;
>
> Sylvain> Removing CPUTYPE apparently fixes things, hence I'm not 100%
> Sylvain> confident yet about keeping CPUTYPE=cortex-a7 myself even if only
> Sylvain> a few ports might be affected. Git is an important port; who
> Sylvain> knows what other ports are broken :-)
>
> Let me clarify this.
>
> Without CPUTYPE, things _appear_ to work because only explicit use of
> floating-point exposes the bug, and it's extremely rare for programs to
> use floating-point in signal handlers, and even then the only result
> would be incorrect floating-point calculations in the interrupted code.
>
> With CPUTYPE, the compiler can generate NEON instructions for integer
> code; even without heavy optimization enabled, it might choose to use
> NEON register load/store to copy small data structures for example. One
> piece of code which is affected by this is the signal handling functions
> in libthr, which wrap the program's own signal handler functions; so
> now, _every_ signal handler in a program linked with libthr uses the
> NEON registers, and the result isn't limited to corrupting
> floating-point calculations but can corrupt data structures or copied
> memory, or the results of vectorized code.
>
> The specific failures that I saw -- git failing, emacs crashing, errors
> from openssl speed -- were all of this second kind and therefore not
> directly reproducible without CPUTYPE; but the test program I gave for
> the bug report demonstrates the problem by using explicit floating-point
> (in a highly contrived way) and therefore reproduces the issue even
> without CPUTYPE.
>
> So even though the bug is in the kernel and affects all armv6 targets
> whether NEON is in use or not, the chances of actually hitting it are
> pretty negligible if you built the world without CPUTYPE. But if you
> build with CPUTYPE, then potentially any code that catches a signal is
> affected; it's just that programs (like git) that combine signal
> handling with vectorized crypto code, or programs (like emacs) that use
> signal handling very heavily, have the greatest probability of failure.
>
> tl;dr: building without CPUTYPE is a workaround that simply reduces both
> the chance and severity of failure; building with CPUTYPE currently
> breaks almost everything, but with a probability that varies wildly
> depending on what the application does.
>
> --
> Andrew.
As I understand it, there are also issues beyond the fix for signal
delivery.
On 2017-Mar-12, at 12:17 AM, Michal Meloun <melounmichal at gmail.com> wrote:
> The struct fpreg is also wrong and I'm not sure if
> or how we can fix this in a compatible way.
Looking up some details shows sys/arm/include/reg.h with:
struct fpreg {
        unsigned int fpr_fpsr;
        fp_reg_t fpr[8];
};
[Covers only 8 floating point registers?]
and shows sys/arm/include/fp.h with:
typedef struct fp_extended_precision {
        u_int32_t fp_exponent;
        u_int32_t fp_mantissa_hi;
        u_int32_t fp_mantissa_lo;
} fp_extended_precision_t;
typedef struct fp_extended_precision fp_reg_t;
. . .
/*
 * Type for a saved FP context, if we want to translate the context to a
 * user-readable form
 */
typedef struct {
        u_int32_t fpsr;
        fp_extended_precision_t regs[8];
} fp_state_t;
So each of:
        struct fpreg
        fp_state_t
has room for 8 instances of 96 bits (beyond fpsr), which is not sufficient
for 32 double precision (i.e., 64-bit) registers.
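To make the size arithmetic concrete, here is a small sketch that mirrors the quoted layouts with fixed-width types and compares the space they offer against 32 registers of 64 bits each. The struct and function names (ext_precision_mirror, fpreg_mirror, legacy_reg_bytes, vfp_d32_bytes, legacy_layout_fits_d32) are stand-ins invented for this illustration, not the kernel's own definitions; the field layout is copied from the headers quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct fp_extended_precision: three 32-bit words. */
struct ext_precision_mirror {
    uint32_t fp_exponent;
    uint32_t fp_mantissa_hi;
    uint32_t fp_mantissa_lo;
};

/* Stand-in for struct fpreg: a status word plus 8 register slots. */
struct fpreg_mirror {
    unsigned int fpr_fpsr;
    struct ext_precision_mirror fpr[8];
};

/* Bytes available for register contents in the mirrored struct fpreg
 * (sizeof does not evaluate its operand, so the null pointer is never
 * dereferenced). */
size_t legacy_reg_bytes(void)
{
    return sizeof(((struct fpreg_mirror *)0)->fpr);
}

/* Bytes needed for 32 double precision (64-bit) registers. */
size_t vfp_d32_bytes(void)
{
    return 32 * sizeof(uint64_t);
}

/* Nonzero only if the old layout could hold all 32 registers. */
int legacy_layout_fits_d32(void)
{
    return legacy_reg_bytes() >= vfp_d32_bytes();
}
```

With these definitions the old layout provides 8 * 12 = 96 bytes of register space, while 32 registers of 64 bits need 256 bytes, so legacy_layout_fits_d32() is 0.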
The arm code also has:
# grep -r "\<fpreg\>" /usr/src/sys/arm/ | more
/usr/src/sys/arm/arm/machdep.c:fill_fpregs(struct thread *td, struct fpreg *regs)
/usr/src/sys/arm/arm/machdep.c:set_fpregs(struct thread *td, struct fpreg *regs)
/usr/src/sys/arm/include/reg.h:struct fpreg {
/usr/src/sys/arm/include/reg.h:int fill_fpregs(struct thread *, struct fpreg *);
/usr/src/sys/arm/include/reg.h:int set_fpregs(struct thread *, struct fpreg *);
And the system has:
/usr/src/sys/sys/procfs.h:typedef struct fpreg fpregset_t;
/usr/src/sys/sys/procfs.h: size_t pr_fpregsetsz; /* sizeof(fpregset_t) (1) */
/usr/src/sys/sys/procfs.h:typedef fpregset_t prfpregset_t;
/usr/src/sys/sys/ptrace.h:struct fpreg;
/usr/src/sys/sys/ptrace.h:int proc_read_fpregs(struct thread *_td, struct fpreg *_fpreg);
/usr/src/sys/sys/ptrace.h:int proc_write_fpregs(struct thread *_td, struct fpreg *_fpreg);
and:
/usr/src/sys/kern/sys_process.c: * Ptrace doesn't support fpregs at all, and there are no security holes
/usr/src/sys/kern/sys_process.c: * or translations for fpregs, so we can just copy them.
/usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c: struct fpreg fpreg;
/usr/src/sys/kern/sys_process.c: error = COPYIN(uap->addr, &r.fpreg, sizeof r.fpreg);
/usr/src/sys/kern/sys_process.c: error = COPYOUT(&r.fpreg, uap->addr, sizeof r.fpreg);
/usr/src/sys/kern/sys_process.c: error = PROC_WRITE(fpregs, td2, addr);
/usr/src/sys/kern/sys_process.c: error = PROC_READ(fpregs, td2, addr);
and there is use of some of the above in:
/usr/src/sys/kern/sys_process.c: * proc_read_fpregs, proc_write_fpregs
/usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:proc_read_fpregs32(struct thread *td, struct fpreg32 *fpregs32)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs32(td, fpregs32));
/usr/src/sys/kern/sys_process.c:proc_write_fpregs32(struct thread *td, struct fpreg32 *fpregs32)
/usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs32(td, fpregs32));
I may not have found everything relevant.
It appears that fp_state_t is unused in /usr/src/sys/ .
Note: I was looking at /usr/src/sys/ for. . .
# uname -paKU
FreeBSD FreeBSDx64 12.0-CURRENT FreeBSD 12.0-CURRENT r314687M amd64 amd64 1200023 1200023
===
Mark Millard
markmi at dsl-only.net