Re: git: 005aa1743b42 - main - modules: bzero the modspecific_t

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 02 Jul 2023 16:34:02 UTC
Ka Ho Ng <khng300_at_gmail.com> wrote on
Date: Sun, 02 Jul 2023 10:18:35 UTC :

On Sun, Jul 2, 2023 at 6:03 AM Ka Ho Ng <khng300@gmail.com> wrote:
> 
> > On Sat, Jul 1, 2023 at 7:13 PM Konstantin Belousov <kostikbel@gmail.com>
> > wrote:
> >
> >> On Sat, Jul 01, 2023 at 10:59:22PM +0000, Ka Ho Ng wrote:
> >> > The branch main has been updated by khng:
> >> >
> >> > URL:
> >> https://cgit.FreeBSD.org/src/commit/?id=005aa1743b42b52fbd49b9d5ec44816902b6ee9f
> >> >
> >> > commit 005aa1743b42b52fbd49b9d5ec44816902b6ee9f
> >> > Author: Ka Ho Ng <khng@FreeBSD.org>
> >> > AuthorDate: 2023-07-01 19:41:53 +0000
> >> > Commit: Ka Ho Ng <khng@FreeBSD.org>
> >> > CommitDate: 2023-07-01 22:58:46 +0000
> >> >
> >> > modules: bzero the modspecific_t
> >> >
> >> > Per https://reviews.llvm.org/D68115, only the first field is
> >> > zero-initialized, meanwhile other fields are undef.
> >> This is not true.
> >>
> >> >
> >> > The pattern can be observed on clang as well, that when
> >> > -ftrivial-auto-var-init=pattern is specified 0xaa is filled for
> >> > non-active fields, otherwise they are zero-initialized.
> >> What are non-active fields?
> >> All fields with implicit initializers
> >> "shall be initialized implicitly the same as
> >> objects that have static storage duration."
> >>
> >> I do not think this is required for padding.
> >>
> > In that case, modspecific_t ms did become 0xaaaaaaaa00000000 on amd64 if
> > -ftrivial-auto-var-init=pattern was specified.
> >
> >
> >>
> >> > Technically both are acceptable when using clang. However it
> >> > would be good to simply bzero the modspecific_t in such case to
> >> > be strict to the standard.
> >> >
> >> > MFC with: 2cab2d43b83b
> >> > MFC after: 1 day
> >> Min MFC period is 3 days.
> >>
> >> > Sponsored by: Juniper Networks, Inc.
> >> > Reviewed by: delphij
> >> > Differential Revision: https://reviews.freebsd.org/D40830
> >> > ---
> >> > sys/kern/kern_syscalls.c | 3 ++-
> >> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/sys/kern/kern_syscalls.c b/sys/kern/kern_syscalls.c
> >> > index 234e51cfd280..0b51edf5e985 100644
> >> > --- a/sys/kern/kern_syscalls.c
> >> > +++ b/sys/kern/kern_syscalls.c
> >> > @@ -173,9 +173,10 @@ kern_syscall_module_handler(struct sysent
> >> *sysents, struct module *mod,
> >> > int what, void *arg)
> >> > {
> >> > struct syscall_module_data *data = arg;
> >> > - modspecific_t ms = { 0 };
> >> > + modspecific_t ms;
> >> > int error;
> >> >
> >> > + bzero(&ms, sizeof(ms));
> >> > switch (what) {
> >> > case MOD_LOAD:
> >> > error = kern_syscall_register(sysents, data->offset,
> >
> >
> > Ka Ho
> >
> Since I missed the reply-all button just now, I resend this email.
> 
> Reading on N2716 one of the footnote stated:
> "Note that aggregate type does not include union type because an object
> with union type can only contain one member at a time"
> 
> And below at 6.7.9.21 only aggregate was mentioned but not union, which
> matched what I observed with an `cc -ftrivial-auto-var-init=pattern`
> invocation.

The context for modspecific_t is (looking in my source
tree for main):

# grep -r modspecific_t /usr/main-src/sys/ | more
/usr/main-src/sys/sys/module.h:} modspecific_t;
/usr/main-src/sys/sys/module.h:void     module_setspecific(module_t, modspecific_t *);
/usr/main-src/sys/sys/module.h: modspecific_t   data;
/usr/main-src/sys/kern/kern_module.c:   modspecific_t           data;   /* module specific data */
/usr/main-src/sys/kern/kern_module.c:module_setspecific(module_t mod, modspecific_t *datap)
/usr/main-src/sys/kern/kern_module.c:   modspecific_t   data;
/usr/main-src/sys/kern/kern_module.c:   modspecific_t data;
/usr/main-src/sys/kern/kern_syscalls.c: modspecific_t ms;

This is C code, not C++ code. The languages are not fully
matching in various details in some similar areas, especially
when the vintages are not from a similar timeframe. Also,
modspecific_t is a union:

/*
 * A module can use this to report module specific data to the user via
 * kldstat(2).
 */
typedef union modspecific {
        int     intval;
        u_int   uintval;
        long    longval;
        u_long  ulongval;
} modspecific_t;

When I look up C's N2716 in:

https://www.open-std.org/jtc1/sc22/wg14/www/wg14_document_log.htm

I find: "N2716 2021/05/09 Tydeman, Numerically equal" with the
link https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2716.htm .
Its summary line is:

QUOTE
Summary: The phrases "numerically equal" and "numerically equivalent" are used, but never defined. Also, they seem to add no value to just "equal".
END QUOTE

(There is also: "N2847 2021/10/15 Thomas, C23 proposal - Revised
suggested change from N2716" with the link:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2847.pdf .)

This has nothing to do with what you are writing about.

For C code, I suggest only using C langauge specification
materials, not C++, unless the same source is to have
dual language use for some reason. (I see no indication
of such here. If such use is the intended context, please
be explicit about that.)

Also, using material from 2021 when FreeBSD is based on a
older C standard version can have its own problems. If I
remember right, FreeBSD is basically targeting C11 or
so for its C code.

An example quote from footnote 310 (for memcmp) in ISO/IEC
9899:2011 is:

QUOTE
The contents of "holes" used as padding for purposes of
alignment within structure objects are indeterminate.
Strings shorter than the allocated space and unions may
also cause problems in comparison.
END QUOTE

Initializing .intval and then accessing .ulongval resulting
in 0xaaaaaaaa00000000 as a possible result should be no
surprise for C as far as I can tell. Based on what I'm
reading, code that assumes otherwise is incorrect relative
to the C language standard.

It looks to me like the bzero use is trying to make changes
to allow non-standard C code "still work".

===
Mark Millard
marklmi at yahoo.com