Re: 15-aarch64-RPI-snap

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 29 Oct 2023 01:25:18 UTC
On Oct 28, 2023, at 09:40, Mark Millard <marklmi@yahoo.com> wrote:

> On Oct 27, 2023, at 23:00, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On Oct 27, 2023, at 22:24, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> On Oct 27, 2023, at 21:34, Glen Barber <gjb@FreeBSD.org> wrote:
>>> 
>>>>>> . . .
>>>>>>                                                                                                                                     ^
>>>>>> ./offset.inc:16:19: error: null character ignored [-Werror,-Wnull-character]
>>>>>> <U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+0000><U+00
>>>>>> 00><U+0000>#undef _SA
>>>>>>                                                                                                                                             ^
>>> 
>>> Are the above from a ZFS file system? UFS? Something else?
>>> 
>>> Back in 2021-Nov (15..21) I had problems where ZFS was leading
>>> to blocks of such null bytes on aarch64, not specifically RPi*'s,
>>> in various files but not the same ones from test to test. When I
>>> updated past some ZFS updates on the 23rd the problem stopped.
>>> 
>>> I also have notes from 2022-Mar (19..22) about replicating
>>> another example problem someone was having with files ending
>>> up with such blocks of null bytes, but that testing was on the
>>> ThreadRipper 1950X. (The replication showed that ccache did
>>> not need to be involved, since I've never used it.) Again,
>>> ZFS was part of the environment in which the replication
>>> happened. Mark Johnson fixed sys/contrib/openzfs/module/zfs/dnode.c
>>> during this, and my ability to replicate the issue stopped
>>> when I tested the patch.
>>> 
>>> Whichever file system it is that holds the bad bytes, some
>>> attempted testing for repeatability of the problem could
>>> be of interest, some of that being on aarch64 but not on
>>> RPi*'s, some of it not on aarch64 at all. But it might take
>>> information about the context to know better what/how to
>>> test. That could include information about both the host and
>>> the jail OS versions if such is involved.
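
(As a sketch of what such a repeatability check might look like, the
following just scans a tree for files containing NUL bytes; /usr/obj
and the *.inc pattern are only placeholders, not necessarily how the
official builds lay out their output:

    # Report generated source files that contain NUL bytes.  Object files
    # legitimately contain NULs, so limit the scan to text-like names;
    # /usr/obj and '*.inc' are placeholders for the build's actual layout.
    find /usr/obj -type f -name '*.inc' | while read -r f
    do
        nuls=$(( $(wc -c < "$f") - $(tr -d '\000' < "$f" | wc -c) ))
        [ "$nuls" -gt 0 ] && echo "$f: $nuls NUL bytes"
    done

Re-running a build and re-scanning would show whether the same files
are hit each time or, as in my 2021-Nov notes, different ones.)
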
>> 
>> Those last notes are likely too generic, in that official
>> buildworld/buildkernel activity is normally done on amd64
>> for all target platforms (last I knew). (Not that running
>> such builds on other platforms would be a bad problem-scope
>> isolation test.)
>> 
>> Any notes that help delimit what sort of test context
>> would be a reasonable partial replication of the original
>> context could prove useful.
>> 
>>> . . .
> 
> If the file system is ZFS, I'll note that main [so: 15] already has
> a zpool feature that is not part of openzfs-2.2 and so not part of
> releng/14.0 or stable/14. So what zpool features are enabled could
> be relevant to problems that only happen in main, and the enabled
> features might need to match in efforts to replicate the problem.
> 
> But I've not evaluated whether redaction_list_spill would be likely
> to be involved in this specific type of file corruption.

I'll note that the upstream openzfs master commit fixing the data
corruption issue:

"Zpool can start allocating from metaslab before TRIMs have completed"

was made on 2023-Oct-12, so not long ago. If the official builds use ZFS
and TRIM but are based on a system version that predates FreeBSD picking
up that commit, then a known ZFS data corruption issue is present in the
official build environment.
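
(For anyone checking a build host for that combination, ordinary zpool
commands show the TRIM side of it; "zroot" below is only a placeholder
pool name:

    # Is automatic TRIM enabled on the pool?  ("zroot" is a placeholder.)
    zpool get autotrim zroot
    # Show per-vdev TRIM status; manual "zpool trim" runs count as well:
    zpool status -t zroot

That still leaves the question of whether the system version in use
predates the fix.)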

Port->package builds are based on a HOST/JAIL combination such as:

Host OSVERSION: 1500000
Jail OSVERSION: 1500002
or:
Host OSVERSION: 1500000
Jail OSVERSION: 1400097

but the Host kernel is the one actually in use (with the Host kernel's
commit not identified), so it could have such an issue.
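
(On a system one has access to, the kernel's version number versus the
userland's can at least be compared with:

    # Kernel __FreeBSD_version, as reported by the running kernel:
    uname -K
    # Userland __FreeBSD_version of the user environment (e.g. the jail):
    uname -U

but a version number still does not identify the specific Host kernel
commit.)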

(Because of such issues, I wish that Host-OSVERSION-related commit
identification were also reported for the package builds. Presuming
ZFS use, I also wish that the enabled zpool features were reported,
for similar reasons.)
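
(As a sketch of the zpool feature report I have in mind, something like
the following would do; "zroot" is only a placeholder pool name:

    # List the pool's feature states (disabled/enabled/active):
    zpool get all zroot | grep feature@

That would make it visible whether something like redaction_list_spill
is enabled on the pools the builders use.)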


===
Mark Millard
marklmi at yahoo.com