Re: aarch64(?) poudriere-devel based builds seem to get fairly-rare corrupted files after recent system update(s?)

From: Mark Millard via freebsd-current <freebsd-current_at_freebsd.org>
Date: Wed, 17 Nov 2021 19:17:27 UTC
On 2021-Nov-15, at 15:43, Mark Millard <marklmi@yahoo.com> wrote:

> On 2021-Nov-15, at 13:13, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> On 2021-Nov-15, at 12:51, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> On 2021-Nov-15, at 11:31, Mark Millard <marklmi@yahoo.com> wrote:
>>> 
>>>> I updated from (shown on a system that I've not updated yet):
>>>> 
>>>> # uname -apKU
>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n250455-890cae197737-dirty: Thu Nov  4 13:43:17 PDT 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400040 1400040
>>>> 
>>>> to:
>>>> 
>>>> # uname -apKU
>>>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #19 main-n250667-20aa359773be-dirty: Sun Nov 14 02:57:32 PST 2021     root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72  arm64 aarch64 1400042 1400042
>>>> 
>>>> and then updated /usr/ports/ and started poudriere-devel based builds of
>>>> the ports I'd set up to use. However, my last round of port builds from
>>>> a general update of /usr/ports/ was on 2021-10-23, before either of the
>>>> above.
>>>> 
>>>> I've had at least two files that seem to be corrupted, where a later part
>>>> of the build hits problematical file(s) from earlier build activity. For
>>>> example:
>>>> 
>>>> /usr/local/include/X11/extensions/XvMC.h:1:1: warning: null character ignored [-Wnull-character]
>>>> <U+0000> 
>>>> ^
>>>> /usr/local/include/X11/extensions/XvMC.h:1:2: warning: null character ignored [-Wnull-character]
>>>> <U+0000><U+0000>
>>>>     ^
>>>> /usr/local/include/X11/extensions/XvMC.h:1:3: warning: null character ignored [-Wnull-character]
>>>> <U+0000><U+0000><U+0000> 
>>>>             ^   
>>>> /usr/local/include/X11/extensions/XvMC.h:1:4: warning: null character ignored [-Wnull-character]
>>>> <U+0000><U+0000><U+0000><U+0000>
>>>>                     ^
>>>> . . .
>>>> 
>>>> Removing the xorgproto-2021.4 package and rebuilding via
>>>> poudriere-devel did not get a failure of any ports dependent
>>>> on it.
>>>> 
>>>> This was from a use of:
>>>> 
>>>> # poudriere jail -j13_0R-CA7 -i
>>>> Jail name:         13_0R-CA7
>>>> Jail version:      13.0-RELEASE-p5
>>>> Jail arch:         arm.armv7
>>>> Jail method:       null
>>>> Jail mount:        /usr/obj/DESTDIRs/13_0R-CA7-poud
>>>> Jail fs:           
>>>> Jail updated:      2021-11-04 01:48:49
>>>> Jail pkgbase:      disabled
>>>> 
>>>> but another not-investigated example was from:
>>>> 
>>>> # poudriere jail -j13_0R-CA72 -i
>>>> Jail name:         13_0R-CA72
>>>> Jail version:      13.0-RELEASE-p5
>>>> Jail arch:         arm64.aarch64
>>>> Jail method:       null
>>>> Jail mount:        /usr/obj/DESTDIRs/13_0R-CA72-poud
>>>> Jail fs:           
>>>> Jail updated:      2021-11-04 01:48:01
>>>> Jail pkgbase:      disabled
>>>> 
>>>> (so no 32-bit COMPAT involved). The apparent corruption
>>>> was in a different port (autoconf, noticed by the
>>>> build of automake failing via config reporting
>>>> /usr/local/share/autoconf-2.69/autoconf/autoconf.m4f
>>>> being rejected).
>>>> 
>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/ and
>>>> /usr/obj/DESTDIRs/13_0R-CA72-poud/ and the like track the
>>>> system versions.
>>>> 
>>>> The media is an Optane 960 in the PCIe slot of a HoneyComb
>>>> (16 Cortex-A72's). The context is a root on ZFS one, ZFS
>>>> used in order to have bectl, not redundancy.
>>>> 
>>>> The ThreadRipper 1950X (so amd64) port builds did not give
>>>> evidence of such problems based on the updated system. (Also
>>>> Optane media in a PCIe slot, also root on ZFS.) But the
>>>> errors seem rare enough to not be able to conclude much.
>>> 
>>> For aarch64 targeting aarch64 there was also this
>>> explicit corruption notice during the poudriere(-devel)
>>> bulk build:
>>> 
>>> . . .
>>> [CA72_ZFS] Extracting arm-none-eabi-gcc-8.4.0_3: .........
>>> pkg-static: Fail to extract /usr/local/libexec/gcc/arm-none-eabi/8.4.0/lto1 from package: Lzma library error: Corrupted input data
>>> [CA72_ZFS] Extracting arm-none-eabi-gcc-8.4.0_3... done
>>> 
>>> Failed to install the following 1 package(s): /packages/All/arm-none-eabi-gcc-8.4.0_3.pkg
>>> *** Error code 1
>>> Stop.
>>> make: stopped in /usr/ports/sysutils/u-boot-orangepi-plus-2e
>>> 
>>> I'm not yet to the point of retrying after removing
>>> arm-none-eabi-gcc-8.4.0_3 : other things are being built.
>> 
>> 
>> Another context with my prior general update of /usr/ports/
>> and the matching port builds: Back then I used USE_TMPFS=all
>> but the failure is based on USE_TMPFS="data" instead. So:
>> lots more I/O.
>> 
> 
> None of the 3 corruptions repeated during bulk builds that
> retried the builds that generated the files. All of the
> ports that failed by hitting the corruptions in what they
> depended on built fine in the retries.
> 
> For reference:
> 
> I'll note that, back when I was using USE_TMPFS=all , I also
> did some separate bulk -a test runs, both aarch64 (Cortex-A72)
> native and Cortex-A72 targeting Cortex-A7 (armv7). None of
> those showed evidence of file corruptions. In general I've
> not had previous file corruptions with this system. (There
> was a little more than 245 GiBytes swap, which covered the
> tmpfs needs when they were large.)


I set up a contrasting test context and got no evidence of
corruptions in that context. (Note: the 3 bulk builds
total to around 24 hrs of activity for the 3 examples
of 460+ ports building.) So, for the Cortex-A72 system,

root on UFS on portable USB3 SSD:   no evidence of corruptions
vs.:
root on ZFS on Optane in PCIe slot: solid evidence of 3 known corruptions

Both had USE_TMPFS="data" in use. The same system build
had been installed and booted for both tests.

The evidence of corruptions is rare enough for this not to
be determinative, but it is suggestive.
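
Because the corruptions are rare and only show up when a later build
happens to read the damaged file, a scan of the installed tree could
surface cases that no build tripped over. The following is a minimal
sketch, not something that was run for this report. It only looks for
the XvMC.h symptom (a file that begins with a run of NUL bytes); the
function names, the 4096-byte read limit and the min_run threshold are
illustrative assumptions, and some binary files legitimately start
with NULs, so hits need manual review.

```python
import os


def leading_nul_count(path, limit=4096):
    """Count consecutive NUL bytes at the start of a file (at most `limit`)."""
    with open(path, "rb") as f:
        head = f.read(limit)
    # Stripping leading NULs and comparing lengths gives the run length.
    return len(head) - len(head.lstrip(b"\x00"))


def find_nul_prefixed(root, min_run=4):
    """Yield (path, run) for regular files under root starting with >= min_run NULs."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                run = leading_nul_count(path)
            except OSError:
                # Unreadable files are skipped rather than treated as corrupt.
                continue
            if run >= min_run:
                yield path, run


if __name__ == "__main__":
    import sys

    # Example: python3 nulscan.py /usr/local/include
    for path, run in find_nul_prefixed(sys.argv[1]):
        print(f"{run:5d} leading NUL bytes: {path}")
```

This would not catch damage in the middle of a file, which may be what
happened to autoconf.m4f and lto1.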

Unfortunately, ZFS vs. UFS and Optane-in-PCIe vs. USB3 are
not differentiated by this test result.

There is also the result that I've not seen evidence of
corruptions on the ThreadRipper 1950X (amd64) system.
Again, not determinative, but suggestive, given how rare
the corruptions seem to be.
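
For the arm-none-eabi-gcc kind of failure ("Lzma library error:
Corrupted input data"), a package file could be checked before anything
tries to install it. A minimal sketch, assuming the .pkg files here are
xz-compressed archives (which the error text suggests but which was not
confirmed); xz_stream_ok is a hypothetical helper name. It exercises
only the compression layer, not pkg's own metadata or checksums.

```python
import lzma


def xz_stream_ok(path, chunk=1 << 20):
    """Return (True, None) if the whole file decompresses as xz, else (False, reason)."""
    try:
        with lzma.open(path, "rb") as f:
            # Read to the end so the decoder validates every block and its check.
            while f.read(chunk):
                pass
    except (lzma.LZMAError, EOFError, OSError) as exc:
        # LZMAError: corrupt or non-xz data; EOFError: truncated stream.
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    import sys

    # Example: python3 xzcheck.py /path/to/packages/All/*.pkg
    for path in sys.argv[1:]:
        ok, reason = xz_stream_ok(path)
        print(f"{'ok ' if ok else 'BAD'} {path}" + ("" if ok else f": {reason}"))
```

A file that fails once but passes on a later re-read would point at a
transient read-side problem rather than bad data on the media, though a
single pass cannot tell the two apart.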


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)