From nobody Tue Nov 23 08:43:11 2021
X-Original-To: arm@mlmmj.nyi.freebsd.org
Content-Type: text/plain; charset=us-ascii
List-Id: Porting FreeBSD to ARM processors
List-Archive: https://lists.freebsd.org/archives/freebsd-arm
Sender: owner-freebsd-arm@freebsd.org
Subject: Re: aarch64(?) poudiere-devel based builds seem to get fairly-rare corrupted files after recent system update(s?)
Date: Tue, 23 Nov 2021 00:43:11 -0800
References: <2CA61249-321C-45AA-9755-597146AB8E9F@yahoo.com> <65AA4BCD-EC4B-4A19-B750-C7FC6E5ADDF5@yahoo.com> <9BF4F65B-6437-4D88-AF34-9BCFBF90D6F3@yahoo.com> <4B591638-4693-4403-8549-88D7A1D9D669@yahoo.com> <0006EB30-B9F9-465A-8B9A-A0C03899CEFC@yahoo.com>
To: freebsd-current, "freebsd-arm@freebsd.org"
Message-Id: <52F86CFA-7189-4AB6-BFB8-BFAB7EDBAFC0@yahoo.com>
Reply-To: marklmi@yahoo.com
From: Mark Millard via freebsd-current
X-Original-From: Mark Millard

On 2021-Nov-21, at 07:50, Mark Millard wrote:

> On 2021-Nov-20, at 11:54, Mark Millard wrote:
>
>> On 2021-Nov-19, at 22:20, Mark Millard wrote:
>>
>>> On 2021-Nov-18, at 12:15, Mark Millard wrote:
>>>
>>>> On 2021-Nov-17, at 11:17, Mark Millard wrote:
>>>>
>>>>> On 2021-Nov-15, at 15:43, Mark Millard wrote:
>>>>>
>>>>>> On 2021-Nov-15, at 13:13, Mark Millard wrote:
>>>>>>
>>>>>>> On 2021-Nov-15, at 12:51, Mark Millard wrote:
>>>>>>>
>>>>>>>> On 2021-Nov-15, at 11:31, Mark Millard wrote:
>>>>>>>>
>>>>>>>>> I updated from (shown on a system that I've not updated yet):
>>>>>>>>>
>>>>>>>>> # uname -apKU
>>>>>>>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n250455-890cae197737-dirty: Thu Nov 4 13:43:17 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64
>>>>>>>>> 1400040 1400040
>>>>>>>>>
>>>>>>>>> to:
>>>>>>>>>
>>>>>>>>> # uname -apKU
>>>>>>>>> FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #19 main-n250667-20aa359773be-dirty: Sun Nov 14 02:57:32 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400042 1400042
>>>>>>>>>
>>>>>>>>> and then updated /usr/ports/ and started poudriere-devel based builds of
>>>>>>>>> the ports I'd set up to use. However, my last round of port builds from
>>>>>>>>> a general update of /usr/ports/ was on 2021-10-23, before either of the
>>>>>>>>> above.
>>>>>>>>>
>>>>>>>>> I've had at least two files that seem to be corrupted, where a later part
>>>>>>>>> of the build hits problematical file(s) from earlier build activity. For
>>>>>>>>> example:
>>>>>>>>>
>>>>>>>>> /usr/local/include/X11/extensions/XvMC.h:1:1: warning: null character ignored [-Wnull-character]
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>> /usr/local/include/X11/extensions/XvMC.h:1:2: warning: null character ignored [-Wnull-character]
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>> /usr/local/include/X11/extensions/XvMC.h:1:3: warning: null character ignored [-Wnull-character]
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>> /usr/local/include/X11/extensions/XvMC.h:1:4: warning: null character ignored [-Wnull-character]
>>>>>>>>>
>>>>>>>>> ^
>>>>>>>>> . . .
>>>>>>>>>
>>>>>>>>> Removing the xorgproto-2021.4 package and rebuilding via
>>>>>>>>> poudriere-devel did not get a failure of any ports dependent
>>>>>>>>> on it.
>>>>>>>>>
>>>>>>>>> This was from a use of:
>>>>>>>>>
>>>>>>>>> # poudriere jail -j13_0R-CA7 -i
>>>>>>>>> Jail name:    13_0R-CA7
>>>>>>>>> Jail version: 13.0-RELEASE-p5
>>>>>>>>> Jail arch:    arm.armv7
>>>>>>>>> Jail method:  null
>>>>>>>>> Jail mount:   /usr/obj/DESTDIRs/13_0R-CA7-poud
>>>>>>>>> Jail fs:
>>>>>>>>> Jail updated: 2021-11-04 01:48:49
>>>>>>>>> Jail pkgbase: disabled
>>>>>>>>>
>>>>>>>>> but another not-investigated example was from:
>>>>>>>>>
>>>>>>>>> # poudriere jail -j13_0R-CA72 -i
>>>>>>>>> Jail name:    13_0R-CA72
>>>>>>>>> Jail version: 13.0-RELEASE-p5
>>>>>>>>> Jail arch:    arm64.aarch64
>>>>>>>>> Jail method:  null
>>>>>>>>> Jail mount:   /usr/obj/DESTDIRs/13_0R-CA72-poud
>>>>>>>>> Jail fs:
>>>>>>>>> Jail updated: 2021-11-04 01:48:01
>>>>>>>>> Jail pkgbase: disabled
>>>>>>>>>
>>>>>>>>> (so no 32-bit COMPAT involved). The apparent corruption
>>>>>>>>> was in a different port (autoconf, noticed by the
>>>>>>>>> build of automake failing via config reporting
>>>>>>>>> /usr/local/share/autoconf-2.69/autoconf/autoconf.m4f
>>>>>>>>> being rejected).
>>>>>>>>>
>>>>>>>>> /usr/obj/DESTDIRs/13_0R-CA7-poud/ and
>>>>>>>>> /usr/obj/DESTDIRs/13_0R-CA72-poud/ and the like track the
>>>>>>>>> system versions.
>>>>>>>>>
>>>>>>>>> The media is an Optane 960 in the PCIe slot of a HoneyComb
>>>>>>>>> (16 Cortex-A72's). The context is a root on ZFS one, ZFS
>>>>>>>>> used in order to have bectl, not redundancy.
>>>>>>>>>
>>>>>>>>> The ThreadRipper 1950X (so amd64) port builds did not give
>>>>>>>>> evidence of such problems based on the updated system. (Also
>>>>>>>>> Optane media in a PCIe slot, also root on ZFS.) But the
>>>>>>>>> errors seem rare enough to not be able to conclude much.
>>>>>>>>
>>>>>>>> For aarch64 targeting aarch64 there was also this
>>>>>>>> explicit corruption notice during the poudriere(-devel)
>>>>>>>> bulk build:
>>>>>>>>
>>>>>>>> . . .
>>>>>>>> [CA72_ZFS] Extracting arm-none-eabi-gcc-8.4.0_3: .........
>>>>>>>> pkg-static: Fail to extract /usr/local/libexec/gcc/arm-none-eabi/8.4.0/lto1 from package: Lzma library error: Corrupted input data
>>>>>>>> [CA72_ZFS] Extracting arm-none-eabi-gcc-8.4.0_3... done
>>>>>>>>
>>>>>>>> Failed to install the following 1 package(s): /packages/All/arm-none-eabi-gcc-8.4.0_3.pkg
>>>>>>>> *** Error code 1
>>>>>>>> Stop.
>>>>>>>> make: stopped in /usr/ports/sysutils/u-boot-orangepi-plus-2e
>>>>>>>>
>>>>>>>> I'm not yet to the point of retrying after removing
>>>>>>>> arm-none-eabi-gcc-8.4.0_3 : other things are being built.
>>>>>>>
>>>>>>>
>>>>>>> Another context with my prior general update of /usr/ports/
>>>>>>> and the matching port builds: Back then I used USE_TMPFS=all
>>>>>>> but the failure is based on USE_TMPFS="data" instead. So:
>>>>>>> lots more I/O.
>>>>>>>
>>>>>>
>>>>>> None of the 3 corruptions repeated during bulk builds that
>>>>>> retried the builds that generated the files. All of the
>>>>>> ports that failed by hitting the corruptions in what they
>>>>>> depended on built fine in the retries.
>>>>>>
>>>>>> For reference:
>>>>>>
>>>>>> I'll note that, back when I was using USE_TMPFS=all , I also
>>>>>> did some separate bulk -a test runs, both aarch64 (Cortex-A72)
>>>>>> native and Cortex-A72 targeting Cortex-A7 (armv7). None of
>>>>>> those showed evidence of file corruptions. In general I've
>>>>>> not had previous file corruptions with this system. (There
>>>>>> was a little more than 245 GiBytes swap, which covered the
>>>>>> tmpfs needs when they were large.)
>>>>>
>>>>>
>>>>> I set up a contrasting test context and got no evidence of
>>>>> corruptions in that context. (Note: the 3 bulk builds
>>>>> total to around 24 hrs of activity for the 3 examples
>>>>> of 460+ ports building.) So, for the Cortex-A72 system,
>>>>
>>>> I set up a UFS on Optane (U.2 via M.2 adapter) context and
>>>> also got no evidence of corruptions in that context (same
>>>> hardware and a copy of the USB3 SSD based system). The
>>>> sequence of 3 bulks took somewhat over 18 hrs using the
>>>> Optane.
>>>>
>>>>> root on UFS on portable USB3 SSD: no evidence of corruptions
>>>> Also:
>>>> root on UFS on Optane U.2 via M.2: no evidence of corruptions
>>>>> vs.:
>>>>> root on ZFS on Optane in PCIe slot: solid evidence of 3 known corruptions
>>>>>
>>>>> Both had USE_TMPFS="data" in use. The same system build
>>>>> had been installed and booted for both tests.
>>>>>
>>>>> The evidence of corruptions is rare enough for this not to
>>>>> be determinative, but it is suggestive.
>>>>>
>>>>> Unfortunately, ZFS vs. UFS and Optane-in-PCIe vs. USB3 are
>>>>> not differentiated by this test result.
>>>>>
>>>>> There is also the result that I've not seen evidence of
>>>>> corruptions on the ThreadRipper 1950X (amd64) system.
>>>>> Again, not determinative, but suggestive, given how rare
>>>>> the corruptions seem to be.
>>>>
>>>> So far the only things unique to the observed corruptions are:
>>>>
>>>> root on ZFS context (vs. root on UFS)
>>>> and:
>>>> Optane in a PCIe slot (but no contrasting ZFS case tested)
>>>>
>>>> The PCIe slot does not seem to me to be likely to be contributing.
>>>> So this seems to be suggestive of a ZFS problem.
>>>>
>>>> A contributing point might be that the main [so: 14] system was
>>>> built via -mcpu=cortex-a72 for execution on a Cortex-A72 system.
>>>>
>>>> [I previously ran into a USB subsystem mishandling of keeping
>>>> things coherent for the weak memory ordering in this sort of
>>>> context. That issue was fixed. But back then I was lucky enough
>>>> to be able to demonstrate fails vs. works by adding an
>>>> appropriate instruction to FreeBSD in a few specific places
>>>> (more than necessary as it turned out). Someone else determined
>>>> where the actual mishandling was that covered all required
>>>> places. My generating that much information in this context
>>>> seems unlikely.]
>>>
>>>
>>> I started a retry of root-on-ZFS with the Optane-in-PCIe-slot media
>>> and it got its first corruption (in a different place, 2nd bulk
>>> build this time). The use of the corrupted file reports:
>>>
>>> configure:13269: cc -o conftest -Wall -Wextra -fsigned-char -Wdeclaration-after-statement -O2 -pipe -mcpu=cortex-a53 -g -fstack-protector-strong -fno-strict-aliasing -DUSE_MEMORY_H -I/usr/local/include -mcpu=cortex-a53 -fstack-protector-strong conftest.c -L/usr/local/lib -logg >&5
>>> In file included from conftest.c:27:
>>> In file included from /usr/local/include/ogg/ogg.h:24:
>>> In file included from /usr/local/include/ogg/os_types.h:154:
>>> /usr/local/include/ogg/config_types.h:1:1: warning: null character ignored [-Wnull-character]
>>>
>>> ^
>>> /usr/local/include/ogg/config_types.h:1:2: warning: null character ignored [-Wnull-character]
>>>
>>> ^
>>> /usr/local/include/ogg/config_types.h:1:3: warning: null character ignored [-Wnull-character]
>>>
>>> ^
>>> . . .
>>> /usr/local/include/ogg/config_types.h:1:538: warning: null character ignored [-Wnull-character]
>>> . . . (nulls) . . .
>>>
>>> So: 538 such null bytes.
>>>
>>> Thus, another example of something like a page of nulls being
>>> written out when ZFS is in use.
>>>
>>> audio/gstreamer1-plugins-ogg also failed via referencing the file
>>> during its build.
>>>
>>> (The bulk run is still going and there is one more bulk run to go.)
>>>
>>
>> Well, 538 happened to be the size of config_types.h -- and of
>> config_types.h from a build that did not get the corruption there.
>>
>> So looking at the other (later) corruption, which was a bigger file
>> (looking via bulk -i and installing what contained the file but
>> looking from outside the jail):
>>
>> # find /usr/local/ -name "libtextstyle.so*" -exec ls -Tld {} \;
>> -rwxr-xr-x 1 root wheel 2339104 Nov 20 01:05:05 2021 /usr/local/poudriere/data/.m/13_0R-CA7-default/ref/usr/local/lib/libtextstyle.so.0.1.1
>> lrwxr-xr-x 1 root wheel 21 Nov 20 01:05:05 2021 /usr/local/poudriere/data/.m/13_0R-CA7-default/ref/usr/local/lib/libtextstyle.so.0 -> libtextstyle.so.0.1.1
>> lrwxr-xr-x 1 root wheel 21 Nov 20 01:05:05 2021 /usr/local/poudriere/data/.m/13_0R-CA7-default/ref/usr/local/lib/libtextstyle.so -> libtextstyle.so.0.1.1
>>
>> hd /usr/local/poudriere/data/.m/13_0R-CA7-default/ref/usr/local/lib/libtextstyle.so.0.1.1 | more
>> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>> *
>> 0023b120
>>
>> So the whole file, over 2 MiByte, ended up with just null bytes.
>>
>> To cross check on live system caching vs. on disk, I rebooted and redid the
>> bulk -i based install of libtextstyle and looked at libtextstyle.so.0.1.1 :
>> still all zeros.
>>
>> For reference, zpool scrub afterward resulted in:
>>
>> # zpool status
>>   pool: zopt0
>>  state: ONLINE
>>   scan: scrub repaired 0B in 00:01:49 with 0 errors on Sat Nov 20 11:47:31 2021
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         zopt0       ONLINE       0     0     0
>>           nda1p3    ONLINE       0     0     0
>>
>> But it is not a ZFS redundancy context: ZFS is used just to have bectl .
>
> Using bectl (on the root-on-ZFS Optane in PCIe slot),
> I booted stable/13 :
>
> # uname -apKU
> FreeBSD CA72_16Gp_ZFS 13.0-STABLE FreeBSD 13.0-STABLE #13 stable/13-n248062-109330155000-dirty: Sat Nov 13 23:55:14 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/13S-CA72-nodbg-clang/usr/13S-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1300520 1300520
>
> and tried the sequence of 3 bulk runs:
>
> There was no evidence of corruptions, suggesting that
> the Optane in the PCIe slot is not the source of the
> problem of having some file(s) end up with all bytes
> being null bytes.
>
> So, overall, ending up with evidence of corruptions
> generated during bulk builds seems to be tied to main's
> [so: 14's] ZFS implementation in:
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #18 main-n250455-890cae197737-dirty: Thu Nov 4 13:43:17 PDT 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64
> 1400040 1400040
>
> because that is all that is unique to having the
> evidence of corruptions.
>
> Since there have been ZFS updates in main since then, it
> seems that the next experiment would be to update main
> and try again under main.

Given that the issue seems to be a ZFS issue, I updated to:

# uname -apKU
FreeBSD CA72_16Gp_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #21 main-n250903-06bd74e1e39c-dirty: Mon Nov 22 04:15:08 PST 2021 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400042 1400042

(which involved updating some ZFS material).

I ran the sequence of 3 bulks again: no evidence of corruptions.

For reference: The bulks targeting Cortex-A72 and Cortex-A53 each
took somewhat under 10 minutes more than the earlier stable/13 and
main [so: 14] builds that otherwise matched (including the Optane
used), for bulks that each took somewhat over 6 hr either way.
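
For anyone wanting to check a tree for the same symptom (non-empty
regular files whose bytes are all null, as with libtextstyle.so.0.1.1
above), here is a minimal sketch, assuming Python 3 is available. It
is not part of poudriere or pkg; the names is_all_null and
find_all_null_files are illustrative only, and it can only find files
that are entirely null, not partial corruption.

```python
import os
import stat


def is_all_null(path, chunk_size=1 << 20):
    """Return True if the file is non-empty and every byte is 0x00."""
    saw_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            saw_data = True
            # A chunk is all nulls exactly when its count of 0x00
            # bytes equals its length.
            if chunk.count(0) != len(chunk):
                return False
    return saw_data


def find_all_null_files(root):
    """Yield paths of regular files under root that hold only null bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                # Skip symlinks, devices, and empty files.
                if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                    continue
                hit = is_all_null(path)
            except OSError:
                # Unreadable or vanished file: skip it in this sketch.
                continue
            if hit:
                yield path
```

Something like list(find_all_null_files("/usr/local")) inside a
bulk -i jail, or pointed at the jail's ref/ tree from outside, would
then list the candidates to inspect with hd.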

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)