From nobody Sat Aug 12 17:05:35 2023
From: Kevin Oberman
Date: Sat, 12 Aug 2023 10:05:35 -0700
Subject: Re: [Intel AlderLake] Read&Write files to FAT32 or UFS partition cause data corrupt due to P-Core&E-Core
To: Michael Butler
Cc: freebsd-current@freebsd.org
In-Reply-To: <4f0fbb44-eebe-aa8f-f958-dcd678936fe1@protected-networks.net>
References: <59cbcfe2-cd53-69d8-65d6-7a79e656f494@FreeBSD.org>
 <1f968af1-1c57-9a09-7e01-145a5262e27f@FreeBSD.org>
 <20230806181238.858f58e25dfd0f99269cfe53@dec.sakura.ne.jp>
 <20230808063735.e8e1d3ede370a18f200a6f48@dec.sakura.ne.jp>
 <20230808224612.c3889d6e20b6fc980f5278cc@dec.sakura.ne.jp>
 <20230808235635.744e0e1c6a72face7fdf6a9b@dec.sakura.ne.jp>
 <4f0fbb44-eebe-aa8f-f958-dcd678936fe1@protected-networks.net>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current
Sender: owner-freebsd-current@freebsd.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000706d570602bcd762"

--000000000000706d570602bcd762
Content-Type: text/plain; charset="UTF-8"

On Tue, Aug 8, 2023 at 10:50 AM Michael Butler <imb@protected-networks.net> wrote:

> On 8/8/23 10:56, Tomoaki AOKI wrote:
> > On Tue, 8 Aug 2023 17:02:32 +0300
> > Konstantin Belousov <kostikbel@gmail.com> wrote:
>
>   [ .. snip .. ]
>
> >> The workaround is switched on automatically, when kernel detects 'small cores'
> >> reported by CPUID.
> >
> > If I read the code correctly, vm.pmap.pcid_invlpg_workaround
> > (precisely, the corresponding variable) is set to non-zero when the
> > workaround is enabled. Not sure it was detected correctly at the
> > original reporter's environment, but forcibly setting the tunable to 1
> > was not reported to help sufficiently.
> > Currently, only setting tunable vm.pmap.pcid_enabled to 0 could help.
>
> I'm seeing similar stability problems on an N95-based device. This too
> is an Alderlake-N device with only E-cores although I'm running it with
> a compilation with CPUTYPE=tremont .. from an older, verbose start-up ..
>
> PPIM 0: PA=0x4000000000, VA=0xffffffff82710000, size=0x1d5000, mode=0x1
> pmap: large map 8 PML4 slots (4096 GB)
> VT(efifb): resolution 800x600
> Preloaded elf kernel "/boot/kernel.new/kernel" at 0xffffffff8234e000.
> Preloaded boot_entropy_cache "/boot/entropy" at 0xffffffff82357d08.
> Preloaded cpu_microcode "/boot/firmware/intel-ucode.bin" at 0xffffffff82357d60.
> Preloaded hostuuid "/etc/hostid" at 0xffffffff82357dc0.
> Preloaded TSLOG data "TSLOG" at 0xffffffff82357e10.
> CPU: Intel(R) N95 (1689.60-MHz K8-class CPU)
>   Origin="GenuineIntel"  Id=0xb06e0  Family=0x6  Model=0xbe  Stepping=0
>   Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
>   Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,SDBG,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
>   AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
>   AMD Features2=0x121<LAHF,ABM,Prefetch>
>   Structured Extended Features=0x239ca7eb<FSGSBASE,TSCADJ,BMI1,AVX2,FDPEXC,SMEP,BMI2,ERMS,INVPCID,NFPUSG,PQE,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PROCTRACE,SHA>
>   Structured Extended Features2=0x98c007bc<UMIP,PKU,OSPKE,WAITPKG,GFNI,VAES,VPCLMULQDQ,RDPID,MOVDIRI,MOVDIR64B>
>   Structured Extended Features3=0xfc184410<FSRM,MD_CLEAR,IBT,IBPB,STIBP,L1DFL,ARCH_CAP,CORE_CAP,SSBD>
>   XSAVE Features=0xf<XSAVEOPT,XSAVEC,XINUSE,XSAVES>
>   IA32_ARCH_CAPS=0x180fd6b<RDCL_NO,IBRS_ALL,SKIP_L1DFL_VME,MDS_NO,TAA_NO>
>   VT-x: Basic Features=0x3da0500<SMM,INS/OUTS,TRUE>
>         Pin-Based Controls=0xff<ExtINT,NMI,VNMI,PreTmr,PostIntr>
>         Primary Processor Controls=0xfffbfffe<INTWIN,TSCOff,HLT,INVLPG,MWAIT,RDPMC,RDTSC,CR3-LD,CR3-ST,CR8-LD,CR8-ST,TPR,NMIWIN,MOV-DR,IO,IOmap,MTF,MSRmap,MONITOR,PAUSE>
>         Secondary Processor Controls=0x75d7fff<APIC,EPT,DT,RDTSCP,x2APIC,VPID,WBINVD,UG,APIC-reg,VID,PAUSE-loop,RDRAND,INVPCID,VMFUNC,VMCS,XSAVES>
>         Exit Controls=0x3da0500<PAT-LD,EFER-SV,PTMR-SV>
>         Entry Controls=0x3da0500
>         EPT Features=0x6f34141<XO,PW4,UC,WB,2M,1G,INVEPT,AD,single,all>
>         VPID Features=0xf01<INVVPID,individual,single,all,single-globals>
>   TSC: P-state invariant, performance statistics
> 64-Byte prefetching
> L2 cache: 2048 kbytes, 16-way associative, 64 bytes/line
> real memory  = 17179869184 (16384 MB)
> Physical memory chunk(s):
> 0x0000000000010000 - 0x000000000009dfff, 581632 bytes (142 pages)
> 0x000000000009f000 - 0x000000000009ffff, 4096 bytes (1 pages)
> 0x0000000000100000 - 0x000000005fffffff, 1609564160 bytes (392960 pages)
> 0x0000000062401000 - 0x000000007264dfff, 270848000 bytes (66125 pages)
> 0x0000000075fff000 - 0x0000000075ffffff, 4096 bytes (1 pages)
> 0x0000000100001000 - 0x0000000462497fff, 14533881856 bytes (3548311 pages)
> 0x000000047fa00000 - 0x000000047fb68fff, 1478656 bytes (361 pages)
> avail memory = 16363008000 (15604 MB)
> CPU microcode: updated from 0xc to 0x10
> MADT: Found CPU APIC ID 0 ACPI ID 0: enabled
> SMP: Added CPU 0 (AP)
> MADT: Found CPU APIC ID 2 ACPI ID 1: enabled
> SMP: Added CPU 2 (AP)
> MADT: Found CPU APIC ID 4 ACPI ID 2: enabled
> SMP: Added CPU 4 (AP)
> MADT: Found CPU APIC ID 6 ACPI ID 3: enabled
> SMP: Added CPU 6 (AP)
>
> On start-up, vm.pmap.pcid_invlpg_workaround=1 but seemingly random
> faults still occurred under load, for example, 'make buildworld'.
> Apparent misreads of source-files resulting in syntax errors were the
> most common symptom. Compilation reattempts (mostly) succeed.
>
> Initially, I put this down to an inadequate power-supply but setting
> vm.pmap.pcid_enabled=0 seems to have stabilised it.
>
> I guess there's another dragon in there .. :-(
>
>         Michae

Just to add another report (on the wrong mailing list, as it is also on a
system running 13.2): I have a very similar system from a different
manufacturer with the same Alder Lake processor. I will note that the SSD
interface is SATA, not NVMe. I was getting crashes and corrupt file
systems, especially when installing large ports and using rsync to back up
the system. I see many almost identical systems on Amazon that use the
same form factor, CPU, SSD, RAM, etc., probably all using the same
motherboard from a single manufacturer. There are going to be more issues,
as these boxes are generally <$225 US. (Mine was a bit more expensive to
get a VGA connector for my ancient monitor.)

I had not tried the tunable, but largely resolved the issue by installing
a 250 MB hard drive and putting the system there. In the couple of months
since I did this, I have had two crashes, both when doing a full backup
with rsync. This leads me to think that there is some sort of race
triggering this that is minimized by the slow disc speed of spinning rust.

I am considering moving the system back to the SSD with
vm.pmap.pcid_enabled=0. If I do, any failure should show up very quickly,
as I never could keep the system up long enough to get it into production.
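
For anyone wanting to try the same thing, here is a minimal sketch of
making the setting persistent. It assumes vm.pmap.pcid_enabled is a boot
time tunable read from /boot/loader.conf (the thread calls it a tunable),
so a reboot is needed. set_loader_tunable is a hypothetical helper written
for this example, not something shipped with FreeBSD.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): write NAME="VALUE" into a
# loader.conf-style file, replacing any existing assignment of NAME.
set_loader_tunable() {
    conf=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$conf" ]; then
        # Keep every line that does not start with NAME=.
        awk -v n="$name" 'index($0, n "=") != 1' "$conf" > "$tmp"
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Intended use, as root on the affected machine (assumed default path):
#   set_loader_tunable /boot/loader.conf vm.pmap.pcid_enabled 0
#   shutdown -r now
# After the reboot, confirm what the kernel actually picked up:
#   sysctl vm.pmap.pcid_enabled vm.pmap.pcid_invlpg_workaround
```

Checking both sysctls after the reboot matters here, since the earlier
report had the invlpg workaround enabled and still saw faults.
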
--
Kevin Oberman, Part time kid herder and retired Network Engineer
E-mail: rkoberman@gmail.com
PGP Fingerprint: D03FB98AFA78E3B78C1694B318AB39EF1B055683
--000000000000706d570602bcd762--