Re: git: 060699e91369 - stable/13 - Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6
Date: Mon, 01 May 2023 22:11:40 UTC
On 1 May 2023, at 21:33, Jason A. Harmening <jah@FreeBSD.org> wrote: > > On Mon, May 01, 2023 at 08:41:32PM +0200, Dimitry Andric wrote: >> On 1 May 2023, at 18:14, John Baldwin <jhb@freebsd.org> wrote: >>> >>> On 4/30/23 8:31 PM, Jason A. Harmening wrote: >>>> On Sun, Apr 30, 2023 at 07:34:45PM -0500, Jason A. Harmening wrote: >>>>> On Sun, Apr 30, 2023 at 06:47:13PM -0500, Jason A. Harmening wrote: >>>>>> On Sun, Apr 30, 2023 at 08:09:16AM +0300, Konstantin Belousov wrote: >>>>>>> On Sat, Apr 29, 2023 at 02:27:50PM -0500, Jason A. Harmening wrote: >>>>>>>> On Sat, Apr 29, 2023 at 08:49:28PM +0200, Dimitry Andric wrote: >>>>>>>>> On 29 Apr 2023, at 20:33, Jason A. Harmening <jah@FreeBSD.org> wrote: >>>>>>>>>> >>>>>>>>>> On Sun, Apr 09, 2023 at 09:35:22PM +0000, Dimitry Andric wrote: >>>>>>>>>>> The branch stable/13 has been updated by dim: >>>>>>>>>>> >>>>>>>>>>> URL: https://cgit.FreeBSD.org/src/commit/?id=060699e9136975d51d3f726b9785bdbac9a62ba6 >>>>>>>>>>> >>>>>>>>>>> commit 060699e9136975d51d3f726b9785bdbac9a62ba6 >>>>>>>>>>> Author: Dimitry Andric <dim@FreeBSD.org> >>>>>>>>>>> AuthorDate: 2023-01-14 16:33:24 +0000 >>>>>>>>>>> Commit: Dimitry Andric <dim@FreeBSD.org> >>>>>>>>>>> CommitDate: 2023-04-09 14:54:52 +0000 >>>>>>>>>>> >>>>>>>>>>> Merge llvm-project release/15.x llvmorg-15.0.7-0-g8dfdcc7b7bf6 >>>>>>>>>>> >>>>>>>>>>> This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and >>>>>>>>>>> openmp to llvmorg-15.0.7-0-g8dfdcc7b7bf6. >>>>>>>>>>> >>>>>>>>>>> PR: 265425 >>>>>>>>>>> MFC after: 2 weeks >>>>>>>>>> >>>>>>>>>> This MFC of llvm15 appears to have completely broken the Intel IOMMU >>>>>>>>>> driver on my stable/13 machine. After this series of commits, any >>>>>>>>>> downstream DMA seems to produce an IOMMU translation fault, which >>>>>>>>>> renders the machine completely unusable: no nvme boot disk, no usb >>>>>>>>>> keyboard, etc. >>>>>>>>>> >>>>>>>>>> The faults I see look something like this: >>>>>>>>>> >>>>>>>>>> DMAR4: ahci0: pci0:17:5 sid 8d fault acc 0 adt 0x0 reason 0x3 addr 26000 >>>>>>>>>> >>>>>>>>>> It's a bit surprising to see a toolchain upgrade produce breakage like >>>>>>>>>> this, but that's what git bisect clearly tells me. I wonder if some of >>>>>>>>>> the IOMMU control structures might be defined as C bitfields and the new >>>>>>>>>> compiler is emitting them differently? Also, was any breakage like this >>>>>>>>>> observed when -current was upgraded to llvm15 several months ago? >>>>>>>>> >>>>>>>>> I haven't heard anything about such breakage, no. >>>>>>>>> >>>>>>>>> >>>>>>>>>> More generally, this is the second time in as many months I've had to >>>>>>>>>> deal with IOMMU breakage on -stable. I can't imagine I'm the only >>>>>>>>>> person who sees value in running with DMA remapping enabled; do we need >>>>>>>>>> a dedicated DMAR-enabled machine in the cluster to smoke-test changes >>>>>>>>>> like this? More generally, should we avoid MFCing high-risk changes >>>>>>>>>> like this? >>>>>>>>> >>>>>>>>> Since there were very few bug reports, it was not deemed high risk. >>>>>>>>> >>>>>>>>> In any case, it would be good to get the bottom of what is causing the >>>>>>>>> problem, so is there any way you can isolate which code seems to be >>>>>>>>> going "bad"? >>>>>>>>> >>>>>>>>> For example, if this problem affects code in sys/dev/iommu, is there >>>>>>>>> some way you can compile that part with -O1, or with an older version >>>>>>>>> of clang (from ports), to see if the problem goes away? >>>>>>>> >>>>>>>> I did try removing all custom make.conf settings (previously I just had >>>>>>>> CPUTYPE?=icelake-server), but that didn't change the behavior. >>>>>>>> >>>>>>>> Before I try further build tweaks, I'd like to ask if the IOMMU fault >>>>>>>> report can provide guidance here? AFAICT all the faults I'm getting >>>>>>>> show "reason 0x3". If I'm reading the VT-d spec correctly, FR=0x3 >>>>>>>> indicates an invalid context entry, in other words there's something the >>>>>>>> hardware doesn't like in the way the address width or pagetable base is >>>>>>>> configured for the PCIe requestor. >>>>>>> >>>>>>> I would start looking at the other direction: might be, there are still some >>>>>>> left shifts for int32 values with the shift count > 30, or uint32 with the >>>>>>> count > 31. >>>>>>> >>>>>>> Also might be useful to dump each context entry on creation, it is kept >>>>>>> constant after. >>>>>> >>>>>> I did look over the constants in intel_reg.h, and didn't see anything >>>>>> that looked as though it would be susceptible to sign-extension or >>>>>> truncation bugs. In the failing case it's much easier for me to catch >>>>>> the fault messages than any initialization message, so I instrumented >>>>>> the fault handler to get the context entry from the dmar_ctx object >>>>>> using the same logic as dmar_map_ctx_entry(), and then dump out the ctx1 >>>>>> and ctx2 fields. What I see are messages like: >>>>>> >>>>>> ... ctx1 0x10013b001 ctx2 0x103 >>>>>> >>>>>> At first glance these "look right": the P bit is set in ctx1, and the >>>>>> rest of the field looks like a valid physical address. ctx2 also >>>>>> doesn't have any of the reserved bits set, but in all cases it does have >>>>>> AW=3, which would indicate 57-bit AGAW. But when I boot the last >>>>>> working kernel, from the revision prior to the llvm15 MFC, I see this in >>>>>> dmesg: >>>>>> >>>>>> ahci0: dmar4 pci0:0:17:5 rid 8d domain 1 mgaw 48 agaw 48 re-mapped >>>>>> >>>>>> ...all reported devices show 48-bit MGAW/AGAW, so I would expect ctx2 to >>>>>> have AW=2. I suspect this may be the source of the fault, but I'm not >>>>>> sure how it's getting configured that way, whether it's an issue with >>>>>> reading the capability register or something else. >>>>>> >>>>> >>>>> I can confirm that hacking domain_set_agaw() to always use the settings >>>>> from sagaw_bits[2] eliminates the faults and at least allows the machine >>>>> to boot to single-user mode. >>>> I see what's happening now. When I added the hack to always set >>>> sagaw_bits[2], I noted that the passed-in MGAW was still 57, while >>>> unit->hw_cap had the correct value of 0x4 (=> 4-level paging, 48-bit AW) >>>> in bits 12:8. The problem is that sagaw_bits has agaw=64 in its last >>>> entry. This results in dmar_maxaddr2mgaw() attempting a comparison >>>> against 1ULL << 64 in the final iteration of its first loop. I suspect >>>> the new compiler probably determines that last iteration is meaningless >>>> and simply omits it from the (probably unrolled) loop. Since the "loop" >>>> terminates with i < nitems(sagaw_bits), the subsequent "allow_less ..." >>>> case doesn't execute and we end up erroneously selecting a 57-bit >>>> address width. Just commenting out that last entry in sagaw_bits fixes >>>> the problem. >>>> So, two questions: >>>> 1) Does any VT-d hardware actually support 6-level paging? The ca. 2021 >>>> VT-d spec I'm looking at indicates 5-level is the greatest depth >>>> supported, with everything above that being reserved. >>>> 2) I'd expect clang to try very hard to error out in a situation like >>>> this, but I see that sys/conf/kern.mk sets -Wno-shift-count-overflow >>>> among other things, and more of them were added for clang 15. This >>>> seems like a really bad idea, regardless of how much of a PITA it may be >>>> to fix these warnings. >>> >>> FWIW, I've been working on trying to re-enable some of the warnings that >>> were disabled for clang 15 in main. I'll move that one higher up on my >>> todo list. >> >> In this particular case, it doesn't warn about it though. I think KASAN >> might be a better 'catcher' for this kind of error, or a KUBSAN, if we >> had one... > > If you've tried turning all the relevant warnings/errors back on for > this code, and clang truly doesn't warn about it, then wouldn't this > warrant a bug against upstream clang? This is a situation in which the > compiler detects a left-shift overflow, by itself at least a warning > condition, and uses it to materially alter program flow. You could try, but it will probably be classified as undefined behavior. I guess the warning will only show up in a more full-blown analyzer. I haven't tried that. -Dimitry