Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic
- Reply: Mark Millard : "Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic"
- In reply to: Mark Millard : "Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic"
- Go to: [ bottom of page ] [ top of archives ] [ this month ]
Date: Mon, 12 Feb 2024 20:00:16 UTC
[Gack: I was looking at the wrong vintage of source code, predating your changes: wrong system used.] On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote: > On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote: > >> On 2/9/24 8:13 PM, Mark Millard wrote: >>> Summary: >>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2 >>> pcib0: parsing FDT for ECAM0: >>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000 >>> . . . >>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff >>> panic: Failed to add resource to rman >> >> Hmmm, I suspect this is due to the way that bus_translate_resource works which is >> fundamentally broken. It rewrites the start address of a resource in-situ instead >> of keeping downstream resources separate from the upstream resources. For example, >> I don't see how you could ever release a resource in this design without completely >> screwing up your rman. That is, I expect trying to detach a PCI device behind a >> translating bridge that uses the current approach should corrupt the allocated >> resource ranges in an rman long before my changes. >> >> That said, that doesn't really explain the panic. Hmm, the panic might be because >> for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource >> hack only kicks in the activate_resource method of pci_host_generic.c. >> >>> Detail: >>> . . . >>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2 >>> pcib0: parsing FDT for ECAM0: >>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000 >> >> This indicates this is a translating bus. >> >>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0 >>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1 >>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000 >>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1 >>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff> >>> considering [0xc0000000, 0xffffffff] >>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000) >>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000 >>> allocating from the beginning >>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff >> >> The fact that we are trying to reserve the CPU addresses in the rman is because >> bus_translate_resource rewrote the start address in the resource after it was allocated. >> >> That said, I can't see why rman_manage_region would actually fail. At this point the >> rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"), >> so only the check that should be failing are the checks against rm_start and >> rm_end. For the memory window, rm_start is always 0, and rm_end is always >> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 - 0x600fffffff) >> ranges are within those bounds. I would instead expect to see some other issue later >> on where we fail to allocate a resource for a child BAR, but I wouldn't expect >> rman_manage_region to fail. >> >> Logging the return value from rman_manage_region would be the first step I think >> to see which error value it is returning. > > Looking at the code in sys/kern/subr_rman.c for rman_manage_region I see > the (mostly) return rv related logic only has ENONMEM (explicit return) and > EBUSY as non-0 possibilities (no other returns). The modern code also has EINVAL via an explicit return. > All the rv references and > all the returns are shown below: > > int rv = 0; > . . . The modern code also has here: DPRINTF(("rman_manage_region: <%s> request: start %#jx, end %#jx\n", rm->rm_descr, start, end)); if (start < rm->rm_start || end > rm->rm_end) return EINVAL; Adding rm->rm_start and rm->rm_end to the message might be appropriate --and would also be enough additional information to know if EINVAL would be returned. > r = int_alloc_resource(M_NOWAIT); > if (r == NULL) > return ENOMEM; > . . . > /* Check for any overlap with the current region. */ > if (r->r_start <= s->r_end && r->r_end >= s->r_start) { > rv = EBUSY; > goto out; > } > > /* Check for any overlap with the next region. */ > t = TAILQ_NEXT(s, r_link); > if (t && r->r_start <= t->r_end && r->r_end >= t->r_start) { > rv = EBUSY; > goto out; > } > . . . > out: > mtx_unlock(rm->rm_mtx); > return rv; > > int_alloc_resource failure would be failure (NULL) from the: > > struct resource_i *r; > > r = malloc(sizeof *r, M_RMAN, malloc_flag | M_ZERO); > > (associated with the M_NOWAIT argument). > > The malloc failure would likely go in a very different direction. > > > > Side note: looks like the EBUSY cases leak what r references. Still true in the newer code. >> Probably I should fix pci_host_generic.c to handle translation properly however. >> I can work on a patch for that. > === Mark Millard marklmi at yahoo.com