Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: <pcib1 memory window> request" leads to panic

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 13 Feb 2024 00:36:28 UTC
On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote:

> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> [Gack: I was looking at the wrong vintage of source code, predating
>> your changes: wrong system used.]
>> 
>> 
>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote:
>>> 
>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
>>>>> Summary:
>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>> pcib0: parsing FDT for ECAM0:
>>>>> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>>>>> . . .
>>>>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
>>>>> panic: Failed to add resource to rman
>>>> 
>>>> Hmmm, I suspect this is due to the way that bus_translate_resource works which is
>>>> fundamentally broken.  It rewrites the start address of a resource in-situ instead
>>>> of keeping downstream resources separate from the upstream resources.   For example,
>>>> I don't see how you could ever release a resource in this design without completely
>>>> screwing up your rman.  That is, I expect trying to detach a PCI device behind a
>>>> translating bridge that uses the current approach should corrupt the allocated
>>>> resource ranges in an rman long before my changes.
>>>> 
>>>> That said, that doesn't really explain the panic.  Hmm, the panic might be because
>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the bus_translate_resource
>>>> hack only kicks in the activate_resource method of pci_host_generic.c.
>>>> 
>>>>> Detail:
>>>>> . . .
>>>>> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>> pcib0: parsing FDT for ECAM0:
>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
>>>> 
>>>> This indicates this is a translating bus.
>>>> 
>>>>> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
>>>>> rman_manage_region: <pcib1 bus numbers> request: start 0x1, end 0x1
>>>>> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
>>>>> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1
>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
>>>>> considering [0xc0000000, 0xffffffff]
>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000)
>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
>>>>> allocating from the beginning
>>>>> rman_manage_region: <pcib1 memory window> request: start 0x600000000, end 0x6000fffff
> 
> What you later typed does not match:
> 
> 0x600000000
> 0x6000fffff
> 
> You later typed:
> 
> 0x60000000
> 0x600fffffff
> 
> This seems to have lead to some confusion from using the
> wrong figure(s).
> 
>>>> The fact that we are trying to reserve the CPU addresses in the rman is because
>>>> bus_translate_resource rewrote the start address in the resource after it was allocated.
>>>> 
>>>> That said, I can't see why rman_manage_region would actually fail.  At this point the
>>>> rman is empty (this is the first call to rman_manage_region for "pcib1 memory window"),
>>>> so only the check that should be failing are the checks against rm_start and
>>>> rm_end.  For the memory window, rm_start is always 0, and rm_end is always
>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new (0x60000000 - 0x600fffffff)
>>>> ranges are within those bounds.
> 
> No:
> 
> 0xffffffff
> 
> .vs (actual):
> 
> 0x600000000
> 0x6000fffff
> 
> Both the actuals are larger then the 0xffffffff figure you list above.
> 
> I've no clue if the 0xffffffff might have its own typos.
> 
>>>> I would instead expect to see some other issue later
>>>> on where we fail to allocate a resource for a child BAR, but I wouldn't expect
>>>> rman_manage_region to fail.
>>>> 
>>>> Logging the return value from rman_manage_region would be the first step I think
>>>> to see which error value it is returning.
>>> 
>>> Looking at the code in sys/kern/subr_rman.c for rman_manage_region I see
>>> the (mostly) return rv related logic only has ENONMEM (explicit return) and
>>> EBUSY as non-0 possibilities (no other returns).
>> 
>> The modern code also has EINVAL via an explicit return.
>> 
>>> All the rv references and
>>> all the returns are shown below:
>>> 
>>>      int rv = 0;
>>> . . .
>> 
>> The modern code also has here:
>> 
>>       DPRINTF(("rman_manage_region: <%s> request: start %#jx, end %#jx\n",
>>           rm->rm_descr, start, end));
>>       if (start < rm->rm_start || end > rm->rm_end)
>>               return EINVAL;
>> 
>> Adding rm->rm_start and rm->rm_end to the message might be
>> appropriate --and would also be enough additional information
>> to know if EINVAL would be returned.
> 
> I added such code and built, installed, and tried booting
> a separate kernel. The result was:
> 
> rman_manage_region: <pcib1 memory window> range 0..0xffffffff : request: start 0x600000000, end 0x6000fffff
> panic: Failed to add resource to rman
> 
> So:
> 
> 0
> vs. start:
> 0x600000000
> 
> and:
> 
> 0xffffffff
> vs. end:
> 0x6000fffff
> 
> The 0x600000000..0x6000fffff range is not a range of RAM addresses
> [too large for that for the 8 GiByte RPi4B] and, so, is well above
> the 32 bit boundary too.
> 
> That is the only line referencing "memory window". A 32 bit initial range
> for some reason. For reference, the source change in question was:
> 
> -       DPRINTF(("rman_manage_region: <%s> request: start %#jx, end %#jx\n",
> -           rm->rm_descr, start, end));
> +       DPRINTF(("rman_manage_region: <%s> range %#jx..%#jx : request: start %#jx, end %#jx\n",
> +           rm->rm_descr, rm->rm_start, rm->rm_end, start, end));
>        if (start < rm->rm_start || end > rm->rm_end)
>                return EINVAL;
> 
> I'll show a larger boot output context at the bottom of the message
> for reference.
> 
>>>      r = int_alloc_resource(M_NOWAIT);
>>>      if (r == NULL)
>>>              return ENOMEM;
>>> . . .
>>>              /* Check for any overlap with the current region. */
>>>              if (r->r_start <= s->r_end && r->r_end >= s->r_start) {
>>>                      rv = EBUSY;
>>>                      goto out;
>>>              }
>>> 
>>>              /* Check for any overlap with the next region. */
>>>              t = TAILQ_NEXT(s, r_link);
>>>              if (t && r->r_start <= t->r_end && r->r_end >= t->r_start) {
>>>                      rv = EBUSY;
>>>                      goto out;
>>>              }
>>> . . .
>>> out:
>>>      mtx_unlock(rm->rm_mtx);
>>>      return rv;
>>> 
>>> int_alloc_resource failure would be failure (NULL) from the:
>>> 
>>>      struct resource_i *r;
>>> 
>>>      r = malloc(sizeof *r, M_RMAN, malloc_flag | M_ZERO);
>>> 
>>> (associated with the M_NOWAIT argument).
>>> 
>>> The malloc failure would likely go in a very different direction.
>>> 
>>> 
>>> 
>>> Side note: looks like the EBUSY cases leak what r references.
>> 
>> Still true in the newer code.
>> 
>>>> Probably I should fix pci_host_generic.c to handle translation properly however.
>>>> I can work on a patch for that.
>>> 
> 
> For reference:
> 
> . . .
> pcib0: <BCM2838-compatible PCI-express controller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
> pcib0: parsing FDT for ECAM0:
> pcib0:  PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: 0x40000000
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0:  PCI addr: 0x0, CPU addr: 0x0, Size: 0x0
> pcib0: Bus is not cache-coherent
> rman_reserve_resource_bound: <I/O memory addresses> request: [0xfd500000, 0xfd50930f], length 0x9310, flags 100, device pcib0
> rman_reserve_resource_bound: trying 0x1fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x1fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x31bfffff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x33298fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39bf1fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c02fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c06fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c07fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c08fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c2afff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c36fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x39c37fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3b03ffff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3b04ffff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3b2fffff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3ee61fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3ee63fff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0x3fffffff <0xfd500000,0x930f>
> rman_reserve_resource_bound: tried 0xfbffffff <0xfd500000,0x930f>
> considering [0xfc000000, 0xfd5d1fff]
> truncated region: [0xfd500000, 0xfd50930f]; size 0x9310 (requested 0x9310)
> candidate region: [0xfd500000, 0xfd50930f], size 0x9310
> splitting region in three parts: [0xfc000000, 0xfd4fffff]; [0xfd500000, 0xfd50930f]; [0xfd509310, 0xfd5d1fff]
> rman_manage_region: <PCIe Memory> range 0..0xffffffffffffffff : request: start 0xc0000000, end 0xffffffff
> pcib0: hardware identifies as revision 0x304.
> pcib0: note: reported link speed is 5.0 GT/s.
> rman_reserve_resource_bound: <Interrupts> request: [0x51, 0x51], length 0x1, flags 0, device pcib0
> rman_reserve_resource_bound: trying 0 <0x51,0>
> rman_reserve_resource_bound: tried 0 <0x51,0>
> rman_reserve_resource_bound: tried 0x1 <0x51,0>
> rman_reserve_resource_bound: tried 0x2 <0x51,0>
> rman_reserve_resource_bound: tried 0x3 <0x51,0>
> rman_reserve_resource_bound: tried 0x4 <0x51,0>
> rman_reserve_resource_bound: tried 0x5 <0x51,0>
> rman_reserve_resource_bound: tried 0x6 <0x51,0>
> rman_reserve_resource_bound: tried 0x7 <0x51,0>
> rman_reserve_resource_bound: tried 0xc <0x51,0>
> rman_reserve_resource_bound: tried 0xd <0x51,0>
> rman_reserve_resource_bound: tried 0xe <0x51,0>
> rman_reserve_resource_bound: tried 0xf <0x51,0>
> rman_reserve_resource_bound: tried 0x10 <0x51,0>
> rman_reserve_resource_bound: tried 0x11 <0x51,0>
> rman_reserve_resource_bound: tried 0x12 <0x51,0>
> rman_reserve_resource_bound: tried 0x17 <0x51,0>
> rman_reserve_resource_bound: tried 0x18 <0x51,0>
> rman_reserve_resource_bound: tried 0x1a <0x51,0>
> rman_reserve_resource_bound: tried 0x1b <0x51,0>
> rman_reserve_resource_bound: tried 0x22 <0x51,0>
> rman_reserve_resource_bound: tried 0x23 <0x51,0>
> rman_reserve_resource_bound: tried 0x24 <0x51,0>
> rman_reserve_resource_bound: tried 0x25 <0x51,0>
> rman_reserve_resource_bound: tried 0x26 <0x51,0>
> rman_reserve_resource_bound: tried 0x27 <0x51,0>
> rman_reserve_resource_bound: tried 0x28 <0x51,0>
> rman_reserve_resource_bound: tried 0x29 <0x51,0>
> rman_reserve_resource_bound: tried 0x4e <0x51,0>
> rman_reserve_resource_bound: tried 0x4f <0x51,0>
> considering [0x50, 0xffffffffffffffff]
> truncated region: [0x51, 0x51]; size 0x1 (requested 0x1)
> candidate region: [0x51, 0x51], size 0x1
> splitting region in three parts: [0x50, 0x50]; [0x51, 0x51]; [0x52, 0xffffffffffffffff]
> pci0: <OFW PCI bus> on pcib0
> rman_manage_region: <PCI domain 0 bus numbers> range 0..0xff : request: start 0, end 0xff
> rman_reserve_resource_bound: <PCI domain 0 bus numbers> request: [0, 0], length 0x1, flags 0, device pci0
> rman_reserve_resource_bound: trying 0xff <0,0>
> considering [0, 0xff]
> truncated region: [0, 0]; size 0x1 (requested 0x1)
> candidate region: [0, 0], size 0x1
> allocating from the beginning
> pci0: domain=0, physical bus=0
> found-> vendor=0x14e4, dev=0x2711, revid=0x00
>        domain=0, bus=0, slot=0, func=0
>        class=06-04-00, hdrtype=0x01, mfdev=0
>        cmdreg=0x0000, statreg=0x0010, cachelnsz=0 (dwords)
>        lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>        intpin=a, irq=0
>        powerspec 3  supports D0 D3  current D0
>        secbus=1, subbus=1
> rman_reserve_resource_bound: <PCI domain 0 bus numbers> request: [0x1, 0x1], length 0x1, flags 0, device (null)
> rman_reserve_resource_bound: trying 0 <0x1,0>
> rman_reserve_resource_bound: tried 0 <0x1,0>
> considering [0x1, 0xff]
> truncated region: [0x1, 0x1]; size 0x1 (requested 0x1)
> candidate region: [0x1, 0x1], size 0x1
> allocating from the beginning
> pcib1: <PCI-PCI bridge> irq 91 at device 0.0 on pci0
> rman_manage_region: <pcib1 bus numbers> range 0..0xff : request: start 0x1, end 0x1
> pcib0: rman_reserve_resource: start=0xc0000000, end=0xc00fffff, count=0x100000
> rman_reserve_resource_bound: <PCIe Memory> request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pcib1
> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xfffff>
> considering [0xc0000000, 0xffffffff]
> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 (requested 0x100000)
> candidate region: [0xc0000000, 0xc00fffff], size 0x100000
> allocating from the beginning
> rman_manage_region: <pcib1 memory window> range 0..0xffffffff : request: start 0x600000000, end 0x6000fffff
> panic: Failed to add resource to rman
> cpuid = 0
> time = 1
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x38
> vpanic() at vpanic+0x1a4
> panic() at panic+0x48
> pcib_add_window_resources() at pcib_add_window_resources+0xf4
> pcib_alloc_window() at pcib_alloc_window+0xfc
> pcib_attach_common() at pcib_attach_common+0xa18
> pcib_attach() at pcib_attach+0x14
> device_attach() at device_attach+0x3fc
> device_probe_and_attach() at device_probe_and_attach+0x80
> bus_generic_attach() at bus_generic_attach+0x1c
> pci_attach() at pci_attach+0xec
> device_attach() at device_attach+0x3fc
> device_probe_and_attach() at device_probe_and_attach+0x80
> bus_generic_attach() at bus_generic_attach+0x1c
> device_attach() at device_attach+0x3fc
> device_probe_and_attach() at device_probe_and_attach+0x80
> bus_generic_new_pass() at bus_generic_new_pass+0x100
> bus_generic_new_pass() at bus_generic_new_pass+0xb0
> bus_generic_new_pass() at bus_generic_new_pass+0xb0
> bus_generic_new_pass() at bus_generic_new_pass+0xb0
> bus_set_pass() at bus_set_pass+0x50
> mi_startup() at mi_startup+0x1e0
> virtdone() at virtdone+0x68
> KDB: enter: panic
> [ thread pid 0 tid 100000 ]
> Stopped at      kdb_enter+0x4c: str     xzr, [x19, #1280]
> db> 
> 

FYI:

bcm2711-peripherals.pdf "Figure 1. BCM2711 Address Maps"
documents the PCIe address space range as (their "_"
notation):

0x6_0000_0000..0x7_FFFF_FFFF

for both of the address maps:

'Full 35-bit Address Map'
'ARM view of the Address Map in "Low Peripheral" mode'

The Low Peripheral mode is what is in use.

So the 0x6_0000_0000..0x6_000f_ffff range looks
to be a valid ARM system address space for the
PCIe to map to.

===
Mark Millard
marklmi at yahoo.com