From nobody Wed Feb 14 18:45:45 2024 X-Original-To: freebsd-arm@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TZnFz6Bcpz51msd for ; Wed, 14 Feb 2024 18:45:59 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ed1-x52c.google.com (mail-ed1-x52c.google.com [IPv6:2a00:1450:4864:20::52c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4TZnFz3p88z4b0y for ; Wed, 14 Feb 2024 18:45:59 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x52c.google.com with SMTP id 4fb4d7f45d1cf-5638ae09ec1so150590a12.0 for ; Wed, 14 Feb 2024 10:45:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1707936358; x=1708541158; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ksiIdJwHkc+JLXgXUyQdZyacITAs7IyrYG0GJD+9dbA=; b=yyILMmYyHwVQPP62pfdJnXyJXpTIQ0ZhzD1NPezPaZ22Mcf1oWJiI6oL9CJjp0qQfZ fSGNiDgloKy6kVVLWK0COp0n1IOVa3ihXHX30F/j4A1cgo/FwpKF6Gy0jZd0WiWUb14x vze+Lme11AXfm6W7hxpPYCgCPScOnKxPTrgdJGmwyQCrOV2iFedvae+h/ejZveaz9HuQ rh5glcuCLTpKcWqNQLRs2SIiEYZ8BzFoHHMjq1K+IS7NHqbtFEG+jT0bn7iDJl8BZdpu rxZMLRRiEVEHZgk4JjgIOfBgRT18Q35zysl3jPXp0QrZtBEqcipHQLQdihOmgw8FAyJK 83qw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707936358; x=1708541158; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ksiIdJwHkc+JLXgXUyQdZyacITAs7IyrYG0GJD+9dbA=; b=WoBIucNKUyjT3RB6md0JiaDtDNVM14l1DQyyytNe6w9gp0PLSCrEzk5pqpaXEb83JD KRnDczt/g39FxQ0XO3XhGM98cGSV4DNt+2BIOPdCeD5PGUfYqwRhRkCU+sTPOIZXwWfu WlB+MziSDjKEKNMGB4EqjM1wd3cQ1LPcn686pF9DZsFQBNbjQkvd6jHSXXszgMFl03f6 GrmeKT/faaVacgm8dZ8APuV2jlJ7qdtHeTgXEofxZ9/zlpkV+NIlwdSgqr0iQ3Pi4Z45 UVKZiMjIwsSgrXThmF7D7HCIx/tadFWw7rOGnpNRgG7k/rAUbLZlfGsIMwU3TcsHFnxd 71fA== X-Forwarded-Encrypted: i=1; AJvYcCWIXh03Ba6ZrzaCwcg2xuocpwrsUay3l+15jAYoWfUM9WP8lVtc+dJrwTngYOnXS3NSQS0LirJmGj5X/TUOI/n3crRRduN4qg== X-Gm-Message-State: AOJu0YxhryH7/N2F8oF3H492pTx2jM78hjMBLy4+OV5ynMRNWB3Sxxl+ Z2GEgzU+ONcbHw6IMnxsn/XhqN5gof5q+ihHas1PGE4yB8mGSL8+w1d6tstA8qDcJvzn8hb8Qkk Zt9t0QhyWsK/+busTxSAUj6Ra+MR1RJmqGMZH4w== X-Google-Smtp-Source: AGHT+IHJu5L8SOUlWD8AmH7X8Md6iZrTY1c2M1JmDaPGRi/pCeAMEV0zhOkXuotcoNjL0cYLyrpK367MLTJ3lHH/BGk= X-Received: by 2002:aa7:c541:0:b0:562:1a77:19a7 with SMTP id s1-20020aa7c541000000b005621a7719a7mr2696303edr.11.1707936357796; Wed, 14 Feb 2024 10:45:57 -0800 (PST) List-Id: Porting FreeBSD to ARM processors List-Archive: https://lists.freebsd.org/archives/freebsd-arm List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-arm@freebsd.org MIME-Version: 1.0 References: <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com> <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com> <4C279710-5F88-4295-B1A4-7C395F3587E5@yahoo.com> <3A145420-399D-4EBD-9FF4-18924908AB1D@yahoo.com> <1298DF9C-0F82-4567-8E81-7332A608C7FC@yahoo.com> In-Reply-To: From: Warner Losh Date: Wed, 14 Feb 2024 11:45:45 -0700 Message-ID: Subject: Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic To: John Baldwin Cc: Mark Millard , FreeBSD ARM List , Current FreeBSD Content-Type: multipart/alternative; boundary="000000000000e7df3f06115beb28" X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Queue-Id: 4TZnFz3p88z4b0y X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated --000000000000e7df3f06115beb28 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey John, On Wed, Feb 14, 2024 at 10:30=E2=80=AFAM John Baldwin wro= te: > On 2/14/24 8:42 AM, Warner Losh wrote: > > On Wed, Feb 14, 2024 at 9:08=E2=80=AFAM John Baldwin = wrote: > > > >> On 2/12/24 5:57 PM, Mark Millard wrote: > >>> On Feb 12, 2024, at 16:36, Mark Millard wrote: > >>> > >>>> On Feb 12, 2024, at 16:10, Mark Millard wrote: > >>>> > >>>>> On Feb 12, 2024, at 12:00, Mark Millard wrote: > >>>>> > >>>>>> [Gack: I was looking at the wrong vintage of source code, predatin= g > >>>>>> your changes: wrong system used.] > >>>>>> > >>>>>> > >>>>>> On Feb 12, 2024, at 10:41, Mark Millard wrote: > >>>>>> > >>>>>>> On Feb 12, 2024, at 09:32, John Baldwin wrote: > >>>>>>> > >>>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote: > >>>>>>>>> Summary: > >>>>>>>>> pcib0: mem > >> 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > >> 0x40000000 > >>>>>>>>> . . . > >>>>>>>>> rman_manage_region: request: start > >> 0x600000000, end 0x6000fffff > >>>>>>>>> panic: Failed to add resource to rman > >>>>>>>> > >>>>>>>> Hmmm, I suspect this is due to the way that bus_translate_resour= ce > >> works which is > >>>>>>>> fundamentally broken. It rewrites the start address of a resour= ce > >> in-situ instead > >>>>>>>> of keeping downstream resources separate from the upstream > >> resources. For example, > >>>>>>>> I don't see how you could ever release a resource in this design > >> without completely > >>>>>>>> screwing up your rman. That is, I expect trying to detach a PCI > >> device behind a > >>>>>>>> translating bridge that uses the current approach should corrupt > >> the allocated > >>>>>>>> resource ranges in an rman long before my changes. > >>>>>>>> > >>>>>>>> That said, that doesn't really explain the panic. Hmm, the pani= c > >> might be because > >>>>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the > >> bus_translate_resource > >>>>>>>> hack only kicks in the activate_resource method of > >> pci_host_generic.c. > >>>>>>>> > >>>>>>>>> Detail: > >>>>>>>>> . . . > >>>>>>>>> pcib0: mem > >> 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > >> 0x40000000 > >>>>>>>> > >>>>>>>> This indicates this is a translating bus. > >>>>>>>> > >>>>>>>>> pcib1: irq 91 at device 0.0 on pci0 > >>>>>>>>> rman_manage_region: request: start 0x1, end > 0x1 > >>>>>>>>> pcib0: rman_reserve_resource: start=3D0xc0000000, end=3D0xc00ff= fff, > >> count=3D0x100000 > >>>>>>>>> rman_reserve_resource_bound: request: [0xc0000000= , > >> 0xc00fffff], length 0x100000, flags 102, device pcib1 > >>>>>>>>> rman_reserve_resource_bound: trying 0xffffffff > <0xc0000000,0xfffff> > >>>>>>>>> considering [0xc0000000, 0xffffffff] > >>>>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 > >> (requested 0x100000) > >>>>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000 > >>>>>>>>> allocating from the beginning > >>>>>>>>> rman_manage_region: request: start > >> 0x600000000, end 0x6000fffff > >>>>> > >>>>> What you later typed does not match: > >>>>> > >>>>> 0x600000000 > >>>>> 0x6000fffff > >>>>> > >>>>> You later typed: > >>>>> > >>>>> 0x60000000 > >>>>> 0x600fffffff > >>>>> > >>>>> This seems to have lead to some confusion from using the > >>>>> wrong figure(s). > >>>>> > >>>>>>>> The fact that we are trying to reserve the CPU addresses in the > >> rman is because > >>>>>>>> bus_translate_resource rewrote the start address in the resource > >> after it was allocated. > >>>>>>>> > >>>>>>>> That said, I can't see why rman_manage_region would actually fai= l. > >> At this point the > >>>>>>>> rman is empty (this is the first call to rman_manage_region for > >> "pcib1 memory window"), > >>>>>>>> so only the check that should be failing are the checks against > >> rm_start and > >>>>>>>> rm_end. For the memory window, rm_start is always 0, and rm_end > is > >> always > >>>>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new > >> (0x60000000 - 0x600fffffff) > >>>>>>>> ranges are within those bounds. > >>>>> > >>>>> No: > >>>>> > >>>>> 0xffffffff > >>>>> > >>>>> .vs (actual): > >>>>> > >>>>> 0x600000000 > >>>>> 0x6000fffff > >> > >> Ok, then this explains the failure if the "raw" addresses are above > 4G. I > >> have > >> access to an emag I'm currently using to test fixes to > pci_host_generic.c > >> to > >> avoid corrupting struct resource objects. I'll post the diff once I'v= e > got > >> something verified to work. > >> > >>> It looks to me like in sys/dev/pci/pci_pci.c the: > >>> > >>> static void > >>> pcib_probe_windows(struct pcib_softc *sc) > >>> { > >>> . . . > >>> pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, > 0xffffffff); > >>> . . . > >>> > >>> is just inappropriately restrictive about where in the system > >>> address space a PCIe can validly be mapped to on the high end. > >>> That, in turn, leads to the rejection on the RPi4B now that > >>> the range use is checked. > >> > >> No, the physical register in PCI-PCI bridges is only 32-bits. Only th= e > >> prefetchable BAR supports 64-bit addresses. This is why the host brid= ge > >> is doing a translation from the CPU side (0x600000000) to the PCI BAR > >> addresses (0xc0000000) so that the BAR addresses are down in the 32-bi= t > >> address range. It's also true that many PCI devices only support 32-b= it > >> addresses in memory BARs. 64-bit BARs are an optional extension not > >> universally supported. > >> > >> The translation here is somewhat akin to a type of MMU where the CPU > >> addresses are mapped to PCI addresses. The problem here is that the > >> PCI BAR resources need to "stay" as PCI addresses since we depend on > >> being able to use rman_get_start/end to get the PCI addresses of > >> allocated resources, but pci_host_generic.c currently rewrites the > >> addresses. > >> > >> Probably I should remove rman_set_start/end entirely (Warner added the= m > >> back in 2004) as the methods don't do anything to deal with the fallou= t > >> that the rman.rm_list linked-list is no longer sorted by address once > >> some addresses get rewritten, etc. > >> > > > > At the time, they made sense. Removing it, though may take some doing > > since we use it in about 284 places in sys/dev today... Somewhat more > > pervasive than I'd have thought they'd be... > > Eh, I only find the one that I'm now removing: > > % git grep -E 'rman_set_(start|end)' sys/ > sys/dev/pci/pci_host_generic.c: rman_set_start(r, start); > sys/dev/pci/pci_host_generic.c: rman_set_end(r, end); > sys/kern/subr_rman.c:rman_set_start(struct resource *r, rman_res_t start) > sys/kern/subr_rman.c:rman_set_end(struct resource *r, rman_res_t end) > sys/sys/rman.h:void rman_set_end(struct resource *_r, rman_res_t _end= ); > sys/sys/rman.h:void rman_set_start(struct resource *_r, rman_res_t > _start); > > Also, I managed to boot the emag I have access to this morning. I had to > fix a few other bugs in acpi(4) for my changes in pci_host_generic to wor= k > but will post reviews later today. > That's what happens when you grep for 'get' instead of 'set' before the morning caffeine kicks in .. Warner --000000000000e7df3f06115beb28 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey John,

On Wed, Feb 14, 2024 at 10:3= 0=E2=80=AFAM John Baldwin <jhb@freebs= d.org> wrote:
On 2/14/24 8:42 AM, Warner Losh wrote:
> On Wed, Feb 14, 2024 at 9:08=E2=80=AFAM John Baldwin <jhb@freebsd.org> wrote:
>
>> On 2/12/24 5:57 PM, Mark Millard wrote:
>>> On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote: >>>>
>>>>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrot= e:
>>>>>
>>>>>> [Gack: I was looking at the wrong vintage of sourc= e code, predating
>>>>>> your changes: wrong system used.]
>>>>>>
>>>>>>
>>>>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> = wrote:
>>>>>>
>>>>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> = wrote:
>>>>>>>
>>>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
>>>>>>>>> Summary:
>>>>>>>>> pcib0: <BCM2838-compatible PCI-expr= ess controller> mem
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>>>> pcib0:=C2=A0 PCI addr: 0xc0000000, CPU= addr: 0x600000000, Size:
>> 0x40000000
>>>>>>>>> . . .
>>>>>>>>> rman_manage_region: <pcib1 memory w= indow> request: start
>> 0x600000000, end 0x6000fffff
>>>>>>>>> panic: Failed to add resource to rman<= br> >>>>>>>>
>>>>>>>> Hmmm, I suspect this is due to the way tha= t bus_translate_resource
>> works which is
>>>>>>>> fundamentally broken.=C2=A0 It rewrites th= e start address of a resource
>> in-situ instead
>>>>>>>> of keeping downstream resources separate f= rom the upstream
>> resources.=C2=A0 =C2=A0For example,
>>>>>>>> I don't see how you could ever release= a resource in this design
>> without completely
>>>>>>>> screwing up your rman.=C2=A0 That is, I ex= pect trying to detach a PCI
>> device behind a
>>>>>>>> translating bridge that uses the current a= pproach should corrupt
>> the allocated
>>>>>>>> resource ranges in an rman long before my = changes.
>>>>>>>>
>>>>>>>> That said, that doesn't really explain= the panic.=C2=A0 Hmm, the panic
>> might be because
>>>>>>>> for PCI bridge windows the driver now pass= es RF_ACTIVE and the
>> bus_translate_resource
>>>>>>>> hack only kicks in the activate_resource m= ethod of
>> pci_host_generic.c.
>>>>>>>>
>>>>>>>>> Detail:
>>>>>>>>> . . .
>>>>>>>>> pcib0: <BCM2838-compatible PCI-expr= ess controller> mem
>> 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr:= 0x600000000, Size:
>> 0x40000000
>>>>>>>>
>>>>>>>> This indicates this is a translating bus.<= br> >>>>>>>>
>>>>>>>>> pcib1: <PCI-PCI bridge> irq 91 a= t device 0.0 on pci0
>>>>>>>>> rman_manage_region: <pcib1 bus numb= ers> request: start 0x1, end 0x1
>>>>>>>>> pcib0: rman_reserve_resource: start=3D= 0xc0000000, end=3D0xc00fffff,
>> count=3D0x100000
>>>>>>>>> rman_reserve_resource_bound: <PCIe = Memory> request: [0xc0000000,
>> 0xc00fffff], length 0x100000, flags 102, device pcib1
>>>>>>>>> rman_reserve_resource_bound: trying 0x= ffffffff <0xc0000000,0xfffff>
>>>>>>>>> considering [0xc0000000, 0xffffffff] >>>>>>>>> truncated region: [0xc0000000, 0xc00ff= fff]; size 0x100000
>> (requested 0x100000)
>>>>>>>>> candidate region: [0xc0000000, 0xc00ff= fff], size 0x100000
>>>>>>>>> allocating from the beginning
>>>>>>>>> rman_manage_region: <pcib1 memory w= indow> request: start
>> 0x600000000, end 0x6000fffff
>>>>>
>>>>> What you later typed does not match:
>>>>>
>>>>> 0x600000000
>>>>> 0x6000fffff
>>>>>
>>>>> You later typed:
>>>>>
>>>>> 0x60000000
>>>>> 0x600fffffff
>>>>>
>>>>> This seems to have lead to some confusion from using t= he
>>>>> wrong figure(s).
>>>>>
>>>>>>>> The fact that we are trying to reserve the= CPU addresses in the
>> rman is because
>>>>>>>> bus_translate_resource rewrote the start a= ddress in the resource
>> after it was allocated.
>>>>>>>>
>>>>>>>> That said, I can't see why rman_manage= _region would actually fail.
>> At this point the
>>>>>>>> rman is empty (this is the first call to r= man_manage_region for
>> "pcib1 memory window"),
>>>>>>>> so only the check that should be failing a= re the checks against
>> rm_start and
>>>>>>>> rm_end.=C2=A0 For the memory window, rm_st= art is always 0, and rm_end is
>> always
>>>>>>>> 0xffffffff, so both the old (0xc00000000 -= 0xc00fffff) and new
>> (0x60000000 - 0x600fffffff)
>>>>>>>> ranges are within those bounds.
>>>>>
>>>>> No:
>>>>>
>>>>> 0xffffffff
>>>>>
>>>>> .vs (actual):
>>>>>
>>>>> 0x600000000
>>>>> 0x6000fffff
>>
>> Ok, then this explains the failure if the "raw" addresse= s are above 4G.=C2=A0 I
>> have
>> access to an emag I'm currently using to test fixes to pci_hos= t_generic.c
>> to
>> avoid corrupting struct resource objects.=C2=A0 I'll post the = diff once I've got
>> something verified to work.
>>
>>> It looks to me like in sys/dev/pci/pci_pci.c the:
>>>
>>> static void
>>> pcib_probe_windows(struct pcib_softc *sc)
>>> {
>>> . . .
>>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0pcib_alloc_window(sc, = &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff);
>>> . . .
>>>
>>> is just inappropriately restrictive about where in the system<= br> >>> address space a PCIe can validly be mapped to on the high end.=
>>> That, in turn, leads to the rejection on the RPi4B now that >>> the range use is checked.
>>
>> No, the physical register in PCI-PCI bridges is only 32-bits.=C2= =A0 Only the
>> prefetchable BAR supports 64-bit addresses.=C2=A0 This is why the = host bridge
>> is doing a translation from the CPU side (0x600000000) to the PCI = BAR
>> addresses (0xc0000000) so that the BAR addresses are down in the 3= 2-bit
>> address range.=C2=A0 It's also true that many PCI devices only= support 32-bit
>> addresses in memory BARs.=C2=A0 64-bit BARs are an optional extens= ion not
>> universally supported.
>>
>> The translation here is somewhat akin to a type of MMU where the C= PU
>> addresses are mapped to PCI addresses.=C2=A0 The problem here is t= hat the
>> PCI BAR resources need to "stay" as PCI addresses since = we depend on
>> being able to use rman_get_start/end to get the PCI addresses of >> allocated resources, but pci_host_generic.c currently rewrites the=
>> addresses.
>>
>> Probably I should remove rman_set_start/end entirely (Warner added= them
>> back in 2004) as the methods don't do anything to deal with th= e fallout
>> that the rman.rm_list linked-list is no longer sorted by address o= nce
>> some addresses get rewritten, etc.
>>
>
> At the time, they made sense. Removing it, though may take some doing<= br> > since we use it in about 284 places=C2=A0 in sys/dev today... Somewhat= more
> pervasive than I'd have thought they'd be...

Eh, I only find the one that I'm now removing:

% git grep -E 'rman_set_(start|end)' sys/
sys/dev/pci/pci_host_generic.c: rman_set_start(r, start);
sys/dev/pci/pci_host_generic.c: rman_set_end(r, end);
sys/kern/subr_rman.c:rman_set_start(struct resource *r, rman_res_t start) sys/kern/subr_rman.c:rman_set_end(struct resource *r, rman_res_t end)
sys/sys/rman.h:void=C2=A0 =C2=A0 =C2=A0rman_set_end(struct resource *_r, rm= an_res_t _end);
sys/sys/rman.h:void=C2=A0 =C2=A0 =C2=A0rman_set_start(struct resource *_r, = rman_res_t _start);

Also, I managed to boot the emag I have access to this morning.=C2=A0 I had= to
fix a few other bugs in acpi(4) for my changes in pci_host_generic to work<= br> but will post reviews later today.

That= 's what happens when you grep for 'get' instead of 'set'= ; before the morning caffeine kicks in ..

Warner
--000000000000e7df3f06115beb28--