From nobody Wed Feb 14 16:42:36 2024 X-Original-To: freebsd-arm@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 4TZkWs5jXmz50xl3 for ; Wed, 14 Feb 2024 16:42:49 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Received: from mail-ed1-x530.google.com (mail-ed1-x530.google.com [IPv6:2a00:1450:4864:20::530]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (2048 bits) client-digest SHA256) (Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4TZkWs3dSbz49yf for ; Wed, 14 Feb 2024 16:42:49 +0000 (UTC) (envelope-from wlosh@bsdimp.com) Authentication-Results: mx1.freebsd.org; none Received: by mail-ed1-x530.google.com with SMTP id 4fb4d7f45d1cf-561f8b8c058so1962883a12.0 for ; Wed, 14 Feb 2024 08:42:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bsdimp-com.20230601.gappssmtp.com; s=20230601; t=1707928968; x=1708533768; darn=freebsd.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=nU/C/L6NJboRjZ/WwavYS7/vNGqdk8Pa7agypyoDG/0=; b=FrfSwvqKY97+6UXh3/7Ae1lng/09/GfKdcFjGeicRnTM8wg7PhTJFu61zTmIkY6+2w Oh8aUoXgL3amc55cDJSTvj17wENDDU27GQw3i+q/R8JgioYMFyU+V0GzDRupHp7R959q RIh+dg+j4x4M6iVn5SMfO3vurAWZhXrCRVqv4sWQnCePW6ylbiH8GJC9v0fXUz03NloA +QemISutfbIYrzHvpwFW0zkYywz9VknZacRXusGbeN9iQ2TTJe9MLFinIEO0lTZg6fN6 CH/D//2bFwc3J+rn8/JaRpS1BLSwwDDMooMjDvd9SuEof2z1AxPOECJUNPDOMzyBRxWK k1jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707928968; x=1708533768; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=nU/C/L6NJboRjZ/WwavYS7/vNGqdk8Pa7agypyoDG/0=; b=n8CP7VyOtp4KB0A2qjWKsEyeby7t9FdwL8eFgx5JQT86biksfLM4K8aceZwJPnXSz5 KOluWzohtuvix9ys+IR48vyc1TVgcYpYsFXHPGTcXFzJOO4ymQiSJ10Sdd/A340YEQic /u0fNjVGRwnBSPuDZt+4oYqAR/jruoTIVstdeaYmc3wP+nJyVwJs6ZcT8YftpMItWAvp WhTXFT8XovOZUHbU39xoChBaA2CZFNHHDMbRPQ1taG4ks7TMlQUg6BmuJREoGJrCc+dO uslAKC+3gWoOokWfyeU09J8OLw80+ycys3jsdZXTsXRupjEMF0Z9z/nxokoVwBv7HmdE X5iA== X-Forwarded-Encrypted: i=1; AJvYcCWmfoPN+N3CiMMGR0mA05mn8bzQ7IGYEO7lBV/OfWgPJeQKnpLLWuvxim6OQFhR4hbh0f+Ixxhq8UfeeTNslp5IYv3qgOmewQ== X-Gm-Message-State: AOJu0YzTJvv+O6ONBshdgSYwf3IzVfsxTAs7dj2Ye0mUMaybFb8+uarp mxXwZJF7FNGg6NwjUJMqwViVS7sSbXo7XlgsLc+UNPJDnc3t+vu+53g9etvC0uGzxHH9ENqC6Wc bIVdxJiySEfMXapTmJOK8F2LPY4pCRz/wmnw0+w== X-Google-Smtp-Source: AGHT+IGmRp0UgqfN56Q3LCOBf4380b7Ci5HLenp16NpKSk4pV0fO/E+qlL7A4PJ6U3MZJA+d8m4nyF1vP61+TxITDno= X-Received: by 2002:aa7:c1d8:0:b0:561:f7d9:d77e with SMTP id d24-20020aa7c1d8000000b00561f7d9d77emr2223033edp.10.1707928967583; Wed, 14 Feb 2024 08:42:47 -0800 (PST) List-Id: Porting FreeBSD to ARM processors List-Archive: https://lists.freebsd.org/archives/freebsd-arm List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-arm@freebsd.org MIME-Version: 1.0 References: <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5.ref@yahoo.com> <76AB969F-5BC5-4116-8AF4-3ED2CABEBBA5@yahoo.com> <1F704317-FDB8-4BDA-8A67-61CF48794DFE@yahoo.com> <9AFDF067-96E4-4E67-90D2-F40DAF3F138F@yahoo.com> <4C279710-5F88-4295-B1A4-7C395F3587E5@yahoo.com> <3A145420-399D-4EBD-9FF4-18924908AB1D@yahoo.com> <1298DF9C-0F82-4567-8E81-7332A608C7FC@yahoo.com> In-Reply-To: From: Warner Losh Date: Wed, 14 Feb 2024 09:42:36 -0700 Message-ID: Subject: Re: Recent commits reject RPi4B booting: pcib0 vs. pcib1 "rman_manage_region: request" leads to panic To: John Baldwin Cc: Mark Millard , FreeBSD ARM List , Current FreeBSD Content-Type: multipart/alternative; boundary="0000000000006a28f106115a33e8" X-Spamd-Result: default: False [-4.00 / 15.00]; REPLY(-4.00)[]; ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US] X-Rspamd-Queue-Id: 4TZkWs3dSbz49yf X-Spamd-Bar: ---- X-Rspamd-Pre-Result: action=no action; module=replies; Message is reply to one we originated --0000000000006a28f106115a33e8 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Feb 14, 2024 at 9:08=E2=80=AFAM John Baldwin wrot= e: > On 2/12/24 5:57 PM, Mark Millard wrote: > > On Feb 12, 2024, at 16:36, Mark Millard wrote: > > > >> On Feb 12, 2024, at 16:10, Mark Millard wrote: > >> > >>> On Feb 12, 2024, at 12:00, Mark Millard wrote: > >>> > >>>> [Gack: I was looking at the wrong vintage of source code, predating > >>>> your changes: wrong system used.] > >>>> > >>>> > >>>> On Feb 12, 2024, at 10:41, Mark Millard wrote: > >>>> > >>>>> On Feb 12, 2024, at 09:32, John Baldwin wrote: > >>>>> > >>>>>> On 2/9/24 8:13 PM, Mark Millard wrote: > >>>>>>> Summary: > >>>>>>> pcib0: mem > 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > 0x40000000 > >>>>>>> . . . > >>>>>>> rman_manage_region: request: start > 0x600000000, end 0x6000fffff > >>>>>>> panic: Failed to add resource to rman > >>>>>> > >>>>>> Hmmm, I suspect this is due to the way that bus_translate_resource > works which is > >>>>>> fundamentally broken. It rewrites the start address of a resource > in-situ instead > >>>>>> of keeping downstream resources separate from the upstream > resources. For example, > >>>>>> I don't see how you could ever release a resource in this design > without completely > >>>>>> screwing up your rman. That is, I expect trying to detach a PCI > device behind a > >>>>>> translating bridge that uses the current approach should corrupt > the allocated > >>>>>> resource ranges in an rman long before my changes. > >>>>>> > >>>>>> That said, that doesn't really explain the panic. Hmm, the panic > might be because > >>>>>> for PCI bridge windows the driver now passes RF_ACTIVE and the > bus_translate_resource > >>>>>> hack only kicks in the activate_resource method of > pci_host_generic.c. > >>>>>> > >>>>>>> Detail: > >>>>>>> . . . > >>>>>>> pcib0: mem > 0x7d500000-0x7d50930f irq 80,81 on simplebus2 > >>>>>>> pcib0: parsing FDT for ECAM0: > >>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x600000000, Size: > 0x40000000 > >>>>>> > >>>>>> This indicates this is a translating bus. > >>>>>> > >>>>>>> pcib1: irq 91 at device 0.0 on pci0 > >>>>>>> rman_manage_region: request: start 0x1, end 0= x1 > >>>>>>> pcib0: rman_reserve_resource: start=3D0xc0000000, end=3D0xc00ffff= f, > count=3D0x100000 > >>>>>>> rman_reserve_resource_bound: request: [0xc0000000, > 0xc00fffff], length 0x100000, flags 102, device pcib1 > >>>>>>> rman_reserve_resource_bound: trying 0xffffffff <0xc0000000,0xffff= f> > >>>>>>> considering [0xc0000000, 0xffffffff] > >>>>>>> truncated region: [0xc0000000, 0xc00fffff]; size 0x100000 > (requested 0x100000) > >>>>>>> candidate region: [0xc0000000, 0xc00fffff], size 0x100000 > >>>>>>> allocating from the beginning > >>>>>>> rman_manage_region: request: start > 0x600000000, end 0x6000fffff > >>> > >>> What you later typed does not match: > >>> > >>> 0x600000000 > >>> 0x6000fffff > >>> > >>> You later typed: > >>> > >>> 0x60000000 > >>> 0x600fffffff > >>> > >>> This seems to have lead to some confusion from using the > >>> wrong figure(s). > >>> > >>>>>> The fact that we are trying to reserve the CPU addresses in the > rman is because > >>>>>> bus_translate_resource rewrote the start address in the resource > after it was allocated. > >>>>>> > >>>>>> That said, I can't see why rman_manage_region would actually fail. > At this point the > >>>>>> rman is empty (this is the first call to rman_manage_region for > "pcib1 memory window"), > >>>>>> so only the check that should be failing are the checks against > rm_start and > >>>>>> rm_end. For the memory window, rm_start is always 0, and rm_end i= s > always > >>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00fffff) and new > (0x60000000 - 0x600fffffff) > >>>>>> ranges are within those bounds. > >>> > >>> No: > >>> > >>> 0xffffffff > >>> > >>> .vs (actual): > >>> > >>> 0x600000000 > >>> 0x6000fffff > > Ok, then this explains the failure if the "raw" addresses are above 4G. = I > have > access to an emag I'm currently using to test fixes to pci_host_generic.c > to > avoid corrupting struct resource objects. I'll post the diff once I've g= ot > something verified to work. > > > It looks to me like in sys/dev/pci/pci_pci.c the: > > > > static void > > pcib_probe_windows(struct pcib_softc *sc) > > { > > . . . > > pcib_alloc_window(sc, &sc->mem, SYS_RES_MEMORY, 0, 0xffffffff)= ; > > . . . > > > > is just inappropriately restrictive about where in the system > > address space a PCIe can validly be mapped to on the high end. > > That, in turn, leads to the rejection on the RPi4B now that > > the range use is checked. > > No, the physical register in PCI-PCI bridges is only 32-bits. Only the > prefetchable BAR supports 64-bit addresses. This is why the host bridge > is doing a translation from the CPU side (0x600000000) to the PCI BAR > addresses (0xc0000000) so that the BAR addresses are down in the 32-bit > address range. It's also true that many PCI devices only support 32-bit > addresses in memory BARs. 64-bit BARs are an optional extension not > universally supported. > > The translation here is somewhat akin to a type of MMU where the CPU > addresses are mapped to PCI addresses. The problem here is that the > PCI BAR resources need to "stay" as PCI addresses since we depend on > being able to use rman_get_start/end to get the PCI addresses of > allocated resources, but pci_host_generic.c currently rewrites the > addresses. > > Probably I should remove rman_set_start/end entirely (Warner added them > back in 2004) as the methods don't do anything to deal with the fallout > that the rman.rm_list linked-list is no longer sorted by address once > some addresses get rewritten, etc. > At the time, they made sense. Removing it, though may take some doing since we use it in about 284 places in sys/dev today... Somewhat more pervasive than I'd have thought they'd be... Warner --0000000000006a28f106115a33e8 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Wed, Feb 14, 2024 at 9:08=E2=80=AF= AM John Baldwin <jhb@freebsd.org&= gt; wrote:
On 2/= 12/24 5:57 PM, Mark Millard wrote:
> On Feb 12, 2024, at 16:36, Mark Millard <marklmi@yahoo.com> wrote:
>
>> On Feb 12, 2024, at 16:10, Mark Millard <marklmi@yahoo.com> wrote:
>>
>>> On Feb 12, 2024, at 12:00, Mark Millard <marklmi@yahoo.com> wrote:
>>>
>>>> [Gack: I was looking at the wrong vintage of source code, = predating
>>>> your changes: wrong system used.]
>>>>
>>>>
>>>> On Feb 12, 2024, at 10:41, Mark Millard <marklmi@yahoo.com> wrote: >>>>
>>>>> On Feb 12, 2024, at 09:32, John Baldwin <jhb@freebsd.org> wrote: >>>>>
>>>>>> On 2/9/24 8:13 PM, Mark Millard wrote:
>>>>>>> Summary:
>>>>>>> pcib0: <BCM2838-compatible PCI-express cont= roller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>> pcib0:=C2=A0 PCI addr: 0xc0000000, CPU addr: 0= x600000000, Size: 0x40000000
>>>>>>> . . .
>>>>>>> rman_manage_region: <pcib1 memory window>= ; request: start 0x600000000, end 0x6000fffff
>>>>>>> panic: Failed to add resource to rman
>>>>>>
>>>>>> Hmmm, I suspect this is due to the way that bus_tr= anslate_resource works which is
>>>>>> fundamentally broken.=C2=A0 It rewrites the start = address of a resource in-situ instead
>>>>>> of keeping downstream resources separate from the = upstream resources.=C2=A0 =C2=A0For example,
>>>>>> I don't see how you could ever release a resou= rce in this design without completely
>>>>>> screwing up your rman.=C2=A0 That is, I expect try= ing to detach a PCI device behind a
>>>>>> translating bridge that uses the current approach = should corrupt the allocated
>>>>>> resource ranges in an rman long before my changes.=
>>>>>>
>>>>>> That said, that doesn't really explain the pan= ic.=C2=A0 Hmm, the panic might be because
>>>>>> for PCI bridge windows the driver now passes RF_AC= TIVE and the bus_translate_resource
>>>>>> hack only kicks in the activate_resource method of= pci_host_generic.c.
>>>>>>
>>>>>>> Detail:
>>>>>>> . . .
>>>>>>> pcib0: <BCM2838-compatible PCI-express cont= roller> mem 0x7d500000-0x7d50930f irq 80,81 on simplebus2
>>>>>>> pcib0: parsing FDT for ECAM0:
>>>>>>> pcib0: PCI addr: 0xc0000000, CPU addr: 0x60000= 0000, Size: 0x40000000
>>>>>>
>>>>>> This indicates this is a translating bus.
>>>>>>
>>>>>>> pcib1: <PCI-PCI bridge> irq 91 at device= 0.0 on pci0
>>>>>>> rman_manage_region: <pcib1 bus numbers> = request: start 0x1, end 0x1
>>>>>>> pcib0: rman_reserve_resource: start=3D0xc00000= 00, end=3D0xc00fffff, count=3D0x100000
>>>>>>> rman_reserve_resource_bound: <PCIe Memory&g= t; request: [0xc0000000, 0xc00fffff], length 0x100000, flags 102, device pc= ib1
>>>>>>> rman_reserve_resource_bound: trying 0xffffffff= <0xc0000000,0xfffff>
>>>>>>> considering [0xc0000000, 0xffffffff]
>>>>>>> truncated region: [0xc0000000, 0xc00fffff]; si= ze 0x100000 (requested 0x100000)
>>>>>>> candidate region: [0xc0000000, 0xc00fffff], si= ze 0x100000
>>>>>>> allocating from the beginning
>>>>>>> rman_manage_region: <pcib1 memory window>= ; request: start 0x600000000, end 0x6000fffff
>>>
>>> What you later typed does not match:
>>>
>>> 0x600000000
>>> 0x6000fffff
>>>
>>> You later typed:
>>>
>>> 0x60000000
>>> 0x600fffffff
>>>
>>> This seems to have lead to some confusion from using the
>>> wrong figure(s).
>>>
>>>>>> The fact that we are trying to reserve the CPU add= resses in the rman is because
>>>>>> bus_translate_resource rewrote the start address i= n the resource after it was allocated.
>>>>>>
>>>>>> That said, I can't see why rman_manage_region = would actually fail.=C2=A0 At this point the
>>>>>> rman is empty (this is the first call to rman_mana= ge_region for "pcib1 memory window"),
>>>>>> so only the check that should be failing are the c= hecks against rm_start and
>>>>>> rm_end.=C2=A0 For the memory window, rm_start is a= lways 0, and rm_end is always
>>>>>> 0xffffffff, so both the old (0xc00000000 - 0xc00ff= fff) and new (0x60000000 - 0x600fffffff)
>>>>>> ranges are within those bounds.
>>>
>>> No:
>>>
>>> 0xffffffff
>>>
>>> .vs (actual):
>>>
>>> 0x600000000
>>> 0x6000fffff

Ok, then this explains the failure if the "raw" addresses are abo= ve 4G.=C2=A0 I have
access to an emag I'm currently using to test fixes to pci_host_generic= .c to
avoid corrupting struct resource objects.=C2=A0 I'll post the diff once= I've got
something verified to work.

> It looks to me like in sys/dev/pci/pci_pci.c the:
>
> static void
> pcib_probe_windows(struct pcib_softc *sc)
> {
> . . .
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 pcib_alloc_window(sc, &sc->me= m, SYS_RES_MEMORY, 0, 0xffffffff);
> . . .
>
> is just inappropriately restrictive about where in the system
> address space a PCIe can validly be mapped to on the high end.
> That, in turn, leads to the rejection on the RPi4B now that
> the range use is checked.

No, the physical register in PCI-PCI bridges is only 32-bits.=C2=A0 Only th= e
prefetchable BAR supports 64-bit addresses.=C2=A0 This is why the host brid= ge
is doing a translation from the CPU side (0x600000000) to the PCI BAR
addresses (0xc0000000) so that the BAR addresses are down in the 32-bit
address range.=C2=A0 It's also true that many PCI devices only support = 32-bit
addresses in memory BARs.=C2=A0 64-bit BARs are an optional extension not universally supported.

The translation here is somewhat akin to a type of MMU where the CPU
addresses are mapped to PCI addresses.=C2=A0 The problem here is that the PCI BAR resources need to "stay" as PCI addresses since we depend= on
being able to use rman_get_start/end to get the PCI addresses of
allocated resources, but pci_host_generic.c currently rewrites the
addresses.

Probably I should remove rman_set_start/end entirely (Warner added them
back in 2004) as the methods don't do anything to deal with the fallout=
that the rman.rm_list linked-list is no longer sorted by address once
some addresses get rewritten, etc.

At t= he time, they made sense. Removing it, though may take some doing
since we use it in about 284 places=C2=A0 in sys/dev today... Somewhat mor= e
pervasive than I'd have thought they'd be...
=
Warner=C2=A0
--0000000000006a28f106115a33e8--