Re: FYI: RPi* firmware tagged 1.20210805 appears to be the last to be bootable by FreeBSD via fdt use; sequence of 2 failure modes after that

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 28 Apr 2022 06:47:37 UTC
[Just an FYI: I got ahold of the RPi3B and discovered that
it was not bootable via RPi* firmware tagged 1.20210805 .
In fact it barely produced any output on the serial console:
very early failure. Reverting to the prior one, 1.20210727,
worked for the RPi3B and the RPi4B.]

[I've not added to the below and have removed the long text
block of RPi4B boot failure output.]

On 2022-Apr-24, at 05:36, Mark Millard <marklmi@yahoo.com> wrote:

> [I may have also found what leads to the extra messages for
> the 2nd failure mode, an independent issue it turns out.]
> 
> On 2022-Apr-24, at 04:37, Mark Millard <marklmi@yahoo.com> wrote:
> 
>> [I think I found the reason for the boot crash that is
>> a common failure to both failure modes. The 2nd mode
>> has other issues I've not analyzed.]
>> 
>> On 2022-Apr-23, at 23:45, Mark Millard <marklmi@yahoo.com> wrote:
>> 
>>> The following is based on a microsd card with 13.1-RC4 on
>>> it where I'd previously substituted my U-Boot 2022.04 build
>>> and tested with the RPi* firmware that is in the 13.1-RC4
>>> image. Here I've tried replacing the RPi* firmware and
>>> holding the rest constant.
>>> 
>>> The boot tests are on an 8 GiByte RPi4B Rev 1.14 with the
>>> B0T stepping. I've not been copying over the linux kernels,
>>> which they also bundle with the firmware.
>>> 
>>> [13.1-RC4 is just what I happened to use. I doubt anything
>>> here is special to 13.* or stable/13 or main [so: 14].
>>> (I do not use 12.* or stable/12.)]
>>> 
>>> The observed status went like . . .
>>> 
>>> 
>>> firmware-1.20210805/boot/
>>> 
>>> The RPi* release tagged 1.20210805 is the last version that
>>> FreeBSD booted with. (Other than booting, logging in, and
>>> shutting down, I've not been testing other aspects of
>>> operation.)
>>> 
>>> From what I've read, firmware-1.20210805/boot/ should be
>>> recent enough to handle the Rev 1.15 related PMIC variation.
>>> 
>>> [I'll note that firmware build dates need not be the same day
>>> as the date encoded into the tag --in fact it is usually some
>>> earlier day. On rare occasion it can be a lot earlier, and
>>> there is an example of that below.]
>>> 
>>> 
>>> After firmware-1.20210805 there are 2 major failure modes.
>>> Both stop at the same sort of point in the messaging --but
>>> there is a huge difference in the count of earlier error
>>> messages. It looks to me like all the issues require
>>> FreeBSD changes if modern RPi* firmware/dtb's are to be
>>> usable via fdt.
>> 
>> I've noticed a difference between the working context and
>> the failing ones (both failure modes).
>> 
>> Failing:
>> 
>> spi0: <BCM2708/2835 SPI controller> mem 0x7e204000-0x7e2041ff irq 18 on simplebus0
>> spibus0: <OFW SPI bus> on spi0
>> spibus0: <unknown card> at cs 0 mode 0
>> spibus0: <unknown card> at cs 1 mode 0
>> NOTE BELOW LINES MISSING HERE.
>> sdhci_bcm0: <Broadcom 2708 SDHCI controller> mem 0x7e300000-0x7e3000ff irq 24 on simplebus0
>> 
>> Working:
>> 
>> spi0: <BCM2708/2835 SPI controller> mem 0x7e204000-0x7e2041ff irq 18 on simplebus0
>> spibus0: <OFW SPI bus> on spi0
>> spibus0: <unknown card> at cs 0 mode 0
>> spibus0: <unknown card> at cs 1 mode 0
>> START LINES MISSING ABOVE
>> iichb0: <BCM2708/2835 BSC controller> mem 0x7e804000-0x7e804fff irq 26 on simplebus0
>> bcm_dma0: <BCM2835 DMA Controller> mem 0x7e007000-0x7e007aff irq 30,31,32,33,34,35,36,37,38,39,40 on simplebus0
>> bcmwd0: <BCM2708/2835 Watchdog> mem 0x7e100000-0x7e100113,0x7e00a000-0x7e00a023,0x7ec11000-0x7ec1101f on simplebus0
>> bcmrng0: <Broadcom BCM2835/BCM2838 RNG> mem 0x7e104000-0x7e104027 on simplebus0
>> gpioc1: <GPIO controller> on gpio1
>> END LINES MISSING ABOVE
>> sdhci_bcm0: <Broadcom 2708 SDHCI controller> mem 0x7e300000-0x7e3000ff irq 73 on simplebus0
>> 
>> In particular:
>> 
>> bcm_dma0: <BCM2835 DMA Controller> mem 0x7e007000-0x7e007aff irq 30,31,32,33,34,35,36,37,38,39,40 on simplebus0
>> 
>> being missing means bcm_dma_attach never ran, and that in turn
>> means that the static bcm_dma_sc is still NULL.
>> 
>> The panic was: panic: vm_fault failed: ffff000000862134
>> 
>> where:
>> 
>> ffff000000862134 <bcm_dma_allocate+0x88> ldaxr  x1, [x9]
>> 
>> which is part of:
>> 
>> int
>> bcm_dma_allocate(int req_ch)
>> {
>>       struct bcm_dma_softc *sc = bcm_dma_sc;
>>       int ch = BCM_DMA_CH_INVALID;
>>       int i;
>> 
>>       if (req_ch >= BCM_DMA_CH_MAX)
>>               return (BCM_DMA_CH_INVALID);
>> 
>>       /* Auto(req_ch < 0) or CH specified */
>>       mtx_lock(&sc->sc_mtx);
>> . . .
>> 
>> So the likes of &sc->sc_mtx end up being a small offset
>> from address zero:
>> 
>> x9:               20
>> 
>> Thus the panic.
>> 
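[A minimal sketch of the address arithmetic, not FreeBSD's actual
structure definitions. The mock_* types and function names below are
hypothetical stand-ins, laid out so that on an LP64 target the lock
word sits 0x20 bytes into the softc. That matches the x9 value quoted
above if it is read as hex. With a NULL softc pointer, the address a
lock operation touches is just that member offset.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel mutex: bookkeeping, then the lock word. */
struct mock_mtx {
	const char	*name;		/* 8 bytes on LP64 */
	unsigned int	 flags;		/* 4 bytes */
	unsigned int	 data;		/* 4 bytes */
	void		*witness;	/* 8 bytes on LP64 */
	uintptr_t	 lock_word;	/* what an atomic load would touch */
};

/* Hypothetical stand-in for the DMA softc: a device handle, then the mutex. */
struct mock_dma_softc {
	void		*sc_dev;
	struct mock_mtx	 sc_mtx;
};

/*
 * Address that a lock operation would touch if the softc pointer had
 * the value `base`. offsetof() does the arithmetic without dereferencing
 * anything, so passing 0 (a NULL softc) is safe here.
 */
uintptr_t
mock_lock_word_addr(uintptr_t base)
{
	return (base + offsetof(struct mock_dma_softc, sc_mtx.lock_word));
}

/* Self-check for the LP64 assumption stated above. */
void
mock_selfcheck(void)
{
	assert(mock_lock_word_addr(0) == 0x20);
}
```

[On an LP64 target mock_lock_word_addr(0) is 0x20: a small offset from
address zero, which is not mapped, so the load faults.]
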
>> As to how bcm_dma_allocate happened without bcm_dma_attach
>> happening first . . .
>> 
>> The working context's dtb has the ordering:
>> (I also show mmcnr@ and the brcm,bcm2711-dma
>> just for reference.)
>> 
>>               dma@7e007000 {
>>                       compatible = "brcm,bcm2835-dma";
>> . . .
>>               mmc@7e300000 {
>>                       compatible = "brcm,bcm2835-mmc", "brcm,bcm2835-sdhci";
>> . . .
>>               mmcnr@7e300000 {
>>                       compatible = "brcm,bcm2835-mmc", "brcm,bcm2835-sdhci";
>> . . .
>>               dma@7e007b00 {
>>                       compatible = "brcm,bcm2711-dma";
>> 
>> But the failing context's dtb has the ordering:
>> (I also show mmcnr@ and the brcm,bcm2711-dma
>> just for reference.)
>> 
>>               mmc@7e300000 {
>>                       compatible = "brcm,bcm2835-mmc", "brcm,bcm2835-sdhci";
>> . . .
>>               dma@7e007000 {
>>                       compatible = "brcm,bcm2835-dma";
>> . . .
>>               mmcnr@7e300000 {
>>                       compatible = "brcm,bcm2835-mmc", "brcm,bcm2835-sdhci";
>> . . .
>>               dma@7e007b00 {
>>                       compatible = "brcm,bcm2711-dma";
>> 
>> So, for sequential handling in the failing case, the mmc@7e300000
>> attach would use bcm_dma_allocate before the bcm_dma_probe/bcm_dma_attach
>> sequence for dma@7e007000 had happened, leading to the crash.
>> 
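[A minimal sketch of the ordering problem, assuming nodes are handled
strictly in dtb order. Everything named sim_* is hypothetical and is
not the FreeBSD driver code. The NULL guard in sim_dma_allocate is only
there so the sketch can report the failure instead of crashing. The
bcm_dma_allocate source quoted above has no such guard.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the driver's static softc pointer; NULL until attach runs. */
static int sim_dma_state;
static int *sim_dma_sc = NULL;

/* Stand-in for the DMA attach step: publishes the softc pointer. */
void
sim_dma_attach(void)
{
	sim_dma_sc = &sim_dma_state;
}

/* Hypothetical guard: report failure instead of dereferencing NULL. */
bool
sim_dma_allocate(void)
{
	return (sim_dma_sc != NULL);
}

/*
 * Handle nodes strictly in the listed order. Returns true only if every
 * mmc node found the DMA engine already attached when it asked for a
 * channel.
 */
bool
sim_walk(const char *const nodes[], size_t n)
{
	bool ok = true;

	assert(n > 0);
	sim_dma_sc = NULL;	/* fresh boot: nothing attached yet */
	for (size_t i = 0; i < n; i++) {
		if (strcmp(nodes[i], "dma@7e007000") == 0)
			sim_dma_attach();
		else if (strcmp(nodes[i], "mmc@7e300000") == 0)
			ok = ok && sim_dma_allocate();
	}
	return (ok);
}

/* The two orderings shown above, reduced to the nodes that matter. */
const char *const sim_working_order[] = { "dma@7e007000", "mmc@7e300000" };
const char *const sim_failing_order[] = { "mmc@7e300000", "dma@7e007000" };
```

[sim_walk(sim_working_order, 2) returns true and
sim_walk(sim_failing_order, 2) returns false: with mmc@ first, the
allocation request arrives while the softc pointer is still NULL.]
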
>> Note: I used "fdt print /" from U-Boot to get the dtb and its
>> ordering. This was based on the address that the RPi* firmware
>> reports when debugging output is enabled (0x4000 here).
>> 
>> 
>>> The 1st mode happens for (I've added the -fails notation):
>>> 
>>> firmware-1.20210831-fails/boot/
>>> firmware-1.20210928-fails/boot/
>>> firmware-1.20211007-fails/boot/
>>> firmware-1.20211029-fails/boot/
>>> firmware-1.20211118-fails/boot/
>>> firmware-1.20220308_buster-fails/boot/
>>> (The _buster one has firmware from 2021-Dec-01, which
>>> is before all the tagged releases listed below.
>>> It looks like the switch to the new major kernel
>>> version after buster came with other changes that
>>> FreeBSD has not tracked.)
>>> 
>>> 
>>> The 2nd mode happens for the following. (Again with extra
>>> notation.) There are a lot more error messages before the
>>> panic happens for these. The firmware builds for these
>>> are more recent than for the above list.
>>> 
>>> 
>>> firmware-1.20220118-fails/boot/
>>> 
>>> firmware-1.20220120-fails/boot/
>>> firmware-1.20220308-fails-non-kernels-same-as-1.20220120/boot/
>>> (I did not repeat the testing of the unchanged firmware.
>>> I just did the "diff -r" to discover the lack of change.)
>>> 
>>> firmware-1.20220328-fails/boot/
>>> firmware-1.20220331-fails-non-kernels-same-as-firmware-1.20220328-but-for-bcm2711-dtb-files/boot/
>>> (Since the .dtb for the RPi4B was different, I did test this.)
> 
> It looks like the extra messages, blocks of:
> 
> clk_fixed4: <Fixed clock> disabled on ofwbus0
> clk_fixed4: Cannot FDT parameters.
> device_attach: clk_fixed4 attach returned 6
> 
> are tied to new dtb content in 2022's dtb updates:
> 
>        cam1_clk {
>                compatible = "fixed-clock";
>                #clock-cells = <0x00000000>;
>                status = "disabled";
>                phandle = <0x000000e2>;
>        };
> . . .
>        cam0_clk {
>                compatible = "fixed-clock";
>                #clock-cells = <0x00000000>;
>                status = "disabled";
>                phandle = <0x000000e4>;
>        };
> 
> These 2 did not exist back when the 1st failure mode
> started. They appear to be processed repeatedly without
> ever being handled successfully --leading to lots of
> messages.
> 
> The messages may just be noise for activity that is
> not contributing to boot failures at all. So fixing
> what I called the 1st failure mode might actually fix
> booting for all the firmware versions after the
> version tagged 1.20210805 .
> 
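[A minimal sketch of the usual devicetree "status" convention, in case
the eventual handling is to skip such nodes before trying to attach
them. The sketch_* names are hypothetical and this is not a claim about
what the FreeBSD fixed-clock driver does or should do.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Devicetree convention: a node is usable when it has no "status"
 * property at all, or when the value is "okay" (or the older "ok").
 * Pass NULL for a node without the property.
 */
bool
sketch_status_okay(const char *status)
{
	if (status == NULL)
		return (true);
	return (strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0);
}

/* Checks against the node shapes discussed above. */
void
sketch_selfcheck(void)
{
	assert(!sketch_status_okay("disabled"));	/* cam0_clk, cam1_clk */
	assert(sketch_status_okay(NULL));		/* no status property */
}
```

[With this check, the cam0_clk and cam1_clk nodes shown above would be
treated as not usable, since both carry status = "disabled".]
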
>>> The failures look like (each test shown) . . .
>>> 
>>> 
>>> . . .
>> 

===
Mark Millard
marklmi at yahoo.com