Re: u-boot debug, was: Re: U-boot on RPI3, sees disk but won't boot it

From: Klaus Küchemann <maciphone2_at_googlemail.com>
Date: Sun, 02 Oct 2022 20:18:28 UTC

> On 02.10.2022 at 21:58, Mark Millard <marklmi@yahoo.com> wrote:
> 
> On 2022-Oct-2, at 12:35, Klaus Küchemann <maciphone2@googlemail.com> wrote:
> 
>> On 02.10.2022 at 20:20, bob prohaska <fbsd@www.zefox.net> wrote:
>>> 
>>> On Sat, Oct 01, 2022 at 02:21:42PM -0700, Mark Millard wrote:
>>>> 
>>>> http://nemesis.zefox.com/~fbsd/pelorus_console.txt7_orig_fragment
>>>> 
>>>> still shows all the debug output. It did not
>>>> avoid the timing changes.
>>>> 
>>>> You might need to avoid both of:
>>>> 
>>>> patch-common_usb__hub.c
>>>> patch-common_usb__storage.c
>>>> 
>>>> and to disable the LOG_DEBUG and DEBUG lines in:
>>>> 
>>>> patch-common_usb.c
>>>> 
>>>> by turning them into comments, adding // as
>>>> indicated below:
>>>> 
>>>> +//#define LOG_DEBUG
>>>> +//#define DEBUG
>>>> 
>>> 
>>> I think the changes were successful: u-boot compiles and
>>> runs. There's no extra output and, unfortunately, only one
>>> successful reboot so far. Bus scanning seems quite slow.
>>> Storage devices are rarely found on reset, but usb reset
>>> does sometimes work. "run bootcmd_usb0" paused for minutes
>>> at "Device 0:" and paused again after reporting "..current device".
>>> No echo from the console; ctrl-C did nothing.
>>> 
>>> The attempt sequence was
>>> SRBSPRMRPRRPUPPRRUPUCUUC
>>> where 
>>> S is shutdown -r
>>> R is reset of u-boot
>>> U is usb reset
>>> P is powercycle
>>> M is stop at mountroot
>>> C is run bootcmd_usb0
>>> 
>>> The console log is at
>>> http://nemesis.zefox.com/~fbsd/pelorus_console.txt8_no_debug
>>> 
>>> It now appears that run bootcmd_usb0 fairly reliably gets
>>> stuck, with the disk LED on steadily (no activity). Maybe in
>>> one of the loops seen earlier?
>>> 
>>> Thanks again for all your help!
>>> 
>>> bob prohaska
>>> 
>> 
>> 
>> So if you now reapply the #define DEBUG patches (while keeping the mdelay patch) and the reboot issues definitely go away,
>> we have a typical so-called Heisenbug, which is hopefully now more or less fixed.
> 
> No. Bob has more than one problem: more problems observed
> after "1 Storage Device(s) found". The DEBUG/mdelay
> combination only seemed to cause the "1 Storage Device(s)
> found" to be at least more reliable, not later stages.
> 
> It is not obvious if earlier activity contributes or not
> to the problems observed after "1 Storage Device(s) found".
> 
> So far nothing has gotten near having things just work for
> booting without manual intervention, multiple retries
> being involved sometimes.
> 
>> Well, USB-boot problems on earlier Pi models (AFAIK all except the 4) are well known. From defective hardware to power-cycle issues, there are many discussions on the web, and in some cases even the debug message "is your USB cable bad?" fixed things. Others added RNG devices or an external clock, and sometimes even plugging in a mouse fixed it (by changing the USB enumeration).
>> 
>> I think once the working u-boot.bin has survived 1500 successful reboots you can be sure it's working...
>> just kidding :-)
>> 
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com

It's hard to read and remember every log, but I thought Bob reported approx. 30 successful reboots after the mdelay patch.
Of course that could be coincidence; who really knows what happens in this untraceable, inconsistent USB-boot behavior?!

> On 02.10.2022 at 21:48, Mark Millard <marklmi@yahoo.com> wrote:
> 
> (RaspiOS and Ubuntu do not use U-Boot last I knew. So
> they do not make for good comparisons for the purpose
> as far as I know.)

RaspiOS doesn't, but Ubuntu (and others) have used U-Boot for years.
While it's possible that Ubuntu (or others) carry their own U-Boot patches,
my guess is that they, too, will sometimes hang after a (re)boot.

If I wanted to keep such a device as an online server, as Bob does, for whatever reason I would
implement something like "IPMI", or, more simply put:
immediate remote console access after a script warns that the machine is offline.
But I would remove it from the cluster if there are known hardware problems.
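A minimal sketch of the "warned by a script" part, assuming the server answers TCP connections on port 22 (SSH) and that three consecutive failed checks one minute apart count as "offline". The port, the thresholds and the function names are assumptions for illustration; the remote console itself (a serial console server or IPMI-like hardware) is outside what a script like this can provide.

```python
import socket
import time
from typing import Callable, Optional


def host_is_up(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, DNS failures and timeouts all raise OSError
        # (or a subclass of it), so each one counts as "down".
        return False


def watch(probe: Callable[[], bool],
          on_offline: Callable[[int], None],
          interval: float = 60.0,
          failures_before_alert: int = 3,
          max_checks: Optional[int] = None) -> None:
    """Call probe() every `interval` seconds.

    on_offline(n) is called once, when n == failures_before_alert
    consecutive probes have failed; the alert re-arms after the next
    successful probe. max_checks=None polls forever.
    """
    consecutive_failures = 0
    alerted = False
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            consecutive_failures = 0
            alerted = False
        else:
            consecutive_failures += 1
            if consecutive_failures >= failures_before_alert and not alerted:
                on_offline(consecutive_failures)
                alerted = True
        # Skip the sleep after the final check when max_checks is set.
        if max_checks is None or checks < max_checks:
            time.sleep(interval)
```

Run it on a different machine than the one being watched, e.g. watch(lambda: host_is_up("rpi3.example.net"), lambda n: print(f"offline for {n} checks")), where the host name is a hypothetical placeholder; on_offline could just as well send mail instead of printing. Requiring several consecutive failures avoids alerting on a single dropped connection during a normal reboot.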


Regards

Klaus