Re: u-boot debug, was: Re: U-boot on RPI3, sees disk but won't boot it
Date: Fri, 30 Sep 2022 02:14:44 UTC
On 2022-Sep-29, at 17:27, bob prohaska <fbsd@www.zefox.net> wrote: > On Thu, Sep 29, 2022 at 12:55:00PM -0700, Mark Millard wrote: >> On 2022-Sep-29, at 10:09, bob prohaska <fbsd@www.zefox.net> wrote: >> >>> . . . >> >> In part I was going by your indication to avoid "bloat", >> which I had guessed was a reference to the logging output >> instead of the size of u-boot.bin : >> >> QUOTE >> The maximum logging level was 4 out of 7, fearing bloat. >> END QUOTE > > No, actually I was then worried about getting a u-boot binary > too large to let me keep a few usable versions in /boot/msdos. > I had at the time absolutely no concept of output volume. Ahh, okay. It turns out your capture of an automatic reset needed the read activity details. It has: . . . 6 USB Device(s) found scanning usb for storage devices... 1 Storage Device(s) found U-Boot> run bootcmd_usb0 Device 0: usb_read: udev 0 usb_read: dev 0 startblk 0, blccnt 1 buffer 3af42c00 read10: start 0 blocks 1 COMMAND phase DATA phase usb_bulk_msg error status 0 BBB_reset usb_control_msg: request: 0xFF, requesttype: 0x21, value 0x0 index 0x0 length 0x0 BBB_reset result -110: status 0 reset usb_control_msg: request: 0x1, requesttype: 0x2, value 0x0 index 0x81 length 0x0 BBB_reset result -22: status 0 clearing IN endpoint usb_control_msg: request: 0x1, requesttype: 0x2, value 0x0 index 0x2 length 0x0 BBB_reset result -22: status 0 clearing OUT endpoint BBB_reset done Read ERROR COMMAND phase DATA phase Raspberry Pi Bootcode Read File: config.txt, 265 Read File: start.elf, 2952960 (bytes) Read File: fixup.dat, 7314 (bytes) MESS:00:00:21.193325:0: brfs: File read: /mfs/sd/config.txt MESS:00:00:21.197789:0: brfs: File read: 265 bytes . . . So, apparently, the RPi3B (not just USB) reset during a phase of a Read's activity over the USB bus. It never got as far as reporting the STATUS phase of the read. I've no clue if it is some form of watchdog based reset or a power based reset. But one of the two seem likely. I'm not familiar with the result codes, endpoint clearing, or the like. But at some point I'll try to figure out what -110 and -22 have for names. That might give a clue. >> . . . >> > > There might be another reason for making changes. Apart from the > failure captured last night and posted, every graceful shutdown -r > has succeeded. The count is now 22. It's understood that debugging > code can alter the behavior of bugs, but I didn't expect it to > _fix_ an apparent bug. It's up to 22 successful reboots in a row. Serial console output takes signficant time and can significantly change the timing of what is before vs. after the output request in the code. The extra time can make various type of problems either more or less likely to be observed. > Ah, until now. The 23rd reboot got stuck at the outset, with > Syncing disks, vnodes remaining... Waiting (max 60 seconds) for system process `syncer' to stop... 2 1 0 0 done > All buffers synced. > Uptime: 1m16s > Resetting system ... > and then nothing. Plug-pulling time. > Came up just fine. > > Is it possible the append a change to patch file > names, like "no.verbose.patch-common_usb__storage.c" as > a way to de-activate them with minimal typing? If it > won't fail to detect the disk some kind of incremental > backout of the diagnostics looks necessary, or at least > useful. One way to deactivate patch files that are not explicit in the Makefile is to move them one level up into ../ ( so out of files/ ), as I understand. Another way is to change names so they do not start with: patch- as I understand. >> >> On a different issue . . . >> >> I'm also hoping that you will happen to also capture >> examples of automatic reset/start-over sequences. It >> may be that those requiring getting to the U-Boot prompt >> and typing commands. (Has it ever happened otherwise?) >> > Are you referring to cases where "assisted" boots fail? > > There have been quite a few cases where on reboot u-boot > finds zero storage devices, but usb reset finds one device. > Typing run bootcmd_usb0 then results in a pause, a re-run > of u-boot and a successful boot. The explicit evidence so far is that "a re-run of u-boot" is better described as "the RPi3B rebooting", starting with execution of the RPi3B firmware and later again getting to the stage of u-boot.bin being loaded and started. > Sometimes the machine > silently hangs with the Device 0: message. Sometimes it > boots. > > I just captured an example now. Because of the file's > size it's at > http://nemesis.zefox.com/~fbsd/ > in a file named > > pelorus_console.txt > > By my count the interesting part will be in the last 10% of the > ~9 MB file. > > For some reason the file is littered with non-ascii characters. So far it looks like a binary capture that has every byte sent out in it, including escape characters (ASCII 27 decimal) that start escape sequences for controlling a display. Also, every Carriage Return (ASCII 13 decimal) and every line feed is explicit. There is some text at the beginning (after the first "login:" that looks to be junk so all the following bytes up to the "Password:" after the junk should probably be replaced by one newline or by nothing. There is a sequence (using ESC to indicate an escape character): ESC[?25h that occurs 73225 times in the file. In my normal binary captures of one boot sequence I get somewhat over 4000 of them as I remember. There is interesting text between the sequences, commonly one letter of some overall text. I delete all instances of the escape sequence to read the text with the letters next to each other. The delete is via a find-and-replace-all that replaces them with an empty string. There are 522 more escape characters that were not the beginning of that specific escape sequence. I've not analyzed all the sequences so deleting the 522 ESC might leave some not-normally-displayed text around. The Carriage Returns, 344121 of them, can be similarly removed. There are some ASCII 8 (decimal) characters in the file as well, 1348 of them. ASCII 9 (decimal), 377 of them. I think that is all the characters that are less than a space in the ASCII encoding. But I use some text editors that are good at dealing with such content (but I do not normally deal with such via the headless or plain console FreeBSD machines). > It can be viewed and searched with more, but the chrome browser > is only willing to download it. Attempts to view it fail with > "invalid byte sequence in conversion input". I'm posting it > anyway because it's a rare event. If you've got any ideas for > cleaning it up so it'll display in a browser please say so. > Script seemed to work fine for the first capture, maybe the > huge size is the issue. Perhaps split would help, but I hope > there's something better. === Mark Millard marklmi at yahoo.com