Re: rock64 verbose boot hangs

From: Andriy Gapon <avg_at_FreeBSD.org>
Date: Wed, 29 Sep 2021 17:07:25 UTC
On 23/09/2021 20:46, Andriy Gapon wrote:
> On 20/09/2021 20:02, Emmanuel Vadot wrote:
>>
>>   Hi Andriy,
>>
>> On Sat, 18 Sep 2021 15:58:00 +0300
>> Andriy Gapon <avg@FreeBSD.org> wrote:
>>
>>>
>>> Normal boot works every time, but with boot_verbose="YES" it has hung on
>>> every attempt so far.
>>>
>>> Last messages on the console:
>>> cpulist0: <Open Firmware CPU Group> on ofwbus0
>>> cpu0: <Open Firmware CPU> on cpulist0
>>> cpu0: Nominal frequency 600Mhz
>>> cpufreq_dt0: <Generic cpufreq driver> on cpu0
>>> cpufreq_dt0: 408.000 Mhz (950000 uV)
>>> cpufreq_dt0: 600.000 Mhz (950000 uV)
>>> cpufreq_dt0: 816.000 Mhz (1000000 uV)
>>> cpufreq_dt0: 1008.000 Mhz (1100000 uV)
>>> cpufreq_dt0: 1200.000 Mhz (1225000 uV)
>>> cpufreq_dt0: 1296.000 Mhz (1300000 uV)
>>> cpu1: <Open Firmware CPU> on cpulist0
>>> cpu1: Nominal frequency 600Mhz
>>> cpufreq_dt1: <Generic cpufreq driver> on cpu1
>>>
>>> The kernel is totally unresponsive after that.
>>
>>   Can't reproduce here, I'm running 548a706608d with latest DTB and
>> latest u-boot/atf
>>
>>> Any suggestions on how to debug this?
>>
>>   Not really sure where to start. It seems weird that the kernel would
>> hang at the cpufreq attach, but maybe try modifying the DTB to remove
>> this node?
>>   Also, did this start happening with my recent clock commit, or was it
>> the same before?

An update relevant to the question above.
Actually, after upgrading to a version that includes your clock changes, the
problem went away!
I don't know what to make of this, but it looks like the problem was a
clock plus timing issue.

> Thank you and everyone else who responded with information and suggestions.
> 
> Some extra details.
> I've had this problem since I got this board 9 months ago.
> It's been through several upgrades of FreeBSD, U-Boot, and the contents of
> the ESP partition, and the problem was always present.
> 
> Now I've done more extensive testing with a couple of dozen reboots in a row and 
> some additional debug prints (like, for example, DEBUG in subr_bus.c).
> 
> I actually see several variations of the problem.
> Sometimes it's a hang, but sometimes it's a crash.
> A hang can happen in different places and a crash can happen in different places 
> too.
> Some crashes happen during AP startup, and the information I get from them is
> not very usable.
> Some crashes happen during driver probing, when the bus code searches the hints
> memory space.  Those crashes look as if memory corruption happens there at random.
> 
> Given those variations, plus some other differences between my setup and that of
> other Rock64 users (like needing a special setup for eMMC and for the watchdog), I
> am inclined to think that the board I have has something special either in the
> hardware (like a different configuration via some fuses) or in the BootROM,
> even though the PCB has the standard markings.
> 
> And I would not be surprised if it were a customized production run, as I got
> my Rock64s via a special / unusual deal on Amazon.
> Iconikal and Recon Sentinal are the keywords to search for, for those interested.
> Some news articles from the time:
> https://liliputing.com/2020/09/this-10-single-board-computer-is-faster-than-a-raspberry-pi-3.html 
> 
> https://www.tomshardware.com/news/raspberry-pi-sized-iconikal-rockchip-sbc-only-dollar8-on-amazon 
> 
> 
> So, in the end, I still do not know what causes the verbose boot to hang / crash.
> Maybe there is some (not fully working) watchdog that gets armed and disarmed by 
> some hardware accesses and the verbose boot is too slow to complete in time.
> 
> Here is a small subset of panics and hangs that I saw:
> https://people.freebsd.org/~avg/rock64-verbose-boot-panic.txt
> 


-- 
Andriy Gapon