Re: -current on armv7 stuck with flashing disk light

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Tue, 27 Jun 2023 16:29:03 UTC
On Mon, Jun 26, 2023 at 07:57:05PM -0700, Mark Millard wrote:
> On Jun 26, 2023, at 19:12, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > A Pi2 freshly updated to 
> > FreeBSD 14.0-CURRENT #41 main-c3e58ace31: Mon Jun 26 17:06:01 PDT 2023
> >    bob@www.zefox.com:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm
> > got stuck with a flashing USB disk LED after starting a -j3 buildworld.
> > No response to debugger escape, had to pull the plug.
> 
> If I understand right, the LED flashing means the disk
> had not stopped doing I/O: the system was still running,
> doing disk activity. (But I do not have a description
> of what your drive documentation says about how the
> drive handles the LED and what various patterns/colors
> may mean.)
> 
> If the processes associated with processing input that
> would identify the debugger escape had the kernel stacks
> involved swapped out to swap space, I doubt that the
> debugger escape would work until/unless the kernel
> stacks are brought back into kernel RAM.
> 
> Avoiding the specific way of losing control is why I
> have in /etc/sysctl.conf :
> 
> #
> # Together this pair avoids swapping out the process kernel stacks.
> # This keeps the processes used for interacting with the system from
> # being hung up by such swapping.
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
 
This combination was tried and didn't seem to have any consistent
effect. It's commented out at the moment.
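
A minimal sketch for confirming whether the pair is actually active in the
config file rather than commented out. The function name sysctl_conf_active
is hypothetical, and the default path assumes the stock /etc/sysctl.conf
location; it only inspects the file, not the running kernel.

```sh
# Succeed (exit 0) if NAME is set on an uncommented line of FILE.
# Usage: sysctl_conf_active NAME [FILE]   (hypothetical helper)
sysctl_conf_active() {
    name=$1
    file=${2:-/etc/sysctl.conf}
    # Compare the text before the first "=" exactly, so a commented-out
    # entry such as "#vm.swap_enabled=0" does not count as active.
    awk -v n="$name" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)      # drop leading blanks
            split(line, kv, "=")
            key = kv[1]
            sub(/[ \t]+$/, "", key)       # drop blanks before "="
            if (key == n) found = 1
        }
        END { exit !found }
    ' "$file"
}
```

Calling it once for vm.swap_enabled and once for vm.swap_idle_enabled shows
what the file would apply at boot; running
"sysctl vm.swap_enabled vm.swap_idle_enabled" shows the values the kernel is
using at the moment.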

> (No claim such is the only way to lose control.)
> 
> You might be able to get a clue if there was disk I/O going
> on based on modification times on files you know would have
> been modified periodically for some time (minutes) before
> you pulled the plug, but not modified on reboot and later
> activity. Maybe a log file that would only be modified by
> the build that you had been trying to do?
> 

There are log files for build and disk activity (for a cold
hang, no disk activity at all) at
http://nemesis.zefox.com/~bob/fbsd/rpi2/20230623/
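
A minimal sketch of the modification-time check suggested above, assuming the
build logs sit in one directory and that /var/run/dmesg.boot (rewritten at
each boot) serves as the "after reboot" reference. The function name
stale_before is hypothetical. Files older than the reference were last
written before the reboot, so their timestamps bound when I/O stopped.

```sh
# List files in DIR last modified before REF was written.
# Usage: stale_before DIR [REF]   (hypothetical helper)
stale_before() {
    dir=$1
    ref=${2:-/var/run/dmesg.boot}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # -ot is true when $f has an older modification time than $ref.
        if [ "$f" -ot "$ref" ]; then
            ls -l "$f"
        fi
    done
}
```

The newest timestamp in the output is the last moment the build is known to
have written to disk; comparing it with the time the plug was pulled gives a
rough idea of how long the machine was stuck.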

In this case the top window was via ssh. Lately I've
taken to running top on the serial console in hopes
that will help distinguish system hangs from USB hangs.

 

> (You did not indicate how long you let it run with the
> status "possibly hung up".)
>
IIRC it was about half an hour. It was already stuck when I
found it, so I don't know the actual time.
 
> > Reboot with kernel.old,
> > FreeBSD 14.0-CURRENT #40 main-c1cbabe8ae: Tue Jun 20 03:58:47 PDT 2023
> >    bob@www.zefox.com:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm
> > seems ok, I'll try to run buildworld with that.

The kernel.old  -j3 buildworld is still running, no complaints so far.
If it succeeds I'll experiment with usbtop.

Thanks for writing!