Re: -current on armv7 stuck with flashing disk light
Date: Tue, 27 Jun 2023 16:59:40 UTC
On Jun 27, 2023, at 09:47, Mark Millard <marklmi@yahoo.com> wrote:

> On Jun 27, 2023, at 09:29, bob prohaska <fbsd@www.zefox.net> wrote:
>
>> On Mon, Jun 26, 2023 at 07:57:05PM -0700, Mark Millard wrote:
>>> On Jun 26, 2023, at 19:12, bob prohaska <fbsd@www.zefox.net> wrote:
>>>
>>>> A Pi2 freshly updated to
>>>> FreeBSD 14.0-CURRENT #41 main-c3e58ace31: Mon Jun 26 17:06:01 PDT 2023
>>>> bob@www.zefox.com:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm
>>>> got stuck with a flashing USB disk LED after starting a -j3 buildworld.
>>>> No response to debugger escape, had to pull the plug.
>
> I'm confused.
>
> That says "stuck with a flashing USB disk LED". But:
>
> http://nemesis.zefox.com/~bob/fbsd/rpi2/20230623/readme
>
> says: "the disk had gone to sleep mode. Both LEDs were off"
>
> Are these two different examples with variable behavior
> across the examples?
>
>>> If I understand right, the LED flashing means the disk
>>> had not stopped doing I/O: the system was still running,
>>> doing disk activity. (But I do not have a description
>>> of what your drive documentation says about how the
>>> drive handles the LED and what various patterns/colors
>>> may mean.)
>>>
>>> If the processes associated with processing input that
>>> would identify the debugger escape had the kernel stacks
>>> involved swapped out to swap space, I doubt that the
>>> debugger escape would work until/unless the kernel
>>> stacks are brought back into kernel RAM.
>>>
>>> Avoiding the specific way of losing control is why I
>>> have in /etc/sysctl.conf :
>>>
>>> #
>>> # Together this pair avoids swapping out the process kernel stacks.
>>> # This avoids processes for interacting with the system from being
>>> # hung-up by such.
>>> vm.swap_enabled=0
>>> vm.swap_idle_enabled=0
>>>
>>
>> This combination was tried and didn't seem to have any consistent
>> effect. It's commented out at the moment.
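For reference, the pair quoted above can also be set on a running system without editing /etc/sysctl.conf. The following is only a minimal sketch, assuming the running kernel still provides both OIDs and that it is run as root. The function name and the SYSCTL override variable are hypothetical, added so the sketch can be dry-run without touching the kernel.

```shell
#!/bin/sh
# Sketch: apply the two settings from /etc/sysctl.conf to the live system.
# SYSCTL is a hypothetical override hook (e.g. SYSCTL=echo for a dry run).
SYSCTL="${SYSCTL:-sysctl}"

disable_kstack_swap() {
    for oid in vm.swap_enabled vm.swap_idle_enabled; do
        # "name=value" asks sysctl to set the OID; stop on the first failure.
        "$SYSCTL" "${oid}=0" || return 1
    done
}
```

Calling `disable_kstack_swap` as root would apply both values; `sysctl vm.swap_enabled vm.swap_idle_enabled` afterwards shows what the kernel currently reports. Settings made this way do not survive a reboot, so the /etc/sysctl.conf lines are still needed for that.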
>
> By not having them, we have no way to know if the
> relevant kernel stacks had been moved to swap space.
> Having them is part of problem isolation/identification
> even when other forms of loss of control happen.
>
> The 2 lines serve more than one goal.
>
>>> (No claim such is the only way to lose control.)
>>>
>>> You might be able to get a clue if there was disk I/O going
>>> on based on modification times on files you know would have
>>> been modified periodically for some time (minutes) before
>>> you pulled the plug --but not modified on reboot and later
>>> activity. May be a log file that would only be modified by
>>> the build that you had been trying to do?
>>>
>>
>> There are log files for build and disk activity (for a cold
>> hang, no disk activity at all) at
>> http://nemesis.zefox.com/~bob/fbsd/rpi2/20230623/
>
> So this is a different hangup?

j4swapscript.log has internal timestamp pairs:

Wed Jun 21 16:34:06 PDT 2023
. . .
Fri Jun 23 07:26:10 PDT 2023

It would be interesting to know if "Jun 23 07:26:10" was
after the apparent hangup was identified vs. before.

>> In this case the top window was via ssh. Lately I've
>> taken to running top on the serial console in hopes
>> that will help distinguish system hangs from USB hangs.
>
> If you want to identify system hangs, please
> put back:
>
> vm.swap_enabled=0
> vm.swap_idle_enabled=0
>
> otherwise all you may be seeing is the relevant
> kernel stacks having been moved to swap space.
> That is not a form of system hang relative to
> overall activity, leaving more uncertainty about
> what top no longer displaying updates implies.
>
> You can use sysctl to adjust the live context
> as well.
>
>>
>>> (You did not indicate how long you let it run with the
>>> status "possibly hung up".)
>>>
>> IIRC it was about half an hour.
>> It was already stuck, so I
>> don't know the actual time
>
> No logs or other files with modification times that
> might indicate if there was activity during that
> around 0.5 hr? (Timestamps in files can also serve.)
>
>>>> Reboot with kernel.old,
>>>> FreeBSD 14.0-CURRENT #40 main-c1cbabe8ae: Tue Jun 20 03:58:47 PDT 2023
>>>> bob@www.zefox.com:/usr/obj/usr/src/arm.armv7/sys/GENERIC arm
>>>> seems ok, I'll try to run buildworld with that.
>>
>> The kernel.old -j3 buildworld is still running, no complaints so far.
>> If it succeeds I'll experiment with usbtop.

===
Mark Millard
marklmi at yahoo.com
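The modification-time check asked about above could be done along these lines after rebooting. This is a rough sketch, not a tested procedure for this machine: the function name is hypothetical, mktemp is assumed to be available, and the timestamp and directory in the usage note are placeholders rather than values from the logs.

```shell
#!/bin/sh
# Sketch: list files under DIR modified later than STAMP.
# STAMP uses the touch -t format: [[CC]YY]MMDDhhmm[.SS] (local time).
modified_since() {
    stamp=$1
    dir=$2
    marker=$(mktemp) || return 1
    # Give the temporary marker file the requested modification time.
    touch -t "$stamp" "$marker" || { rm -f "$marker"; return 1; }
    # -newer matches files modified more recently than the marker.
    find "$dir" -type f -newer "$marker"
    rm -f "$marker"
}
```

For example, `modified_since 202306230700 /usr/obj` would list files changed after 07:00 local time on Jun 23, 2023. If STAMP is the time the hang was first noticed, any file that only the build writes and that shows up in the list suggests disk activity continued. Files touched by the reboot or later activity will also appear, so those have to be discounted by hand.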