[Bug 270975] [hang] system hangs with heavy io and regular syncing

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 21 Apr 2023 08:53:31 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270975

            Bug ID: 270975
           Summary: [hang] system hangs with heavy io and regular syncing
           Product: Base System
           Version: 12.4-RELEASE
          Hardware: i386
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: nkoch@demig.de

We are running an embedded system in which some processes run at rtprio, do
I/O, and sync frequently. It uses a 12.1-p13 kernel with some modifications,
including special device drivers, e.g. an SRAM-based disk device.
On rare occasions we found the system completely unresponsive (no login or ssh
possible), so I added a utility that runs at very high rtprio and monitors the
other processes. If it sees 100% CPU it throttles those processes using
SIGSTOP/SIGCONT. That helped me see that a thread in unkillable sleep in the
sync syscall was using most of the CPU, as if it were busy-waiting for
something.
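
The throttling idea described above can be sketched in a few lines of shell.
This is only an illustration, not the utility used on the embedded system:
the function names busy_pids and throttle_pid, the 99% threshold and the
one-second pause are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical watchdog helpers; names and defaults are illustrative only.

# Print the PIDs whose CPU usage is at or above the given percentage.
busy_pids() {
    threshold=${1:-99}
    ps -A -o pid= -o pcpu= |
        awk -v t="$threshold" '$2 + 0 >= t { print $1 }'
}

# Stop a process, wait a moment, then let it continue.
# Returns non-zero if either signal could not be delivered.
throttle_pid() {
    pid=$1
    pause=${2:-1}
    kill -STOP "$pid" 2>/dev/null || return 1
    sleep "$pause"
    kill -CONT "$pid" 2>/dev/null || return 1
}
```

Calling busy_pids in a loop and passing each PID to throttle_pid gives the
basic behaviour. Such a watchdog would itself have to be started with
rtprio(1) at a higher priority than the processes it monitors.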

After that I did some testing with unmodified kernels (without my device
drivers) and simple test scripts that do write + sync + random sleep at normal
and realtime priority.
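
A minimal sketch of a test script of this kind is shown here. It is not the
script used for the report. The names TARGET, ITERATIONS, MAX_SLEEP and
write_sync_sleep, the 1 MiB write size and the sleep range are assumptions
chosen for illustration.

```shell
#!/bin/sh
# Hypothetical reproduction sketch: write a file, sync, sleep a random time.
# All names and defaults below are illustrative, not taken from the report.

TARGET="${TARGET:-/tmp/synctest.$$}"   # file that is rewritten each round
ITERATIONS="${ITERATIONS:-3}"          # number of write+sync rounds
MAX_SLEEP="${MAX_SLEEP:-2}"            # sleep is 0 .. MAX_SLEEP-1 seconds

write_sync_sleep() {
    i=0
    while [ "$i" -lt "$ITERATIONS" ]; do
        # Write 1 MiB of zeros to the target file.
        dd if=/dev/zero of="$TARGET" bs=1024 count=1024 2>/dev/null
        # Flush dirty buffers to disk.
        sync
        # Pick a pseudo-random whole number of seconds below MAX_SLEEP.
        delay=$(awk -v max="$MAX_SLEEP" -v seed="$$$i" \
            'BEGIN { srand(seed); print int(rand() * max) }')
        sleep "$delay"
        i=$((i + 1))
    done
}
```

With a final line that calls write_sync_sleep, the script can be started once
at normal priority ("sh synctest.sh") and once at realtime priority
("rtprio 10 sh synctest.sh"). The file name synctest.sh and the priority
value 10 are again only examples.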

So far I've tested FreeBSD 12.1-RELEASE, FreeBSD 12.1-p13, and FreeBSD
12.4-RELEASE. I've managed to make all of them unresponsive after one or more
hours.

For FreeBSD 12.4, I had one console running a shell with rtprio. After
"killall sync &" and "killall sh &" the system hung. I could still switch
vtys but could not log in.
In another test I got the hang by calling "procstat kstack 1" in the rtprio
shell.

One detail: kern.dirdelay, kern.metadelay and kern.filedelay are all set to 1.
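
For anyone trying to reproduce this, the settings above could be made
persistent with an /etc/sysctl.conf fragment like the following. This is a
sketch of one way to apply them, not necessarily how they are set on the
affected system.

```
# Syncer delays in seconds, as described in this report
kern.dirdelay=1
kern.metadelay=1
kern.filedelay=1
```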
