Constant load of 1 on a recent 12-STABLE

Gordon Bergling gbe at freebsd.org
Tue Jun 23 07:23:52 UTC 2020


Hi Allan,

On Thu, Jun 04, 2020 at 02:37:20PM +0200, Gordon Bergling wrote:
> Hi Allan,
> 
> On Wed, Jun 03, 2020 at 05:33:37PM -0400, Allan Jude wrote:
> > On 2020-06-03 16:29, Gordon Bergling wrote:
> > > Hi Allan,
> > > 
> > > On Wed, Jun 03, 2020 at 03:13:47PM -0400, Allan Jude wrote:
> > >> On 2020-06-03 06:16, Gordon Bergling via freebsd-hackers wrote:
> > >>> For a while now I have been seeing a constant load of 1.00 on 12-STABLE,
> > >>> but all CPUs are shown as 100% idle in top.
> > >>>
> > >>> Does anyone have an idea what could have caused this?
> > >>>
> > >>> The load seems to be at least somewhat real, since the build times on this
> > >>> machine for -CURRENT increased from about 2 hours to 3 hours.
> > >>>
> > >>> This is a virtualized system running on Hyper-V, if that matters.
> > >>>
> > >>> Any hints are more than appreciated.
> > >>>
> > >> Try running 'top -SP' and see if that shows a specific CPU being busy,
> > >> or a specific process using CPU time
> > > 
> > > Below is the output of 'top -SP'. The only relevant process/thread that
> > > consumes CPU time relatively constantly seems to be 'zfskern'.
> > > 
> > > -----------------------------------------------------------------------------
> > > last pid: 68549;  load averages:  1.10,  1.19,  1.16 up 0+14:59:45  22:17:24
> > > 67 processes:  2 running, 64 sleeping, 1 waiting
> > > CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > CPU 2:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
> > > CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> > > Mem: 108M Active, 4160M Inact, 33M Laundry, 3196M Wired, 444M Free
> > > ARC: 1858M Total, 855M MFU, 138M MRU, 96K Anon, 24M Header, 840M Other
> > >      461M Compressed, 1039M Uncompressed, 2.25:1 Ratio
> > > Swap: 2048M Total, 2048M Free
> > > 
> > >   PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > >    11 root          4 155 ki31     0B    64K RUN      0  47.3H 386.10% idle
> > >     8 root         65  -8    -     0B  1040K t->zth   0 115:39  12.61% zfskern
> > > -------------------------------------------------------------------------------
> > > 
> > > The only key performance indicator that is relatively high, IMHO, for a
> > > non-busy system is the context switch rate that vmstat reports.
> > > 
> > > -------------------------------------------------------------------------------
> > > procs  memory       page                    disks     faults         cpu
> > > r b w  avm   fre   flt  re  pi  po    fr   sr da0 da1   in    sy    cs us sy id
> > > 0 0 0 514G  444M  7877   2   7   0  9595  171   0   0    0  4347 43322 17  2 81
> > > 0 0 0 514G  444M     1   0   0   0     0   44   0   0    0   121 40876  0  0 100
> > > 0 0 0 514G  444M     0   0   0   0     0   40   0   0    0   133 42520  0  0 100
> > > 0 0 0 514G  444M     0   0   0   0     0   40   0   0    0   120 43830  0  0 100
> > > 0 0 0 514G  444M     0   0   0   0     0   40   0   0    0   132 42917  0  0 100
> > > --------------------------------------------------------------------------------
> > > 
> > > Any other ideas what could generate that load?
> > 
> > I agree that the load average looks out of place here when you look at the
> > % CPU idle, but I wonder if it is caused by a lot of short-lived processes
> > or threads.
> > 
> > How quickly is the 'last pid' number going up?
> > 
> > You might also look at `zpool iostat 1` or `gstat -p` to see how busy
> > your disks are
> 
> When the system is idle, the 'last pid' doesn't change for at least 60
> seconds. During a 'buildworld' it changes much faster, of course, but a
> "-j 4" build results in a load average of about 5.0, so the underlying
> extra load of 1 still persists. 'zpool iostat 1' and 'gstat -p' don't show
> anything suspicious.
> 
> A private mail made me aware of PR 173541, where this problem is
> documented. I'll add my hardware information and performance measurements
> to it when I find some time.
> 
> I am currently thinking about how to measure the number of threads spawned
> per second. Do you have an idea how to do that?
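
One untested idea I have sketched in the meantime is to count creation
events per second with DTrace. This assumes the proc provider's create
probe and an fbt entry probe on kthread_add() are available on 12-STABLE
(kthread_add() must not be inlined for fbt to attach to it):

# rough first cut: count forked processes and added kernel threads per second
dtrace -n '
proc:::create          { @[probename] = count(); }
fbt::kthread_add:entry { @[probefunc] = count(); }
tick-1s                { printa(@); trunc(@); }'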

Possibly also related to the problem is the following top output, collected
with 'top -HS'. The three zfskern threads each generate only a little load,
but they do so constantly. The annoying part of the problem is that a load of
1 is reported, and that the host system schedules the threads across all 4
physical cores and keeps the clock rate at the highest frequency.

  PID USERNAME    PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
   11 root        155 ki31     0B    64K CPU2     2  17:28  97.39% idle{idle: cpu2}
   11 root        155 ki31     0B    64K CPU3     3  17:29  96.78% idle{idle: cpu3}
   11 root        155 ki31     0B    64K CPU1     1  17:29  96.40% idle{idle: cpu1}
   11 root        155 ki31     0B    64K RUN      0  17:25  96.13% idle{idle: cpu0}
    8 root         -8    -     0B  1040K mmp->m   2   0:44   4.32% zfskern{mmp_thread_enter}
    8 root         -8    -     0B  1040K mmp->m   1   0:44   4.28% zfskern{mmp_thread_enter}
    8 root         -8    -     0B  1040K mmp->m   3   0:44   4.25% zfskern{mmp_thread_enter}
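
If I read the thread names right, these are ZFS MMP (multihost) threads.
Assuming the 'multihost' pool property is available in 12-STABLE's ZFS (it
should have arrived with the MMP import), I will check whether it is enabled
on my pool; 'zroot' is just a placeholder for the pool name here:

# shows whether MMP is enabled for the pool (on/off)
zpool get multihost zroot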

Does anyone have some insight into where zfskern is spawned? Maybe I can then
bisect the change that leads to this behaviour.
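
To find the spawn site myself, I would start by grepping the source tree for
the process name; my assumption is that the zfskern proc is created via
kproc_kthread_add() somewhere in the ZFS/SPL code under sys/cddl:

# locate where the "zfskern" kernel process gets its name
grep -rn '"zfskern"' /usr/src/sys/cddl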

--Gordon

