Re: Periodic rant about SCHED_ULE

From: Steve Kargl <sgk_at_troutmask.apl.washington.edu>
Date: Wed, 22 Mar 2023 19:04:06 UTC
On Wed, Mar 22, 2023 at 07:31:57PM +0100, Matthias Andree wrote:
> 
> Yes, there are reports that FreeBSD is not responsive by default - but this
> may make it get overall better throughput at the expense of responsiveness,
> because it might be doing fewer context switches.  So just complaining about
> a longer buildworld without seeing how much dnetc did in the same wallclock
> time period is useless.  Periodic rants don't fix this lack of information.
> 

I reported the issue with ULE some 15 to 20 years ago, and
eventually gave up reporting it.  The individuals with the
requisite skills to hack on ULE did not take it up; and yes,
I lack those skills.  The path of least resistance is to use
4BSD.

%  cat a.f90
!
! Silly numerically intensive computation.
!
program foo
   implicit none
   integer, parameter :: m = 200, n = 1000, dp = kind(1.d0)
   integer i
   real(dp) x
   real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
   call random_init(.true., .true.)
   allocate(a(n,n), b(n,n))
   do i = 1, m
      call random_number(a)
      call random_number(b)
      c = matmul(a,b)
      x = sum(c)
      if (x < 0) stop 'Whoops'
   end do
end program foo
% gfortran11 -o z -O3 -march=native a.f90 
% time ./z
       42.16 real        42.04 user         0.09 sys
% cat foo
#! /bin/csh
#
# Launch NCPU+1 images with a 1 second delay
#
foreach i (1 2 3 4 5 6 7 8 9)
   ./z &
   sleep 1
end
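For what it's worth, the same launcher can be written in plain sh so
that NCPU isn't hard-coded (a sketch, not from my original run;
hw.ncpu is the FreeBSD sysctl for the core count, with nproc as a
fallback elsewhere):

```shell
#!/bin/sh
# Hypothetical POSIX sh version of the csh launcher above -- a sketch,
# not part of the original experiment.  "true" stands in for ./z here;
# substitute ./z (and restore the sleep) to reproduce the test.
cmd=true
ncpu=$(sysctl -n hw.ncpu 2>/dev/null || nproc 2>/dev/null || echo 8)
njobs=$((ncpu + 1))             # NCPU+1 images, as in the csh script
i=1
while [ "$i" -le "$njobs" ]; do
    $cmd &                      # launch one image in the background
    # sleep 1                   # the original staggers starts by 1 second
    i=$((i + 1))
done
wait                            # collect all the background images
echo "launched $njobs images"
```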
% ./foo

In another xterm, you can watch the 9 images.

% top
last pid:  1709;  load averages:  4.90,  1.61,  0.79    up 0+00:56:46  11:43:01
74 processes:  10 running, 64 sleeping
CPU: 99.9% user,  0.0% nice,  0.1% system,  0.0% interrupt,  0.0% idle
Mem: 369M Active, 187M Inact, 240K Laundry, 889M Wired, 546M Buf, 14G Free
Swap: 16G Total, 16G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME     CPU COMMAND
 1699 kargl         1  56    0    68M    35M RUN      3   0:41  92.60% z
 1701 kargl         1  56    0    68M    35M RUN      0   0:41  92.33% z
 1689 kargl         1  56    0    68M    35M CPU5     5   0:47  91.63% z
 1691 kargl         1  56    0    68M    35M CPU0     0   0:45  89.91% z
 1695 kargl         1  56    0    68M    35M CPU2     2   0:43  88.56% z
 1697 kargl         1  56    0    68M    35M CPU6     6   0:42  88.48% z
 1705 kargl         1  55    0    68M    35M CPU1     1   0:39  88.12% z
 1703 kargl         1  56    0    68M    35M CPU4     4   0:39  87.86% z
 1693 kargl         1  56    0    68M    35M CPU7     7   0:45  78.12% z

With 4BSD, you see the ./z's with 80% or greater CPU.  All the ./z's
exit after 55-ish seconds.  If you try this experiment on ULE, you'll
get NCPU-1 ./z's with nearly 99% CPU and 2 ./z's with something like
45% CPU each, as those two images ping-pong on one cpu.  Back when I
was testing ULE vs 4BSD, this was due to ULE's cpu affinity, where
processes never migrate to another cpu.  Admittedly, that was several
years ago.  Maybe ULE has gotten better, but George's rant seems to
suggest otherwise.
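The percentages aren't mysterious.  With 9 CPU-bound images balanced
across 8 cores, each should get 8/9 of a core; if two of them are
instead stuck sharing one core, that pair gets about half a core each.
A quick back-of-the-envelope sketch (integer percent, my arithmetic,
not from the top output above):

```shell
#!/bin/sh
# Fair-share arithmetic for the experiment above: 9 CPU-bound
# processes on 8 cores.
ncpu=8
njobs=9
fair=$((100 * ncpu / njobs))    # balanced share: 800/9 ~= 88% per image
pinned=$((100 / 2))             # two images pinned to one core: 50% each
echo "balanced: ${fair}%  pinned pair: ${pinned}%"
```

The ~88% balanced figure matches the 4BSD top(1) numbers above; the
~50% pinned figure is the ballpark of the two ping-ponging images
under ULE.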

-- 
Steve