Scheduler weirdness
Steve Kargl
sgk at troutmask.apl.washington.edu
Mon Oct 12 04:49:12 UTC 2009
On Mon, Oct 12, 2009 at 03:35:15PM +1100, Alex R wrote:
> Steve Kargl wrote:
> >On Mon, Oct 12, 2009 at 01:49:27PM +1100, Alex R wrote:
> >
> >>Steve Kargl wrote:
> >>
> >>>So, you have 4 cpus and 4 folding-at-home processes and you're
> >>>trying to use the system with other apps? Switch to 4BSD.
> >>>
> >>>
> >>>
> >>I thought SCHED_ULE was meant to be a much better choice in an SMP
> >>environment. Why are you suggesting he rebuild his kernel and use the
> >>legacy scheduler?
> >>
> >>
> >
> >If you have N cpus and N+1 numerically intensive applications,
> >ULE may have poor performance compared to 4BSD. In the OP's case,
> >he has 4 cpus and 4 numerically intensive (?) applications. He,
> >however, is also trying to use the system in some interactive
> >way.
> >
> >
> Ah, OK. Is this just an accepted thing by the FreeBSD devs, or are they
> trying to fix it?
>
Jeff appears to be extremely busy with other projects. He is aware of
the problem, and I have set up my system to give him access if and when
he wants it.
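As an aside, for anyone who wants to follow the earlier suggestion to try
4BSD: the scheduler is a compile-time kernel option, so switching means
rebuilding the kernel. A rough sketch of the steps, assuming a custom
config named MYKERNEL under the stock /usr/src layout:

    # check which scheduler the running kernel was built with
    sysctl kern.sched.name

    # in the kernel config (e.g. sys/amd64/conf/MYKERNEL), replace
    #   options SCHED_ULE
    # with
    #   options SCHED_4BSD

    # rebuild, install, and reboot
    cd /usr/src
    make buildkernel KERNCONF=MYKERNEL
    make installkernel KERNCONF=MYKERNEL
    shutdown -r now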
Here's the text of my last set of tests that I sent to him:
OK, I've managed to recreate the problem. User kargl launches an MPI
job on node10 that creates two images on node20. This is command z
in the top(1) info. 30 seconds later, user sgk launches an MPI job
on node10 that creates 8 images on node20. This is command rivmp in
the top(1) info. With 8 available cpus, this is a (slightly) oversubscribed
node.
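To give a feel for the setup, a sketch of how such a launch might look;
the exact mpirun and hostfile syntax depends on the MPI implementation
(Open MPI style is shown here, and the hostfile name is made up):

    # hostfile giving node20 eight slots
    echo "node20 slots=8" > hosts

    # two netpipe images on node20 (command z in top)
    mpirun -np 2 -hostfile hosts ./z &

    # ~30 seconds later, eight more images on node20 (command rivmp in
    # top), for a total of 10 cpu-bound processes on 8 cpus
    mpirun -np 8 -hostfile hosts ./rivmp &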
For 4BSD, I see
last pid: 1432; load averages: 8.68, 5.65, 2.82 up 0+01:52:14 17:07:22
40 processes: 11 running, 29 sleeping
CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Mem: 32M Active, 12M Inact, 203M Wired, 424K Cache, 29M Buf, 31G Free
Swap: 4096M Total, 4096M Free
PID  USERNAME THR PRI NICE    SIZE    RES STATE  C  TIME     CPU COMMAND
1428 sgk        1 124    0  81788K  5848K CPU3   6  1:13  78.81% rivmp
1431 sgk        1 124    0  81788K  5652K RUN    1  1:13  78.52% rivmp
1415 kargl      1 124    0  78780K  4668K CPU7   1  1:38  78.42% z
1414 kargl      1 124    0  78780K  4664K CPU0   0  1:37  77.25% z
1427 sgk        1 124    0  81788K  5852K CPU4   3  1:13  78.42% rivmp
1432 sgk        1 124    0  81788K  5652K CPU2   4  1:13  78.27% rivmp
1425 sgk        1 124    0  81788K  6004K CPU5   5  1:12  78.17% rivmp
1426 sgk        1 124    0  81788K  5832K RUN    6  1:13  78.03% rivmp
1429 sgk        1 124    0  81788K  5788K CPU6   7  1:12  77.98% rivmp
1430 sgk        1 124    0  81788K  5764K RUN    2  1:13  77.93% rivmp
Notice that the accumulated times appear reasonable. At this point in the
computation, rivmp is doing no communication between its processes. z is
the netpipe benchmark and is essentially sending messages between its two
processes over the memory bus.
For ULE, I see
last pid: 1169; load averages: 7.56, 2.61, 1.02 up 0+00:03:15 17:13:01
40 processes: 11 running, 29 sleeping
CPU: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Mem: 31M Active, 9392K Inact, 197M Wired, 248K Cache, 26M Buf, 31G Free
Swap: 4096M Total, 4096M Free
PID  USERNAME THR PRI NICE    SIZE    RES STATE  C  TIME     CPU COMMAND
1168 sgk        1 118    0  81788K  5472K CPU6   6  1:18 100.00% rivmp
1169 sgk        1 118    0  81788K  5416K CPU7   7  1:18 100.00% rivmp
1167 sgk        1 118    0  81788K  5496K CPU5   5  1:18 100.00% rivmp
1166 sgk        1 118    0  81788K  5564K RUN    4  1:18 100.00% rivmp
1151 kargl      1 118    0  78780K  4464K CPU3   3  1:48  99.27% z
1152 kargl      1 110    0  78780K  4464K CPU0   0  1:18  62.89% z
1164 sgk        1 113    0  81788K  5592K CPU1   1  0:55  80.76% rivmp
1165 sgk        1 110    0  81788K  5544K RUN    0  0:52  62.16% rivmp
1163 sgk        1 107    0  81788K  5624K RUN    2  0:40  50.68% rivmp
1162 sgk        1 107    0  81788K  5824K CPU2   2  0:39  50.49% rivmp
In the above, processes 1162-1165 are clearly not receiving sufficient time
slices to keep up with the other 4 rivmp images. Watching top at a
1-second interval, once those 4 rivmp images hit 100% CPU, they stayed
pinned to their cpus and remained at 100% CPU. Note also that process pairs
1152/1165 and 1162/1163 are stuck on cpus 0 and 2, respectively.
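For completeness, a sketch of how the same thing can be watched with stock
tools, taking one of the starved pids above as an example:

    # top at a 1-second interval; the C column shows the cpu each
    # process last ran on
    top -s 1

    # the mask a process is allowed to run on; unless it was bound
    # explicitly it spans all cpus, so the pinning above is ULE's own
    # placement rather than an explicit binding
    cpuset -g -p 1163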
--
Steve