SCHED_ULE should not be the default
mdf at FreeBSD.org
Wed Dec 14 00:01:57 UTC 2011
On Tue, Dec 13, 2011 at 3:39 PM, Ivan Klymenko <fidaj at ukr.net> wrote:
> On Wed, 14 Dec 2011 00:04:42 +0100
> Jilles Tjoelker <jilles at stack.nl> writes:
>
>> On Tue, Dec 13, 2011 at 10:40:48AM +0200, Ivan Klymenko wrote:
>> > If the algorithm ULE does not contain problems - it means the
>> > problem has Core2Duo, or in a piece of code that uses the ULE
>> > scheduler. I already wrote in a mailing list that specifically in
>> > my case (Core2Duo) partially helps the following patch:
>> > --- sched_ule.c.orig 2011-11-24 18:11:48.000000000 +0200
>> > +++ sched_ule.c 2011-12-10 22:47:08.000000000 +0200
>> > @@ -794,7 +794,8 @@
>> > * 1.5 * balance_interval.
>> > */
>> > balance_ticks = max(balance_interval / 2, 1);
>> > - balance_ticks += random() % balance_interval;
>> > +// balance_ticks += random() % balance_interval;
>> > + balance_ticks += ((int)random()) % balance_interval;
>> > if (smp_started == 0 || rebalance == 0)
>> > return;
>> > tdq = TDQ_SELF();
>>
>> This avoids a 64-bit division on 64-bit platforms but seems to have no
>> effect otherwise. Because this function is not called very often, the
>> change seems unlikely to help.
>
> Yes, this section is not related to this problem :)
> I just posted the latest version of the patch that I am using now...
>
>>
>> > @@ -2118,13 +2119,21 @@
>> > struct td_sched *ts;
>> >
>> > THREAD_LOCK_ASSERT(td, MA_OWNED);
>> > + if (td->td_pri_class & PRI_FIFO_BIT)
>> > + return;
>> > + ts = td->td_sched;
>> > + /*
>> > + * We used up one time slice.
>> > + */
>> > + if (--ts->ts_slice > 0)
>> > + return;
>>
>> This skips most of the periodic functionality (long term load
>> balancer, saving switch count (?), insert index (?), interactivity
>> score update for long running thread) if the thread is not going to
>> be rescheduled right now.
>>
>> It looks wrong but it is a data point if it helps your workload.
>
> Yes, I did that to delay for as long as possible the execution of the code in this section:
> ...
> #ifdef SMP
> /*
> * We run the long term load balancer infrequently on the first cpu.
> */
> if (balance_tdq == tdq) {
> if (balance_ticks && --balance_ticks == 0)
> sched_balance();
> }
> #endif
> ...
>
>>
>> > tdq = TDQ_SELF();
>> > #ifdef SMP
>> > /*
>> > 	 * We run the long term load balancer infrequently on the first cpu.
>> > 	 */
>> > - if (balance_tdq == tdq) {
>> > - if (balance_ticks && --balance_ticks == 0)
>> > + if (balance_ticks && --balance_ticks == 0) {
>> > + if (balance_tdq == tdq)
>> > sched_balance();
>> > }
>> > #endif
>>
>> The main effect of this appears to be to disable the long term load
>> balancer completely after some time. At some point, a CPU other than
>> the first CPU (which uses balance_tdq) will set balance_ticks = 0, and
>> sched_balance() will never be called again.
>>
>
> That is, for the same reason as described above...
>
>> It also introduces a hypothetical race condition because the access to
>> balance_ticks is no longer restricted to one CPU under a spinlock.
>>
>> If the long term load balancer may be causing trouble, try setting
>> kern.sched.balance_interval to a higher value with unpatched code.
>
> I checked that in the first place - but it did not help fix the situation...
>
> It gives the impression that rebalancing is malfunctioning...
> It seems that a thread is handed back to the same core that is already loaded, and so on...
> Perhaps this is a consequence of incorrect detection of the CPU topology?
>
>>
>> > @@ -2144,9 +2153,6 @@
>> > 		if (TAILQ_EMPTY(&tdq->tdq_timeshare.rq_queues[tdq->tdq_ridx]))
>> > 			tdq->tdq_ridx = tdq->tdq_idx;
>> > 	}
>> > -	ts = td->td_sched;
>> > -	if (td->td_pri_class & PRI_FIFO_BIT)
>> > -		return;
>> > if (PRI_BASE(td->td_pri_class) == PRI_TIMESHARE) {
>> > /*
>> > * We used a tick; charge it to the thread so
>> > @@ -2157,11 +2163,6 @@
>> > sched_priority(td);
>> > }
>> > /*
>> > - * We used up one time slice.
>> > - */
>> > - if (--ts->ts_slice > 0)
>> > - return;
>> > - /*
>> > * We're out of time, force a requeue at userret().
>> > */
>> > ts->ts_slice = sched_slice;
>>
>> > and not using options FULL_PREEMPTION.
>> > But no one has replied to my letter to say whether my patch helps or
>> > not in the case of Core2Duo...
>> > There is a suspicion that the problems stem from the sections of
>> > code associated with SMP...
>> > Maybe I am wrong about something, but I want to help solve this
>> > problem...
Has anyone experiencing problems tried to set sysctl kern.sched.steal_thresh=1 ?
I don't remember what our specific problem at $WORK was, perhaps it
was just interrupt threads not getting serviced fast enough, but we've
hard-coded this to 1 and removed the code that sets it in
sched_initticks(). The same effect can be had by setting the
sysctl after the box is up.
Thanks,
matthew