draining high-frequency callouts
Peter Holm
peter at holm.cc
Tue Mar 14 17:02:25 UTC 2017
On Mon, Mar 13, 2017 at 11:38:13AM -0700, Mark Johnston wrote:
> On Mon, Mar 13, 2017 at 09:21:20AM +0100, Peter Holm wrote:
> > On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
> > > I'm occasionally seeing an assertion failure in softclock_call_cc() when
> > > running DTrace tests on a system with hz=10000. The assertion
> > > (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
> > > concurrently draining the callout, which runs at a high frequency. At
> > > the time of the panic, that thread is spinning on the per-CPU callout
> > > lock after having been awoken from "codrain", and CALLOUT_PENDING is
> > > set on the callout. The callout is direct, i.e., it is executed in hard
> > > interrupt context.
> > >
> > > I think this is what's happening:
> > > - callout_drain() is called while the callout is executing but after the
> > > callout has rescheduled itself, and goes to sleep after having cleared
> > > CALLOUT_ACTIVE.
> > > - softclock_call_cc() wakes up the callout_drain() caller, but the
> > > callout fires again before the caller is scheduled.
> > > - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
> > > cleared and panics.
> > >
> > > Is there anything that prevents this scenario? Is it really correct to
> > > leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
> > > dropped in order to acquire a sleepqueue lock?
> > >
> >
> > Is this the same problem?
> >
> > panic: softclock_call_cc: act 0xfffff8000de64800 0
>
> It's hard to say for sure. The minimal patch below fixed the problem for
> me - could you give it a try? I also did not see any problems while
> testing on Hans' branch.
>
> diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c
> index 5b70cf2033f5..a9c50fd98fbe 100644
> --- a/sys/kern/kern_timeout.c
> +++ b/sys/kern/kern_timeout.c
> @@ -1256,7 +1256,8 @@ again:
> * Succeed we to stop it or not, we must clear the
> * active flag - this is what API users expect.
> */
> - c->c_flags &= ~CALLOUT_ACTIVE;
> + if ((flags & CS_DRAIN) == 0)
> + c->c_flags &= ~CALLOUT_ACTIVE;
>
> if ((flags & CS_DRAIN) != 0) {
> /*
> @@ -1315,6 +1316,7 @@ again:
> PICKUP_GIANT();
> CC_LOCK(cc);
> }
> + c->c_flags &= ~CALLOUT_ACTIVE;
> } else if (use_lock &&
> !cc_exec_cancel(cc, direct) && (drain == NULL)) {
>
I ran the test that triggered the panic all night.
I followed up with a buildworld plus a random mix of tests, for a total
of 24 hours.
No problems seen.
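
For anyone following the thread, here is a toy, single-threaded model of
the interleaving Mark describes above. It is only a sketch and not kernel
code: the names toy_callout, toy_run, TOY_ACTIVE and TOY_PENDING are made
up for illustration, the flag values are arbitrary, and locking and the
sleepqueue are reduced to comments. It only shows why clearing the active
flag before the drainer sleeps lets a second firing observe the flag
cleared, while clearing it after the wakeup does not.

```c
#include <stdbool.h>

/* Illustrative flag values; NOT the kernel's definitions. */
#define TOY_ACTIVE  0x1
#define TOY_PENDING 0x2

/* Hypothetical stand-in for the callout's flag word. */
struct toy_callout {
	int flags;
};

/*
 * Replays the sequence from the thread in a fixed order.
 * Returns true if the second firing still sees the active flag set,
 * false if it would hit the "ACTIVE != 0" assertion.
 */
bool
toy_run(bool clear_before_sleep)
{
	struct toy_callout c;
	bool second_fire_ok;

	/* 1. The handler is running and has already rescheduled itself. */
	c.flags = TOY_ACTIVE | TOY_PENDING;

	/* 2. The drainer arrives while the handler is still executing. */
	if (clear_before_sleep)
		c.flags &= ~TOY_ACTIVE;	/* unpatched ordering */
	/* ... the drainer drops the callout lock and sleeps ... */

	/* 3. The callout fires again before the drainer gets to run. */
	second_fire_ok = (c.flags & TOY_ACTIVE) != 0;
	c.flags &= ~TOY_PENDING;

	/* 4. The drainer wakes up and holds the callout lock again. */
	if (!clear_before_sleep)
		c.flags &= ~TOY_ACTIVE;	/* patched ordering */

	return (second_fire_ok);
}
```

In this model toy_run(true) corresponds to the ordering before the patch
and toy_run(false) to the ordering after it.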
--
Peter