draining high-frequency callouts

Tue Jan 10 20:50:42 UTC 2017

I'm occasionally seeing an assertion failure in softclock_call_cc() when
running DTrace tests on a system with hz=10000. The assertion
(c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
concurrently draining the callout, which fires at a high frequency. At
the time of the panic, the draining thread is spinning on the per-CPU
callout lock after having been woken from its "codrain" sleep, and
CALLOUT_PENDING is set on the callout. The callout is direct, i.e., it
is executed in hard interrupt context.

I think this is what's happening:
- callout_drain() is called while the callout is executing but after the
  callout has rescheduled itself, and goes to sleep after having cleared
  CALLOUT_ACTIVE.
- softclock_call_cc() wakes up the callout_drain() caller, but the
  callout fires again before the caller is scheduled.
- the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
  cleared and panics.
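
To make the suspected interleaving concrete, here is a small userland
sketch that replays the three steps above in sequence. It is a toy
model, not kernel code: the toy_* names, the flag values, and the single
flags field are all made up for illustration, and locking and the
sleepqueue are reduced to plain booleans. It only shows that the
sequence, if it can occur, leaves the flags in the state seen at the
panic; it says nothing about whether the kernel actually permits it.

```c
#include <stdbool.h>

/* Toy flag values for this sketch only; not taken from sys/callout.h. */
#define TOY_CALLOUT_ACTIVE	0x1
#define TOY_CALLOUT_PENDING	0x2

struct toy_callout {
	int	flags;
	bool	running;	/* handler is currently executing */
	bool	drain_waiting;	/* a drainer is asleep on "codrain" */
};

/* Stand-in for callout_reset(): mark the callout scheduled. */
static void
toy_reset(struct toy_callout *c)
{
	c->flags |= TOY_CALLOUT_ACTIVE | TOY_CALLOUT_PENDING;
}

/*
 * Stand-in for the start of softclock_call_cc().  Returns false where
 * the (c->c_flags & CALLOUT_ACTIVE) != 0 assertion would fail.
 */
static bool
toy_fire(struct toy_callout *c)
{
	if ((c->flags & TOY_CALLOUT_ACTIVE) == 0)
		return (false);
	c->flags &= ~TOY_CALLOUT_PENDING;
	c->running = true;
	return (true);
}

/* The handler of a periodic callout: it reschedules itself. */
static void
toy_handler(struct toy_callout *c)
{
	toy_reset(c);
}

/* Stand-in for the end of softclock_call_cc(): wake any drainer. */
static void
toy_finish(struct toy_callout *c)
{
	c->running = false;
	c->drain_waiting = false;	/* wakeup only; drainer has not run yet */
}

/* Stand-in for callout_drain() up to the point where it sleeps. */
static void
toy_drain_begin(struct toy_callout *c)
{
	c->flags &= ~TOY_CALLOUT_ACTIVE;
	if (c->running)
		c->drain_waiting = true;
}

/*
 * Replay one firing followed by a second one.  If drain_between is
 * true, a drainer goes to sleep after the handler has rescheduled
 * itself.  Returns whether the assertion holds on the second firing.
 */
static bool
toy_replay(bool drain_between)
{
	struct toy_callout c = { 0 };

	toy_reset(&c);
	if (!toy_fire(&c))
		return (false);
	toy_handler(&c);		/* ACTIVE and PENDING set again */
	if (drain_between)
		toy_drain_begin(&c);	/* ACTIVE cleared, PENDING left set */
	toy_finish(&c);			/* drainer woken but not yet scheduled */
	return (toy_fire(&c));		/* callout fires again */
}
```

In this model toy_replay(false) returns true and toy_replay(true)
returns false, i.e., the second firing only trips the check when the
drainer clears the active flag in between and has not yet run again to
stop the pending callout.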

Is there anything that prevents this scenario? Is it really correct to
leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
dropped in order to acquire a sleepqueue lock?

Thanks,
-Mark