[PATCH 0/2] plug capability races
Bruce Evans
brde at optusnet.com.au
Sat Aug 16 02:07:58 UTC 2014
On Fri, 15 Aug 2014, John Baldwin wrote:
> One thing I would like to see is for the timecounter code to be adapted to use
> the seq API instead of doing it by hand (the timecounter code is also missing
> barriers due to doing it by hand).
Locking in the timecounter code is poor (1), but I fear a general mechanism
would be slower. Also, the timecounter code now extends into userland,
so purely kernel locking cannot work for it. The userland part is
more careful about locking than the kernel. It has memory barriers and
other pessimizations which were intentionally left out of the kernel
locking for timecounters. If these barriers are actually necessary, then
they give the silly situation that there are less races for userland
timecounting than kernel timecounting provided userland mostly does
direct accesses instead of syscalls and kernel uses of timecounters are
are infrequent enough to not race often with the userland accesses.
(1) The main bugs are:
- various calls to tc_setclock() with only Giant locking or worse. One
is in settime(). settime() used to be locked by spclock(). That
stopped working when SMPng removed spls and replaced them by Giant
locking, and is more broken now. Giant locking is inadequate for
most things related to timing. Using it in settime() gives the
problem that settime() might be interrupted for many milliseconds.
It then sets a time significantly in the past, assuming that it
started with the correct time. Of course, it might be preempted
before it aquires a sufficient lock. It never tried to recover
from that. The extra breakage is that someone removed the splx()
at the end, so settime() now has an splclock() with no corresponding
splx(). This is only harmless since splclock() has no effect.
settime()'s locking in FreeBSD-4, where it was last mostly
correct, was:
splclock() at start. Run almost entirely with that. But lower
the spl using splsoftclock() too soon, before the end. It was
apparently necessary to call lease_updatetime() at a lower
priority, but the code was badly ordered so that was done before
updating the time in the hardware. In fact, the hardware part
was done woth no locking at all: from FreeBSD-4:
% ts.tv_sec = tv->tv_sec;
% ts.tv_nsec = tv->tv_usec * 1000;
% set_timecounter(&ts);
FreeBSD-4 had timecounters, and set_timecounter() corresponds to
tc_setclock(). The locking here was provided all timecounter
updates are locked by the same lock (splclock()). The most
critical ones were -- timecounter updates are normally done
from hardclock() and that was locked by splclock().
% (void) splsoftclock();
% lease_updatetime(delta.tv_sec);
Reduce priority, but not to 0, to call lease_updatetime().
% splx(s);
% resettodr();
Reduce priority to 0 to update the hardware (TODR, not timecounter).
This is broken. The lock should be held throughout to increase the
chance of updating the hardware to a time that is nearly current.
Worse, most resettodr() implementations had and still have null
locking except possibly for individual register accesses. They
depend on locking from here and other callers. splclock() locking
would have been adequate.
% return (0);
SMPng broke all this. It still doesn't even aquire Giant until near
the end, leaving parts unlocked:
% static int
% settime(struct thread *td, struct timeval *tv)
% {
% struct timeval delta, tv1, tv2;
% static struct timeval maxtime, laststep;
% struct timespec ts;
% int s;
%
% s = splclock();
% microtime(&tv1);
No locking yet, except internally in microtime().
% delta = *tv;
% timevalsub(&delta, &tv1);
%
% /*
% * If the system is secure, we do not allow the time to be
% * set to a value earlier than 1 second less than the highest
% * ...
% */
% if (securelevel_gt(td->td_ucred, 1) != 0) {
Race with changes in the setting of securelevel. Not important to see
a stale value provided we don't act nonatomically on it.
% if (delta.tv_sec < 0 || delta.tv_usec < 0) {
% /*
% * Update maxtime to latest time we've seen.
% */
% if (tv1.tv_sec > maxtime.tv_sec)
% maxtime = tv1;
% tv2 = *tv;
% timevalsub(&tv2, &maxtime);
% if (tv2.tv_sec < -1) {
% tv->tv_sec = maxtime.tv_sec - 1;
% printf("Time adjustment clamped to -1 second\n");
% }
Race to change our state (maxtime). Giant locking for this would be
plenty.
% } else {
% if (tv1.tv_sec == laststep.tv_sec) {
% splx(s);
% return (EPERM);
The null spls were carefully maintained here when this return was added.
% }
% if (delta.tv_sec > 1) {
% tv->tv_sec = tv1.tv_sec + 1;
% printf("Time adjustment clamped to +1 second\n");
% }
% laststep = *tv;
% }
% }
Race to change our state (laststep).
%
% ts.tv_sec = tv->tv_sec;
% ts.tv_nsec = tv->tv_usec * 1000;
% mtx_lock(&Giant);
% tc_setclock(&ts);
Giant locking at last, at a point where it is almost no use. Giant
locking is neither necessary or sufficient for tc_setclock().
% resettodr();
The Giant locking extends to resettodr() where it has a chance of
doing some good iff all other calls to resettodr() do it too.
% mtx_unlock(&Giant);
% return (0);
% }
Someone removed lease_updatetime() and the 2 spls near it. Only one of
these was related to lease_updatetime(). So splclock() is bracketed
with splx() for the unusal return above, but not for the main return.
- locking in kern_ntptime.c has similar nonsense involving Giant, but
worse since there are callbacks from timecounter code into ntp code.
This file still has a comment saying that everything must use splclock():
% * Note that all routines must run at priority splclock or higher.
but it only has 1 splclock() call itself. It depends on all calls into
it the be done at splclock() and for splclock() to actually work. It
has a sprinkling of Giant locking. But when timecounter code calls into
it, it has nothing like Giant locking. Timecounter code has some time-
domain locking and perhaps still sched_lock for the usual case where it
is called from the clock interrupt (settime() bypasses this). Timecounter
code calls ntp code mainly for seconds update and pps. These functions
provide no additional locking, but access ntptime's most critical state.
Giant locking would be a gross LOR of course. The Giant locking for the
syscalls isn't a LOR, but just cannot work.
Another interesting comment:
% #ifdef PPS_SYNC
% /*
% * hardpps() - discipline CPU clock oscillator to external PPS signal
% * ...
% * Note that, on some Unix systems this routine runs at an interrupt
% * priority level higher than the timer interrupt routine hardclock().
% * Therefore, the variables used are distinct from the hardclock()
% * variables, except for the actual time and frequency variables, which
% * are determined by this routine and updated atomically.
% */
I don't see any atomic updates. Certainly no mutexes, atomic ops or
memory barriers. I think there is at best stores that are atomic on
most UP systems combined with time-domain locking. The time-domain
locking might even work, as in timecounter code generally, but is
difficult to understand. Any variable that is only updated every
second is unlikely to have a race in the update itself. Then it can
be arranged that readers work if they see stale values out of order.
But ntptime is mostly old UP code that depends on splclock(), so it
is unlikely to arrange this.
Only core timecounter code tries to arrange this. It may be too
sloppy with memory barriers for the generation count, but this is
unclear. Userland doesn't trust this, and sprinkles memory barriers.
All userland can do to get more problems is to run stupid code that
reads the timecounter in a loop. Time-domain locking obviously works
better if the times between the reads are larger, and the stupid
loop can hit problems sooner by making more calls.
Bruce
More information about the freebsd-arch
mailing list