svn commit: r222537 - in head/sys: kern sys
Kenneth D. Merry
ken at freebsd.org
Fri Jun 3 21:11:08 UTC 2011
On Fri, Jun 03, 2011 at 20:47:00 +1000, Bruce Evans wrote:
> On Thu, 2 Jun 2011, Kenneth D. Merry wrote:
>
> >On Wed, Jun 01, 2011 at 20:07:31 +1000, Bruce Evans wrote:
> >>On Wed, 1 Jun 2011, Andriy Gapon wrote:
> >>
> >>>on 31/05/2011 20:29 Kenneth D. Merry said the following:
> >>>>+ mtx_init(&mbp->msg_lock, "msgbuf", NULL, MTX_SPIN);
> >>>
> >>>Sorry that I didn't gather myself together for a review before this
> >>>change
> >>>got
> >>>actually committed.
> >>>Do you see any reason not to make this spinlock recursive?
>
> It does need to be recursive to avoid deadlock (as I described), but this
> prevents it actually working as a lock. I think the new code needs it to
> work as a lock.
Yes, it does.
> >>I see reasons why it must not exist. The message buffer code cannot use
> >>any normal locking, and was carefully written to avoid doing so.
> >>
> >>>I am a little bit worried about "exotic" situations like receiving an NMI
> >>>in the
> >>>middle of printing and wanting to print in the NMI context, or similar
> >>>things
> >>>that penetrate contexts with disabled interrupts - e.g. Machine Check
> >>>Exception.
> >>>Also it's not clear to me if there won't any bigger damage in the
> >>>situations
> >>>like those described above.
> >>>
> >>>P.S. I have been thinking about fixing the problem in a different
> >>>fashion,
> >>>via
> >>>reserving portions of dmesg buffer for a whole message using CAS:
> >>>http://lists.freebsd.org/pipermail/freebsd-hackers/2010-April/031535.html
> >>
> >>Something like this might work.
> >>
> >>PRINTF_BUFR size should not exist either, especially now that it is
> >>much more complicated and broken. Here is some of my old mail about
> >>adding (necessarily not normal) locking to to printf. No one replied,
> >>so I never finished this :-(.
> >
> >That particular solution doesn't lock the normal kernel printf path, and so
> >won't fix what we're trying to fix. (We've got lots of disks in each
> >system, and when there are various SES events and disks arriving and
> >departing, there are multiple kernel contexts doing printfs simultaneously,
> >and that results in scrambled dmesg output.)
>
> But it does. PRINTF_BUFR_SIZE (at least used to) only be applied to
> vprintf(), but my printf_lock() is applied to all calls to kvprintf()
> that set TOLOG. (If I missed some, this takes 2 lines each to fix.)
> The calls in tprintf() and log() have comments saying that this is
> intentional "to serialize msgbuf", though I fear this might not be safe
> for at least tprintf(), since it may block unboundedly. But even when
> it blocks unboundedly, nothing really bad happens: the output just
> becomes very slow due to everything waiting for the timeout until the
> blockage clears, and may be interleaved. Recursion gives similar
> behaviour, except it takes unrecursing to clear the blockage. I
> consider the very slow output to be a feature. Unbounded blockages
> and recursion are bugs and the slow output makes them more obvious.
Ahh. It wasn't 100% clear from looking at the patch which calls had the
lock applied. In this case at least, it's skipping the lock when the
priority is -1. Is this tprintf()?
% /*
% * Warn that a system table is full.
% @@ -198,5 +221,9 @@
% pca.flags = flags;
% va_start(ap, fmt);
% + if (pri != -1)
% + printf_lock(); /* To serialize msgbuf. XXX: long-held
lock? */
% kvprintf(fmt, putchar, &pca, 10, ap);
% + if (pri != -1)
% + printf_unlock();
% va_end(ap);
% if (sess != NULL)
%
> >I think the goals should be:
> >
> >- console output, syslog output, and dmesg output are not scrambled.
> > (i.e. individual kernel printfs make it out atomically, or at least
> > with a certain granularity, like PRINTF_BUFR_SIZE.)
>
> Console output includes debugger output. This must be able to break into
> any otherwise-atomic kernel printf. Possibly similarly for NMIs (if they
> print. They really shouldn't print, but they at least used to get this
> horribly wrong from the beginning on i386). Similarly for certain (non
> maskable) traps. These really shouldn't happen while we're printing, but
> if they do then they may need to panic and then print for things like
> machine check exceptions.
>
> Debugger output is the easiest way of showing why printf cannot use
> any normal locking. There is an option (that I don't like or use)
> which supports debugger output going through normal printf. I think
> the message buffer is used for this. So message buffer accesses cannout
> use normal locking either.
>
> Apart from this, I don't know any reason why a printf or even a msgbuf
> write can't wait for a while (a very long while if the output is to a
> slow device -- don't forget to test with a 50 bps serial console) until
> any previous printfs complete. My simple serialization gives this if
> the other printfs are on other CPUs, and possibly if they are on other
> ithreads (the simple serialization doesn't give up control for simplicity,
> since it is hard to tell when it is safe to give up control, but
> preemption while we are spinning might get back to the other printfs).
> And when the printf is recursive, there is no way back to the other
> printfs, so we must proceed. We could proceed immediately, but my
> simple serialization waits in this case too for simplicity. This case
> should be even rarer than waiting too long for another CPU.
Hopefully the delay won't be too bad for most consumers. It'll take some
testing to see what happens.
> >- Can be called by any kernel routine (i.e. doesn't sleep)
>
> Some complications for tprintf and uprintf (and maybe ordinary printf if
> it is connected to a pty). I think we already have the necessary
> complications. Something involving printing only to the message buffer
> in the non-sleeping part.
>
> >- Offers a string at a time interface.
>
> I don't provide this. So printf's a character or word or 2 at a time
> may be interleaved. It is perhaps not to much to ask that important
> printfs that are likely to be interleaved preprint their strings into
> lines.
The lack of a string interface isn't that big of a deal, since your
solution already serializes printf calls.
> >- Does the right thing for log messages (priority, etc.)
> >
> >It looks like we should add "does not use normal locking" to the list as
> >well.
> >
> >As Justin mentioned, we started off down the path of using atomic
> >operations, and then figured out that wouldn't fully work. Then we
> >discussed doing a per-CPU message buffer, with each message tagged with a
> >sequence number, and having the reader re-serialize the messages in the
> >right order. The complexity started to get large enough that we decided
> >that using a spin lock would be a much easier approach.
>
> But almost eveyone who has worked on this before knows that a spinlock
> cannot be used.
We haven't worked on this before, of course. :)
> >cnputs() already uses a spinlock, so we're no worse off than before from a
> >locking standpoint.
>
> An obvious bug for PRINTF_BUFR_SIZE. But as I said, this one might
> work well enough if it were recursive. This depends on the lower-level
> console drivers supporting recursion. Most don't. Some have simple
> serialization, but have to bypass it open to support debugger output,
> much like higher levels except they have more control over the timing.
Ahh.
> But in the message buffer accesses, I think you used the lock to provide
> exlusive accesses.
Yes. The lock is used to prevent the buffer getting scrambled. Recursion
probably wouldn't work very well.
> >We could perhaps make the message buffer mutex recursive and set the
> >MTX_NOWITNESS flag as well, that might make things a little better from a
> >side effect standpoint.
>
> I don't see how a recursive mutex can protect the message buffer. Don't
> forget to trace through the code protected by the mutex using ddb, while
> at the same time using this code for ddb itself. It's just too hard to
> make this work. Normally, this type of problem is avoided by avoiding
> not using code like this for the low levels like ddb itself. Ddb mostly
> does this, but has complications:
>
> % static void
> % db_putc(int c)
> % {
> %
> % /*
> % * If not in the debugger or the user requests it, output data to
> % * both the console and the message buffer.
> % */
> % if (!kdb_active || ddb_use_printf) {
> % printf("%c", c);
>
> Here it uses ordinary printf, so it uses whatever printf uses. In
> particular, PRINTF_BUFR_SIZE makes printf use cnputs() and the buggy
> locking in it, and printf always uses the message buffer and I think
> the new code gives buggy locking there unconditionally.
>
> This problem is not very large, since it takes some foot shooting to get
> here:
> - there is some buggy code that uses db_printf() when !kdb_active.
> Perhaps this is not used recursively.
> - we can configure ddb to use printf. I think it is a feature that the
> message buffer gets used in this mode.
>
> % if (!kdb_active)
> % return;
> % if (c == '\r' || c == '\n')
> % db_check_interrupt();
> % if (c == '\n' && db_maxlines > 0) {
> % db_newlines++;
> % if (db_newlines >= db_maxlines)
> % db_pager();
> % }
> % return;
> % }
> %
> % /* Otherwise, output data directly to the console. */
> % if (c > ' ' && c <= '~') {
> % /*
> % * Printing character.
> % * If we have spaces to print, print them first.
> % * Use tabs if possible.
> % */
> % db_force_whitespace();
> % cnputc(c);
> % db_capture_writech(c);
> % db_output_position++;
> % db_last_non_space = db_output_position;
> % }
>
> I think this routine is used mainly by db_printf() calling kvprintf() and
> kvprintf() calling here. kvprintf() is supposed to be reentrant and in
> particular not use any locking to support this. I think it still does.
>
> Bruce
I'll try to get some time to run some tests with your locking code and see
how it works.
We just need something that works, whether it's a spinlock or hand-rolled
lock doesn't matter so much.
Ken
--
Kenneth Merry
ken at FreeBSD.ORG
More information about the svn-src-head
mailing list