please review, patch for lost camisr
Konstantin Belousov
kostikbel at gmail.com
Wed May 29 07:40:43 UTC 2013
On Wed, May 29, 2013 at 12:27:46AM -0700, Alfred Perlstein wrote:
> On 5/29/13 12:16 AM, Konstantin Belousov wrote:
> > On Tue, May 28, 2013 at 10:31:40PM -0700, Alfred Perlstein wrote:
> >> On 5/28/13 10:08 PM, Konstantin Belousov wrote:
> >>> On Tue, May 28, 2013 at 01:35:01PM -0700, Alfred Perlstein wrote:
> >>>> [[ moved to hackers@ from private mail. ]]
> >>>>
> >>>> On 5/28/13 1:13 PM, John Baldwin wrote:
> >>>>> On Tuesday, May 28, 2013 3:29:41 pm Alfred Perlstein wrote:
> >>>>>> On 5/28/13 9:04 AM, John Baldwin wrote:
> >>>>>>> On Tuesday, May 28, 2013 2:13:32 am Alfred Perlstein wrote:
> >>>>>>>> Hey folks,
> >>>>>>>>
> >>>>>>>> I had a talk with Nathan Whitehorn about the camisr issue. We
> >>>>>>>> mostly understand the issue we are seeing, but to summarize: we
> >>>>>>>> are losing the camisr signal and the camisr is not being run.
> >>>>>>>>
> >>>>>>>> I gave him a summary of what we have been seeing and pointed him to the
> >>>>>>>> code I am concerned about here:
> >>>>>>>> http://pastebin.com/tLKr7mCV (this is inside of kern_intr.c).
> >>>>>>>>
> >>>>>>>> What I think is happening is that the store of 0 to it_need
> >>>>>>>> inside of sys/kern/kern_intr.c:ithread_loop() is not being
> >>>>>>>> ordered correctly, and is being delayed until AFTER the call to
> >>>>>>>> ithread_execute_handlers() right below the atomic_store_rel_int().
> >>>>>>> This seems highly unlikely, to the extent that if this were true,
> >>>>>>> all our locking primitives would be broken. The store_rel is
> >>>>>>> actually a release barrier, which acts more like a read/write
> >>>>>>> fence: no memory accesses (read or write) from before the release
> >>>>>>> can be scheduled after the associated store, either by the
> >>>>>>> compiler or by the CPU. That is what Konstantin is referring to
> >>>>>>> in his commit when he says "release semantics".
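> >>>>>>>
> >>>>>>> To illustrate with a minimal sketch (made-up variables, not
> >>>>>>> actual kernel code):
> >>>>>>>
> >>>>>>> 	data = 42;			/* earlier write */
> >>>>>>> 	atomic_store_rel_int(&flag, 1);	/* release: the write to
> >>>>>>> 					   data cannot pass this store */
> >>>>>>>
> >>>>>>> The release only constrains the order of the two writes; it says
> >>>>>>> nothing about when either of them reaches memory.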
> >>>>>> Yes, that makes sense. However, does it specify that the writes
> >>>>>> *must* occur at that *point*? If it only enforces ordering, then
> >>>>>> we may have some issue, specifically because the setting of
> >>>>>> it_need to '1' inside of intr_event_schedule_thread() has no
> >>>>>> barrier other than the acq semantics of the thread lock. I am
> >>>>>> wondering what is forcing out the '1' there.
> >>>>> Nothing ever forces writes. You would have to invalidate the cache
> >>>>> to do that, and that is horribly expensive. It is always only about
> >>>>> ordering, and knowing that if you can complete another operation
> >>>>> with acquire semantics on the same "cookie" variable, then the
> >>>>> earlier writes will be visible.
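> >>>>>
> >>>>> Continuing the sketch above (same made-up variables): the consumer
> >>>>> side would have to do
> >>>>>
> >>>>> 	if (atomic_load_acq_int(&flag) == 1)
> >>>>> 		use(data);	/* guaranteed to observe data == 42 */
> >>>>>
> >>>>> on the *same* variable (flag) for the pairing to work; an acquire
> >>>>> on some unrelated variable orders nothing with respect to flag.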
> >>>> By "cookie", you mean a specific memory address, basically a lock?
> >>>> This is starting to reinforce my suspicions, as the setting of
> >>>> it_need is done with release semantics, while the acq on the other
> >>>> CPU is done on the thread lock. Maybe that is irrelevant. We will
> >>>> find out shortly.
> >>>>
> >>>>>> See below as I think we have proof that this is somehow happening.
> >>>>> Having ih_need of 1 and it_need of 0 is certainly busted. The simplest fix
> >>>>> is probably to stop using atomics on it_need and just grab the thread lock
> >>>>> in the main ithread loop and hold it while checking and clearing it_need.
> >>>>>
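> >>>>> A rough sketch of that variant against ithread_loop() (untested
> >>>>> and simplified; the real loop has more states to handle):
> >>>>>
> >>>>> 	thread_lock(td);
> >>>>> 	while (ithd->it_need) {
> >>>>> 		ithd->it_need = 0;
> >>>>> 		thread_unlock(td);
> >>>>> 		ithread_execute_handlers(p, ithd->it_event);
> >>>>> 		thread_lock(td);
> >>>>> 	}
> >>>>> 	/* still holding the thread lock here, so a new it_need = 1
> >>>>> 	   set under the same lock cannot be missed before sleeping */
> >>>>> 	...
> >>>>> 	thread_unlock(td);
> >>>>>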
> >>>> OK, we have some code that will do one of three things: prove this,
> >>>> perturb the memory ordering enough to make the bug go away, or
> >>>> prove this assertion wrong.
> >>>>
> >>>> We will update on our findings hopefully in the next few days.
> >>> IMO the read of it_need in the 'while (ithd->it_need)' should have
> >>> acquire semantics, otherwise the later reads in
> >>> ithread_execute_handlers(), in particular of ih_need, could pass the
> >>> read of it_need and cause the situation you reported. I do not see
> >>> any acquire barrier between the condition in the while() statement
> >>> and the read of ih_need in the execute_handlers().
> >>>
> >>> It is probably true that the issue you see was caused by r236456, in
> >>> the sense that the implicitly locked xchgl instruction on x86 has
> >>> full barrier semantics. As a result, the store_rel() was actually an
> >>> acquire too, making this reordering impossible. I argue that this is
> >>> not a bug in r236456, but an issue in kern_intr.c.
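> >>>
> >>> For context, the x86 store_rel implementations before and after
> >>> r236456 look roughly like this (a simplified sketch, not the
> >>> verbatim source):
> >>>
> >>> 	/* pre-r236456: implicitly locked xchgl, a full barrier */
> >>> 	static __inline void
> >>> 	atomic_store_rel_int(volatile u_int *p, u_int v)
> >>> 	{
> >>> 		__asm __volatile("xchgl %1,%0" : "+m" (*p), "+r" (v));
> >>> 	}
> >>>
> >>> 	/* post-r236456: compiler barrier plus a plain store, which is
> >>> 	   a release under x86 TSO but no longer a full barrier */
> >>> 	static __inline void
> >>> 	atomic_store_rel_int(volatile u_int *p, u_int v)
> >>> 	{
> >>> 		__compiler_membar();
> >>> 		*p = v;
> >>> 	}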
> >> If I remember the code correctly, that would probably explain why we
> >> see it only on 9.1 systems.
> >>> On the other hand, John's suggestion to move the manipulations of
> >>> it_need under the lock is probably the best anyway.
> >>>
> >> I was wondering if it would be lower latency to keep maintaining
> >> it_need, but to also keep another variable, it_needlocked, under the
> >> thread lock. This would result in potentially superfluous interrupts;
> >> however, under load you would allow the ithread to loop some number
> >> of times without taking the thread lock.
> >>
> >> I am not really sure the optimization is worth it (especially since
> >> it can result in superfluous interrupts), but it may reduce latency,
> >> and that might be important.
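> >>
> >> A hypothetical sketch of that scheme (it_needlocked is a made-up
> >> field, and the real calls take more arguments):
> >>
> >> 	/* scheduling side */
> >> 	atomic_store_rel_int(&it->it_need, 1);
> >> 	thread_lock(td);
> >> 	it->it_needlocked = 1;		/* protected by the thread lock */
> >> 	sched_add(td);
> >> 	thread_unlock(td);
> >>
> >> 	/* ithread side: loop on the cheap flag while there is work,
> >> 	   then take the thread lock once to catch a missed wakeup */
> >> 	while (atomic_readandclear_int(&it->it_need))
> >> 		ithread_execute_handlers(p, it->it_event);
> >> 	thread_lock(td);
> >> 	if (it->it_needlocked) {
> >> 		it->it_needlocked = 0;
> >> 		/* loop again instead of going to sleep */
> >> 	}
> >> 	thread_unlock(td);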
> >>
> >> Are there people I can pass the patch on to for help with performance
> >> testing once we confirm that this is the actual bug? We can do
> >> internal testing, but I am worried about regressing the performance
> >> of any form of IO in the kernel.
> >>
> >> I'll show the patch soon.
> >>
> >> Thank you for the information. This is promising.
> > Well, if you and I are right, the minimal patch should be
> >
> > diff --git a/sys/kern/kern_intr.c b/sys/kern/kern_intr.c
> > index 8d63c9b..7c21015 100644
> > --- a/sys/kern/kern_intr.c
> > +++ b/sys/kern/kern_intr.c
> > @@ -1349,7 +1349,7 @@ ithread_loop(void *arg)
> > * we are running, it will set it_need to note that we
> > * should make another pass.
> > */
> > - while (ithd->it_need) {
> > + while (atomic_load_acq_int(&ithd->it_need)) {
> > /*
> > * This might need a full read and write barrier
> > * to make sure that this write posts before any
> >
>
> OK we can try this.
>
> I've been pretty good with locking when using mutexes, but when we get
> into atomic ops like this, it gets a little tough for me to follow
> without extensive research. I know that the signalling thread (the
> swi_sched caller) does not use any atomic ops... is this OK?
The sequence of actions there is

	atomic_store_rel_int(&ih->ih_need, 1);

and later, in intr_event_schedule_thread():

	it->it_need = 1;
	thread_lock(td);
	sched_add(td);
	thread_unlock(td);
There are two things to worry about, as I see it:
1. The possibility of seeing it_need == 1 but ih_need == 0. This could
   happen on some architectures which allow write reordering, so it
   might make sense to move the store_rel from ih_need to it_need.
   Effectively, we would need to have rel in both places, since the
   call to intr_event_schedule_thread() is conditional (see the sketch
   after point 2). But this reordering is impossible on x86.
2. The possibility of scheduling the interrupt thread without the CPU
   running it noticing the it_need = 1 write. Since the thread lock is
   locked and released, a release barrier is applied, so the write
   should become visible before the thread is scheduled on a processor.
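
For reference, the 'rel in both places' variant from point 1 would look
roughly like this (a sketch, not a tested patch):

	atomic_store_rel_int(&ih->ih_need, 1);
	...
	/* and in intr_event_schedule_thread(): */
	atomic_store_rel_int(&it->it_need, 1);
	thread_lock(td);
	sched_add(td);
	thread_unlock(td);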
I would start with only the addition of the load_acq() for now and see
if it helps.