kevent has bug?
Konstantin Belousov
kostikbel at gmail.com
Wed Apr 2 12:07:58 UTC 2014
On Wed, Apr 02, 2014 at 04:06:16PM +0900, Kohji Okuno wrote:
> From: John-Mark Gurney <jmg at funkthat.com>
> Date: Tue, 1 Apr 2014 23:15:51 -0700
> > Kohji Okuno wrote this message on Wed, Apr 02, 2014 at 11:45 +0900:
> >> I think, kevent() has a bug.
> >> I tested sample programs by attached sources.
> >> This sample tests about EVFILT_SIGNAL.
> >>
> >> I build sample programs by the following commands.
> >> % gcc -O2 -o child child.c
> >> % gcc -O2 -o parent parent.c
> >>
> >> The expected result is the following.
> >> % ./parent
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> >> OK
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> >> OK
> >>
> >> But, sometimes the result was the following.
> >> % ./parent
> >> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
> >>
> >> This result means the number of times the signal has occured was
> >> incorrect.
> >
> > I was able to reproduce this...
> >
> >> In case of EVFILT_SIGNAL, according to `man kevent', `data' retuns the
> >> number of times the signal has occurred since the last call to
> >> kevent(). This `data' is recorded by filt_signal() (This is f_event in
> >> struct filterops).
> >>
> >> The system call kevent()'s events are processed by kqueue_scan() in
> >> kern_event.c. In kqueue_scan(), kn->kn_fop->f_event() is allways
> >> called after KN_INFLUX is set to kn->kn_status.
> >>
> >> On the other hand, kernel events are occured by knote() in
> >> kern_event.c. (In EVFILT_SIGNAL, knote() is called from tdsendsignal()
> >> in kern_sig.c.) In knote(), kn->kn_fop->f_event() is called only when
> >> KN_INFLUX is not set in kn->kn_status.
> >>
> >> In race condition between kqueue_scan() and knote(),
> >> kn->kn_fop->f_event() from knote() may not be called, I think.
> >
> > Considering that both are called w/ a lock, that cannot happen..
> > KN_LIST_LOCK(kn) locks the same lock that is asserted that is held
> > by knote...
> >
> >> In knote(), because the context holds knlist's lock, the context can
> >> not sleep. So, KN_INFLUX should not be set on calling
> >> kn->kn_fop->f_event() in kqueue_scan(), I think.
> >
> > No, it needs to be set:
> > * Setting the KN_INFLUX flag enables you to unlock the kq that this knote
> > * is on, and modify kn_status as if you had the KQ lock.
> >
> > As this comment says, _INFLUX allows you to unlock the KQ w/o fear that
> > the knote will disappear out from under you causing you to dereference
> > possibly free'd memory..
> >
> > If you just tried to lock the list lock w/o unlocking the KQ lock, you
> > could end up w/ a dead lock, as you aren't maintaining lock order
> > properly.. The correct lock order if knlist -> kq...
> >
> >> What do you think about this issue?
> >
> > This is a real issue, but not due to the race you described
> > above...
>
> I beleave it's the result of the race.
>
> Could you try to add printf() in knote()?
> Please refer to attached patch.
>
>
> > I have verified on my machine that it isn't because there is a knote
> > waiting that isn't getting woken up, and the knote on my hung process
> > has data == 0, so it definately lost one of the signals:
> > (kgdb) print $14.kq_knhash[20].slh_first[0]
> > $20 = {kn_link = {sle_next = 0x0}, kn_selnext = {sle_next = 0x0},
> > kn_knlist = 0xfffff8005a9c5840, kn_tqe = {tqe_next = 0xfffff801fdab4500,
> > tqe_prev = 0xfffff8004bb10038}, kn_kq = 0xfffff8004bb10000, kn_kevent = {
> > ident = 20, filter = -6, flags = 32, fflags = 0, data = 0, udata = 0x0},
> > kn_status = 0, kn_sfflags = 0, kn_sdata = 0, kn_ptr = {
> > p_fp = 0xfffff8005a9c54b8, p_proc = 0xfffff8005a9c54b8,
> > p_aio = 0xfffff8005a9c54b8, p_lio = 0xfffff8005a9c54b8,
> > p_v = 0xfffff8005a9c54b8}, kn_fop = 0xffffffff81405ef0, kn_hook = 0x0,
> > kn_hookid = 0}
> >
> > If you want to find this yourself, you can run kgdb on a live system,
> > switch to the thread of the parent (info threads, thread XXX), and
> > do:
> > frame 7
> >
> > (or a frame that has td, which is struct thread *), then:
> > print *(struct kqueue *)td->td_proc[0].p_fd[0].fd_ofiles[3].fde_file[0].f_data
> >
> > This will give you the struct kqueue * of the parent, and then:
> > print $XX.kq_knhash[0]@63
> >
> > to figure out where the knote is in the hash, and then you can print
> > it out yourself...
> >
> > I'm going to take a look at this a bit more later... I'm thinking of
> > using dtrace to collect the stacks where filt_signal is called, and
> > match them up... dtrace might even be able to get us the note's data
> > upon return helping to make sure things got tracked properly...
> >
> > Thanks for finding this bug! Hopefully we can find a solution to it..
> >
> > --
> > John-Mark Gurney Voice: +1 415 225 5579
> >
> > "All that I will do, has been done, All that I have, has not."
> diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
> index b3fb23d..7791447 100644
> --- a/sys/kern/kern_event.c
> +++ b/sys/kern/kern_event.c
> @@ -1868,6 +1868,8 @@ knote(struct knlist *list, long hint, int lockflags)
> if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
> KQ_LOCK(kq);
> if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
> + if (hint & NOTE_SIGNAL)
> + printf("Aee2\n");
> KQ_UNLOCK(kq);
> } else if ((lockflags & KNF_NOKQLOCK) != 0) {
> kn->kn_status |= KN_INFLUX;
> @@ -1886,6 +1888,10 @@ knote(struct knlist *list, long hint, int lockflags)
> KQ_UNLOCK(kq);
> }
> }
> + else {
> + if (hint & NOTE_SIGNAL)
> + printf("Aee1\n");
> + }
> kq = NULL;
> }
> if ((lockflags & KNF_LISTLOCKED) == 0)
I think you are right, there is a race between setting influx before the
knote lock is owned in the kqueue_scan(), and knote() itself. The issue
is that the influx from scan must not prevent knote() from waking up
the waiters.
Patch below fixed your test case for me, also tools/regression/kqueue did
not noticed a breakage. I tried to describe the situation in the
comment in knote(). Also, I removed unlocked check for the KN_INFLUX
in knote, since it seems to be an optimization for rare case, and is
the race on its own.
diff --git a/sys/kern/kern_event.c b/sys/kern/kern_event.c
index b3fb23d..380f1ff 100644
--- a/sys/kern/kern_event.c
+++ b/sys/kern/kern_event.c
@@ -474,7 +474,7 @@ knote_fork(struct knlist *list, int pid)
continue;
kq = kn->kn_kq;
KQ_LOCK(kq);
- if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
+ if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
KQ_UNLOCK(kq);
continue;
}
@@ -1506,7 +1506,7 @@ retry:
KQ_LOCK(kq);
kn = NULL;
} else {
- kn->kn_status |= KN_INFLUX;
+ kn->kn_status |= KN_INFLUX | KN_SCAN;
KQ_UNLOCK(kq);
if ((kn->kn_status & KN_KQUEUE) == KN_KQUEUE)
KQ_GLOBAL_LOCK(&kq_global, haskqglobal);
@@ -1515,7 +1515,8 @@ retry:
KQ_LOCK(kq);
KQ_GLOBAL_UNLOCK(&kq_global, haskqglobal);
kn->kn_status &=
- ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX);
+ ~(KN_QUEUED | KN_ACTIVE | KN_INFLUX |
+ KN_SCAN);
kq->kq_count--;
KN_LIST_UNLOCK(kn);
influx = 1;
@@ -1545,7 +1546,7 @@ retry:
} else
TAILQ_INSERT_TAIL(&kq->kq_head, kn, kn_tqe);
- kn->kn_status &= ~(KN_INFLUX);
+ kn->kn_status &= ~(KN_INFLUX | KN_SCAN);
KN_LIST_UNLOCK(kn);
influx = 1;
}
@@ -1865,28 +1866,33 @@ knote(struct knlist *list, long hint, int lockflags)
*/
SLIST_FOREACH(kn, &list->kl_list, kn_selnext) {
kq = kn->kn_kq;
- if ((kn->kn_status & KN_INFLUX) != KN_INFLUX) {
+ KQ_LOCK(kq);
+ if ((kn->kn_status & (KN_INFLUX | KN_SCAN)) == KN_INFLUX) {
+ /*
+ * Do not process the influx notes, except for
+ * the influx coming from the kq unlock in the
+ * kqueue_scan(). In the later case, we do
+ * not interfere with the scan, since the code
+ * fragment in kqueue_scan() locks the knlist,
+ * and cannot proceed until we finished.
+ */
+ KQ_UNLOCK(kq);
+ } else if ((lockflags & KNF_NOKQLOCK) != 0) {
+ kn->kn_status |= KN_INFLUX;
+ KQ_UNLOCK(kq);
+ error = kn->kn_fop->f_event(kn, hint);
KQ_LOCK(kq);
- if ((kn->kn_status & KN_INFLUX) == KN_INFLUX) {
- KQ_UNLOCK(kq);
- } else if ((lockflags & KNF_NOKQLOCK) != 0) {
- kn->kn_status |= KN_INFLUX;
- KQ_UNLOCK(kq);
- error = kn->kn_fop->f_event(kn, hint);
- KQ_LOCK(kq);
- kn->kn_status &= ~KN_INFLUX;
- if (error)
- KNOTE_ACTIVATE(kn, 1);
- KQ_UNLOCK_FLUX(kq);
- } else {
- kn->kn_status |= KN_HASKQLOCK;
- if (kn->kn_fop->f_event(kn, hint))
- KNOTE_ACTIVATE(kn, 1);
- kn->kn_status &= ~KN_HASKQLOCK;
- KQ_UNLOCK(kq);
- }
+ kn->kn_status &= ~KN_INFLUX;
+ if (error)
+ KNOTE_ACTIVATE(kn, 1);
+ KQ_UNLOCK_FLUX(kq);
+ } else {
+ kn->kn_status |= KN_HASKQLOCK;
+ if (kn->kn_fop->f_event(kn, hint))
+ KNOTE_ACTIVATE(kn, 1);
+ kn->kn_status &= ~KN_HASKQLOCK;
+ KQ_UNLOCK(kq);
}
- kq = NULL;
}
if ((lockflags & KNF_LISTLOCKED) == 0)
list->kl_unlock(list->kl_lockarg);
diff --git a/sys/sys/event.h b/sys/sys/event.h
index bad8c9e..3b765c0 100644
--- a/sys/sys/event.h
+++ b/sys/sys/event.h
@@ -207,6 +207,7 @@ struct knote {
#define KN_MARKER 0x20 /* ignore this knote */
#define KN_KQUEUE 0x40 /* this knote belongs to a kq */
#define KN_HASKQLOCK 0x80 /* for _inevent */
+#define KN_SCAN 0x100 /* flux set in kqueue_scan() */
int kn_sfflags; /* saved filter flags */
intptr_t kn_sdata; /* saved data field */
union {
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 834 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-current/attachments/20140402/fec217ed/attachment.sig>
More information about the freebsd-current
mailing list