Managing userland data pointers in kqueue/kevent

Jilles Tjoelker jilles at stack.nl
Sun May 19 16:17:39 UTC 2013


On Wed, May 15, 2013 at 01:34:58PM +0100, Paul LeoNerd Evans wrote:
> On Wed, 15 May 2013 13:29:59 +0100
> Paul "LeoNerd" Evans <leonerd at leonerd.org.uk> wrote:

> > Is that not the exact thing I suggested?

> > The "extension to create register a kevent to catch these events" is
> > that you put the EV_DROPWATCH bit flag in the event at the time you
> > register it.

> > The "returned event [that] could have all the appropriate information
> > for the event being dropped" is that you receive an event with
> > EV_DROPPED set on it. It being a real event includes of course the
> > udata pointer, so you can handle it.

> In fact, to requote the original PR I wrote[1] on the subject:

> ---

>   I propose the addition of a new flag applicable to any kevent watch
>   structure, documented thusly:

>     The flags field can contain the following values:
>     ..
>     EV_DROPWATCH Requests that the kernel send an EV_DROPPED event
>                  on this watch when it has finished watching it for any
>                  reason, including EV_DELETE, expiry because of
>                  EV_ONESHOT, or because the filehandle was closed by
>                  close(2).
> 
>     EV_DROPPED   This flag is returned by the kernel if it is now about
>                  to drop the watch. After this flag has been received,
>                  no further events will occur on this watch.

>   This flag then makes it trivial to build a generic wrapper for kqueue
>   that can always manage its memory correctly.

>   a) at EV_ADD time, simply set flags |= EV_DROPWATCH

>   b) after an event has been processed that included the EV_DROPPED
>   flag, free() the pointer given in the udata field.
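
Step (b) of the quoted proposal can be sketched as below. This is only a
model under stated assumptions: EV_DROPPED is the proposed flag, which no
released kernel defines, and its value here is a placeholder; struct
watch_event is a simplified stand-in for struct kevent so that the sketch
builds on any system; dispatch_events() and watch_cb are hypothetical
names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* HYPOTHETICAL: the proposed flag. The value is a placeholder only. */
#define EV_DROPPED 0x1000

/* Simplified stand-in for struct kevent (assumption, not the real layout). */
struct watch_event {
	uintptr_t	ident;	/* e.g. the file descriptor */
	unsigned short	flags;	/* EV_* flags returned with the event */
	void		*udata;	/* heap pointer owned by the wrapper */
};

/* Hypothetical per-event callback supplied by the wrapper's user. */
typedef void (*watch_cb)(const struct watch_event *ev);

/*
 * Process n returned events. After the callback has seen an event that
 * carries EV_DROPPED, no further events can arrive on that watch, so its
 * udata is freed. Returns the number of udata pointers freed.
 */
static size_t
dispatch_events(struct watch_event *evs, size_t n, watch_cb cb)
{
	size_t freed = 0;

	assert(evs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (cb != NULL)
			cb(&evs[i]);
		if (evs[i].flags & EV_DROPPED) {
			free(evs[i].udata);
			evs[i].udata = NULL;
			freed++;
		}
	}
	return freed;
}
```

The point of the design is that the free() happens in exactly one place,
driven by the kernel's notification, rather than being scattered over
every path that can end a watch.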

An important detail is missing: how do you avoid using up all kernel
memory on knotes if someone keeps adding new file descriptors with
EV_ADD | EV_DROPWATCH and closing the file descriptors again without
ever draining the kqueue?

This problem did not previously exist for file descriptor events: the
number of such knotes was limited by the number of open file
descriptors. However, it does already exist for most of the other event
types. For example, pwait -v will return the exit status even if it was
suspended (^Z) while the process terminated and the parent reaped the
zombie. For EVFILT_TIMER, the worst effect is a denial of service of
EVFILT_TIMER on all other processes in the system. EVFILT_USER does not
appear to check anything and appears to allow arbitrary kernel memory
consumption.

EVFILT_TIMER needs to keep its global limit, and EVFILT_USER needs
something similar.

For the rest, call an event that is no longer associated with a kernel
object (e.g. EVFILT_READ whose file descriptor is closed, EVFILT_PROC
whose process has terminated and been reaped by the parent or EVFILT_AIO
whose I/O request is completed) "unbound". The number of events that are
not unbound is limited by existing limits on the other kernel objects. A
possible fix is to reject (such as with [ENOMEM]) adding new events when
there are too many unbound events in the queue already. The application
should then let kevent() return pending events before it adds
new ones. If the kernel returns unbound events in preference to other
events, a kevent() call with nevents >= 2 * nchanges cannot result in a
net increase in the number of current and potential unbound events,
since it allows the kernel to return (and forget) as many unbound events
as it may add (nchanges entries are required for EV_ERROR leaving
nchanges for returning other events).
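
The bookkeeping behind the nevents >= 2 * nchanges bound can be checked
with a small model. This is a sketch of the worst-case accounting only,
not kernel code: worst_case_unbound_delta() is a hypothetical name, and
the model pessimistically assumes that every change both reserves one
EV_ERROR slot and adds one event that may later become unbound.

```c
#include <assert.h>
#include <stddef.h>

static size_t
min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Worst-case net change in "current and potential unbound events" for
 * one modelled kevent() call.
 *
 * unbound_queued: unbound events pending before the call
 * nchanges:       entries in the changelist
 * nevents:        slots in the eventlist
 *
 * Assumptions (pessimistic): nchanges slots go to EV_ERROR entries,
 * every change adds one event, and unbound events are returned before
 * any others.
 */
static long
worst_case_unbound_delta(size_t unbound_queued, size_t nchanges,
    size_t nevents)
{
	size_t reserved = min_size(nchanges, nevents);
	size_t slots = nevents - reserved;
	size_t drained = min_size(unbound_queued, slots);

	assert(drained <= unbound_queued);
	return (long)nchanges - (long)drained;
}
```

With 100 unbound events queued and nchanges = 8, the model gives a delta
of 0 for nevents = 16, a delta of 4 for nevents = 12, and a delta of -16
for nevents = 32. The delta can still be positive when fewer than
nchanges unbound events are queued, but in that case the queue is
nowhere near a limit, so the growth is harmless.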

>   It is not required that these two flags have distinct values; since
>   one is userland->kernel and the other kernel->userland, they could for
>   neatness reuse the same bit field.

I think it would be consistent with other EV_* to use the same name and
value for both.

-- 
Jilles Tjoelker

