add closefrom() call revisited
Ighighi
ighighi at gmail.com
Wed Sep 19 02:28:58 PDT 2007
Given that NetBSD, OpenBSD and DragonFly (as well as Solaris and maybe
others) already provide closefrom(), it'd be nice and worthwhile to
implement it on FreeBSD too.
The attached shar archive contains 4 possible implementations of it.
One is a system call (the approach used by the other BSDs), available
here as a loadable kernel module for quick testing. The other 3 are
library versions. One of them doesn't currently work since
FreeBSD lacks a /proc/<pid>/fd/, which I tried to emulate with /dev/fd/,
both via devfs(5) and fdescfs(5): they seem to lack some types of
file descriptors... Another just does what a lot of programs do: try
close() on every possible file descriptor. The last one uses sysctl().
The implementation was inspired by the DragonFly code, but the semantics
match Open/NetBSD's (a negative argument fails with EBADF rather than
DragonFly's EINVAL). Their code is available at:
http://www.dragonflybsd.org/cvsweb/~checkout~/src/sys/kern/kern_descrip.c
http://cvsweb.netbsd.org/bsdweb.cgi/~checkout~/src/sys/kern/kern_descrip.c
Also included in the archive is a timing test along with a regression
test borrowed from OpenSSH.
It was successfully built and tested on FreeBSD 6.2-STABLE.
It also includes code that should make it work on -CURRENT.
A sample run on a Pentium 4 1.7 GHz:
$ make test
Trying closefrom_syscall(3) with 58976 open file descriptors
user 0.000000 sys 0.030874 total 0.030874
Trying closefrom_syscall(3) with 58976 closed file descriptors
user 0.000000 sys 0.000008 total 0.000008
Trying closefrom_sysctl(3) with 58976 open file descriptors
user 0.050941 sys 0.045333 total 0.096274
Trying closefrom_sysctl(3) with 58976 closed file descriptors
user 0.000877 sys 0.000939 total 0.001816
Trying closefrom_brute(3) with 58976 open file descriptors
user 0.037777 sys 0.043793 total 0.081570
Trying closefrom_brute(3) with 58976 closed file descriptors
user 0.026666 sys 0.046383 total 0.073049
closefrom_sysctl() has a worst-case scenario when a lot of files
are open that may make it slower than closefrom_brute().
Implementations using /proc/<pid>/fd/ are also vulnerable to this.
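To put a rough number on that worst case: the README in the attached
archive quotes 52 bytes per struct xfile on the test system, and
kern.file returns one record per open file system-wide. A minimal sketch
of the arithmetic, assuming that 52-byte figure (the helper name
xfile_buffer_bytes and the constant are hypothetical, not part of the
archive):

```c
#include <stddef.h>

/*
 * Assumption: 52 bytes per struct xfile, the size quoted in the README
 * for the system the archive was tested on. Other systems may differ.
 */
#define XFILE_SIZE_ASSUMED 52

/*
 * Hypothetical helper: bytes the kern.file sysctl() would need to
 * return n_entries records of the assumed size.
 */
size_t
xfile_buffer_bytes(size_t n_entries)
{
	return (n_entries * XFILE_SIZE_ASSUMED);
}
```

For the 58976 descriptors of the sample run this gives 58976 * 52 =
3066752 bytes (about 2.9 MiB) for this one process, and the open files
of every other process on the system add to it.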
With no library version guaranteed to be faster, and because of the
various reasons discussed in
http://lists.freebsd.org/pipermail/freebsd-hackers/2007-July/thread.html
I believe it'd be best to implement it as a system call (which can be
done through fcntl() anyway).
More info is included in the README.
Any ideas, suggestions?
Salutes,
Igh
-------------- next part --------------
#!/bin/sh
# This is a shell archive
echo x closefrom
mkdir -p closefrom > /dev/null 2>&1
echo x closefrom/Makefile
sed 's/^X//' > closefrom/Makefile << 'SHAR_END'
XSUBDIR = module test
X
X.include <bsd.subdir.mk>
SHAR_END
echo x closefrom/README
sed 's/^X//' > closefrom/README << 'SHAR_END'
XOVERVIEW
X
XThis archive contains 4 possible implementations of closefrom().
XThe first, a system call, is located in ./module/syscall.c and is
Xavailable as a kernel module for quick testing.
X
XBoth NetBSD >= 3.0 and DragonFly >= 1.4 implement it as a system call.
XIn NetBSD, it uses the F_CLOSEM fcntl(), available since version 2.0.
X
XThe second, implemented with the kern.file sysctl(), is available
Xon both FreeBSD >= 5.0 and DragonFly >= 1.2. Dynamic memory must be
Xallocated for an array of "struct xfile" structures that describe each
Xopen file descriptor _for every running process_ in
Xthe system...! (Note: the sysctl(3) manpage should be patched to reflect
Xthe current behaviour since FreeBSD 5.0: it should mention struct xfile).
XOn my system, the size of this structure is 52 bytes, so the allocation
Xcould fail on systems that set up a larger kern.maxfiles. This function
Xwould be cleaner to implement in NetBSD, which has an (undocumented)
Xkern.file2 that lets you work with a specific pid instead by passing
XKERN_FILE_BYPID.
X
XThe third is the usual brute force approach that uses getdtablesize(),
Xincluded for reference since it is the approach most applications take.
X
XThe fourth tries to do what some implementations (including Solaris') do
Xby browsing /proc/<pid>/fd/ but using /dev/fd/. Unfortunately, it doesn't
Xwork because neither devfs(5) nor fdescfs(5) seem to include duplicated
Xfile descriptors, sockets and maybe others.
X
X-o-
X
XIt was successfully built and tested on FreeBSD 6.2-STABLE (as of
XSept 18, 2007), though code that should work on -CURRENT is present
X(namely, the new FILEDESC_S[UN]LOCK macros).
X
XTo try the implementations, run the following commands:
X
Xcd module
Xmake
Xsudo make load
Xcd ..
Xcd test
Xmake
Xmake check
Xmake test
X
XFor repeated testing of any of the implementations you may run:
X./closefrom syscall
X./closefrom sysctl
X./closefrom brute
X
SHAR_END
echo x closefrom/module
mkdir -p closefrom/module > /dev/null 2>&1
echo x closefrom/test
mkdir -p closefrom/test > /dev/null 2>&1
echo x closefrom/test/closefrom.c
sed 's/^X//' > closefrom/test/closefrom.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X * notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X * notice, this list of conditions and the following disclaimer in the
X * documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <dirent.h>
X#include <err.h>
X#include <errno.h>
X#include <fcntl.h>
X#include <limits.h>
X#include <stdio.h>
X#include <stdlib.h>
X#include <string.h>
X#include <unistd.h>
X#include <sys/types.h>
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/resource.h>
X#include <sys/time.h>
X#include <sys/sysctl.h>
X
X#include <sys/syscall.h>
X#include <sys/module.h>
X
X#define DEBUG
X
Xstatic void
Xusage(const char *argv0)
X{
X fprintf(stderr, "Usage: %s syscall|sysctl|brute|devfd\n"
X "Usage: %s check\n", argv0, argv0);
X exit(1);
X}
X
Xstatic int (*closefrom)(int); /* pointer to closefrom_xxx() */
X
X/*
X * LKM version of closefrom()
X */
X
Xstatic int syscall_num;
X
Xstatic void
Xfind_module(void)
X{
X struct module_stat stat;
X int modid;
X
X modid = modfind("closefrom");
X if (modid == -1)
X err(1, "modfind(closefrom)");
X
X stat.version = sizeof(stat);
X if (modstat(modid, &stat) == -1)
X err(1, "modstat()");
X
X syscall_num = stat.data.intval;
X}
X
Xstatic int
Xclosefrom_syscall(int lowfd)
X{
X return (syscall(syscall_num, lowfd));
X}
X
X/*
X * This version uses the kern.file sysctl()
X */
Xstatic int
Xclosefrom_sysctl(int lowfd)
X{
X int mib[2] = { CTL_KERN, KERN_FILE };
X struct xfile *files = NULL;
X pid_t pid = getpid();
X size_t fsize = 0; /* set by the first sysctl() call, which passes files == NULL */
X int i, nfiles;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X for (;;) {
X if (sysctl(mib, 2, files, &fsize, NULL, 0) == -1) {
X if (errno != ENOMEM)
X goto bad;
X else if (files != NULL) {
X free(files);
X files = NULL;
X }
X } else if (files == NULL) {
X files = (struct xfile *) malloc(fsize);
X if (files == NULL)
X return (-1);
X } else
X break;
X }
X
X /* XXX This structure may change */
X if (files->xf_size != sizeof(struct xfile) ||
X fsize % sizeof(struct xfile))
X {
X errno = ENOSYS;
X goto bad;
X }
X
X nfiles = fsize / sizeof(struct xfile);
X
X for (i = 0; i < nfiles; i++)
X if (files[i].xf_pid == pid && files[i].xf_fd >= lowfd)
X if (close(files[i].xf_fd) < 0 && errno == EINTR)
X goto bad;
X
X free(files);
X return (0);
X
Xbad:
X if (files != NULL) {
X int save_errno = errno;
X free(files);
X errno = save_errno;
X }
X return (-1);
X}
X
X/*
X * This version iterates over all possible file descriptors >= lowfd
X */
Xstatic int
Xclosefrom_brute(int lowfd)
X{
X int fd;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X /* getdtablesize() is the table size, so the highest valid fd is one less */
X for (fd = getdtablesize() - 1; fd >= lowfd; fd--)
X if (close(fd) < 0 && errno == EINTR)
X return (-1);
X
X return (0);
X}
X
X/*
X * An example implementation using /dev/fd (other systems use /proc/<pid>/fd)
X * Unfortunately, on FreeBSD, fdescfs(5) doesn't include duplicated file
X * descriptors and sockets.
X */
Xstatic int
Xclosefrom_devfd(int lowfd)
X{
X struct dirent *d;
X DIR *dir;
X int fd;
X
X if (lowfd < 0) {
X errno = EBADF;
X return (-1);
X }
X
X /*
X * Close lowfd so we have a spare fd to use with /dev/fd
X */
X close(lowfd++);
X
X if ((dir = opendir("/dev/fd")) == NULL)
X return (-1);
X
X while ((d = readdir(dir)) != NULL) {
X#ifdef DEBUG
X printf("%s\n", d->d_name);
X#endif
X if (d->d_name[0] == '.')
X continue;
X fd = atoi(d->d_name);
X if (fd >= lowfd && fd != dirfd(dir))
X if (close(fd) < 0 && errno == EINTR)
X goto bad;
X }
X
X (void)closedir(dir);
X return (0);
X
Xbad:
X {
X int save_errno = errno;
X (void)closedir(dir);
X errno = save_errno;
X return (-1);
X }
X}
X
Xstatic void
Xtime_closefrom(int lowfd)
X{
X struct rusage ru, rux;
X struct timeval tv;
X double usecs, ssecs;
X
X if (getrusage(RUSAGE_SELF, &ru) < 0)
X err(1, "getrusage()");
X if (closefrom(lowfd) < 0)
X err(1, "closefrom()");
X if (getrusage(RUSAGE_SELF, &rux) < 0)
X err(1, "getrusage()");
X
X timersub(&rux.ru_utime, &ru.ru_utime, &tv);
X usecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X printf("user\t%f\t", usecs);
X timersub(&rux.ru_stime, &ru.ru_stime, &tv);
X ssecs = ((double)tv.tv_sec + (double)tv.tv_usec / 1000000);
X printf("sys\t%f\t", ssecs);
X usecs += ssecs;
X printf("total\t%f\n", usecs);
X}
X
Xstatic void
Xtry(int (*xclosefrom)(int), const char *str)
X{
X int fd, lowfd, maxfd;
X
X lowfd = dup(STDIN_FILENO);
X maxfd = getdtablesize();
X for (fd = 1; fd < maxfd; fd++)
X if (dup(STDIN_FILENO) < 0)
X break;
X
X closefrom = xclosefrom;
X printf("Trying %s(%d) with %d open file descriptors\n", str, lowfd, fd);
X time_closefrom(lowfd);
X
X printf("Trying %s(%d) with %d closed file descriptors\n", str, lowfd, fd);
X time_closefrom(lowfd);
X printf("\n");
X}
X
Xint test(int (*)(int));
X
Xint
Xmain(int argc, char *argv[])
X{
X if (argv[1] == NULL)
X usage(argv[0]);
X
X if (!strcmp(argv[1], "check")) {
X find_module();
X printf("testing closefrom_syscall():\t%s\n",
X test(&closefrom_syscall) ? "failed" : "ok");
X printf("testing closefrom_sysctl():\t%s\n",
X test(&closefrom_sysctl) ? "failed" : "ok");
X printf("testing closefrom_brute():\t%s\n",
X test(&closefrom_brute) ? "failed" : "ok");
X }
X else if (!strcmp(argv[1], "syscall")) {
X find_module();
X try(&closefrom_syscall, "closefrom_syscall");
X }
X else if (!strcmp(argv[1], "sysctl"))
X try(&closefrom_sysctl, "closefrom_sysctl");
X else if (!strcmp(argv[1], "devfd"))
X try(&closefrom_devfd, "closefrom_devfd");
X else if (!strcmp(argv[1], "brute"))
X try(&closefrom_brute, "closefrom_brute");
X else
X usage(argv[0]);
X
X return (0);
X}
X
X/*
X * NOTE:
X * The following code was adapted from OpenSSH's
X * openbsd-compat/regress/closefromtest.c
X */
X
X/*
X * Copyright (c) 2006 Darren Tucker
X *
X * Permission to use, copy, modify, and distribute this software for any
X * purpose with or without fee is hereby granted, provided that the above
X * copyright notice and this permission notice appear in all copies.
X *
X * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
X * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
X * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
X * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
X * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
X * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
X * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
X */
X
X#define NUM_OPENS 10
X
X#define fail(str) \
X do { printf("%s\n", (str)); \
X return -1; } while(0)
X
Xint
Xtest(int (*xclosefrom)(int))
X{
X int i, max, fds[NUM_OPENS];
X char buf[512];
X
X for (i = 0; i < NUM_OPENS; i++)
X if ((fds[i] = open("/dev/null", O_RDONLY)) == -1)
X exit(0); /* can't test */
X max = i - 1;
X
X /* should close last fd only */
X xclosefrom(fds[max]);
X if (close(fds[max]) != -1)
X fail("failed to close highest fd");
X
X /* make sure we can still use remaining descriptors */
X for (i = 0; i < max; i++)
X if (read(fds[i], buf, sizeof(buf)) == -1)
X fail("closed descriptors it should not have");
X
X /* should close all fds */
X xclosefrom(fds[0]);
X for (i = 0; i < NUM_OPENS; i++)
X if (close(fds[i]) != -1)
X fail("failed to close from lowest fd");
X
X return 0;
X}
SHAR_END
echo x closefrom/test/Makefile
sed 's/^X//' > closefrom/test/Makefile << 'SHAR_END'
XPROG = closefrom
XNO_MAN =
X
XCFLAGS = -Wall -O2
X
Xcheck: ${PROG}
X	@./${PROG} check
X
Xtest: ${PROG}
X	@./${PROG} syscall
X	@./${PROG} sysctl
X	@./${PROG} brute
X
X.include <bsd.prog.mk>
SHAR_END
echo x closefrom/module/Makefile
mkdir -p closefrom/module > /dev/null 2>&1
sed 's/^X//' > closefrom/module/Makefile << 'SHAR_END'
XKMOD = syscall
XSRCS = syscall.c vnode_if.h
X
XCFLAGS += -Wall
X
Xreload:
X	@${MAKE} unload
X	@${MAKE} load
X
X.include <bsd.kmod.mk>
SHAR_END
echo x closefrom/module/syscall.c
sed 's/^X//' > closefrom/module/syscall.c << 'SHAR_END'
X/*
X * Copyright (c) 2007 by Ighighi
X * All rights reserved.
X *
X * Redistribution and use in source and binary forms, with or without
X * modification, are permitted provided that the following conditions
X * are met:
X *
X * 1. Redistributions of source code must retain the above copyright
X * notice, this list of conditions and the following disclaimer.
X * 2. Redistributions in binary form must reproduce the above copyright
X * notice, this list of conditions and the following disclaimer in the
X * documentation and/or other materials provided with the distribution.
X *
X * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,
X * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
X * AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
X * THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
X * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
X * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
X * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
X * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
X * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
X * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
X */
X
X#include <sys/param.h>
X#include <sys/file.h>
X#include <sys/filedesc.h>
X#include <sys/kernel.h>
X#include <sys/proc.h>
X#include <sys/syscallsubr.h>
X#include <sys/sysent.h>
X#include <sys/systm.h>
X#include <sys/vnode.h>
X#include <sys/module.h>
X
X/*
X * Newer code in FreeBSD > 6.2 uses shared/exclusive locks
X */
X#ifndef FILEDESC_SLOCK
X#define FILEDESC_SLOCK FILEDESC_LOCK_FAST
X#define FILEDESC_SUNLOCK FILEDESC_UNLOCK_FAST
X#endif
X
X/*
X * kern_closefrom()
X */
Xstatic int
Xkern_closefrom(struct thread *td, int lowfd)
X{
X struct filedesc *fdp;
X int fd;
X
X /*
X * Note: NetBSD uses EBADF and DragonFly uses (undocumented) EINVAL
X */
X if (lowfd < 0)
X return (EBADF);
X
X fdp = td->td_proc->p_fd;
X
X FILEDESC_SLOCK(fdp);
X while ((fd = fdp->fd_lastfile) >= lowfd) {
X FILEDESC_SUNLOCK(fdp);
X if (kern_close(td, fd) == EINTR)
X return (EINTR);
X FILEDESC_SLOCK(fdp);
X }
X FILEDESC_SUNLOCK(fdp);
X
X return (0);
X}
X
X/* closefrom() arguments */
Xstruct closefrom_args {
X int fd;
X};
X
Xstatic int
Xclosefrom(struct thread *td, void *args)
X{
X struct closefrom_args *uap = (struct closefrom_args *)args;
X
X return (kern_closefrom(td, uap->fd));
X}
X
X/* closefrom() sysent[] */
Xstatic struct sysent closefrom_sysent = {
X 1, /* number of arguments */
X closefrom /* implementing function */
X};
X
X/*
X * LKM stuff
X */
X
X/* offset in sysent[] where the syscall will be allocated */
Xstatic int offset = NO_SYSCALL;
X
Xstatic int
Xload(struct module *module, int cmd, void *arg)
X{
X int error = 0;
X
X switch (cmd) {
X case MOD_LOAD:
X uprintf("closefrom loaded at offset %d\n", offset);
X break;
X
X case MOD_UNLOAD:
X uprintf("closefrom unloaded from offset %d\n", offset);
X break;
X
X default:
X error = EOPNOTSUPP;
X break;
X }
X
X return (error);
X}
X
XSYSCALL_MODULE(closefrom, &offset, &closefrom_sysent, load, NULL);
SHAR_END
exit