Complicated interactions between O_EXEC, fdescfs, fexecve, and shebangs

From: Drew DeVault <sir_at_cmpwn.com>
Date: Wed, 03 Nov 2021 11:30:26 UTC
Note: I am not subscribed to this list, please use reply-all to keep me
on the thread. Thanks!

$ uname -a
FreeBSD megumin 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr  9 04:24:09 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

This problem starts with the following program:

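#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern char **environ;

int main(void) {
	int fd = open("./test.sh", O_EXEC);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	char *argv[] = {
		"./test.sh",
		NULL
	};
	fexecve(fd, argv, environ);
	/* fexecve only returns on failure */
	perror("fexecve");
	return 1;
}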

Given this test.sh, which is executable:

#!/bin/sh
echo hello world

This program produces the following error:

/bin/sh: cannot open /dev/fd/3: Permission denied

The program works fine with O_RDONLY instead, which makes some sense.
The way this works is that the kernel rewrites argv to {"/bin/sh",
"/dev/fd/%d"}, where %d is the file descriptor passed to fexecve. The
interpreter then has to open this file for reading, so it needs the read
bit set. fdescfs preserves the permissions of the file descriptor which
was originally opened, so the read bit is missing with O_EXEC. Q.E.D.
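
A minimal sketch of the O_RDONLY variant, for comparison with the
failing program above. The helper name exec_by_fd is made up for
illustration, and the sketch assumes fdescfs is mounted on /dev/fd when
the target is a script.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: execute `path` through a descriptor opened with
 * O_RDONLY. Returns -1 on failure; does not return on success.
 */
int exec_by_fd(const char *path, char *const argv[]) {
	/*
	 * No O_CLOEXEC here: for a script, the interpreter still has to
	 * reach this descriptor through /dev/fd after the exec.
	 */
	int fd = open(path, O_RDONLY);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	fexecve(fd, argv, environ);
	/* Only reached if fexecve failed. */
	perror("fexecve");
	close(fd);
	return -1;
}
```

Calling exec_by_fd("./test.sh", argv) with the argv from the original
program would be the O_RDONLY equivalent of the reproduction.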

The fix is to open the file with O_RDONLY instead of O_EXEC and to mount
fdescfs. If nothing else comes of this, I would like to request that
FreeBSD consider mounting fdescfs by default, so that fexecve can be
reliably expected to work correctly with interpreters. Otherwise, the
value proposition of fexecve is severely limited.

However, a few other problems came up while looking into this.

The investigation was made more difficult by the fact that open(2) is
documented in the man page as producing EINVAL when O_EXEC is combined
with O_RDONLY, but this is not so: no error occurs. This is because
O_RDONLY is, in fact, not a bit: it is zero. You cannot NOT provide
O_RDONLY to an open call. RhodiumToad on #freebsd IRC gave a possible
improvement for the man page:

> Only one of O_EXEC, O_RDWR and O_WRONLY may be specified.
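
To make the point about O_RDONLY concrete, here is a small sketch. It
relies on O_RDONLY being defined as 0, which is the case on FreeBSD and
Linux but is not something POSIX guarantees. The helper name is made up
for illustration.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: reports whether OR-ing O_RDONLY into `flags`
 * changes the value. Where O_RDONLY is 0 it never does, so open(2)
 * cannot tell "O_EXEC" apart from "O_EXEC | O_RDONLY".
 */
int rdonly_changes_flags(int flags) {
	return (flags | O_RDONLY) != flags;
}
```

Where O_RDONLY is 0, this returns 0 for every input, including O_EXEC,
so there is nothing for open(2) to detect and reject with EINVAL.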

The other issue is that this essentially makes O_EXEC useless outside of
some specific cases, where the user knows for certain that the file
being executed is not a script. The combination of O_EXEC and fexecve
cannot generalize to support all use-cases of execve, which is
frustrating because my code either (A) cannot be free of TOCTOU races or
(B) needs some awful special cases. Even in case (B), it would not
generalize to the case where I have execute, but not read, permission
for a script, but the interpreter has both.
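
One possible shape for such a special case, offered only as a sketch
and not necessarily the one meant above, is to peek at the first two
bytes of the already-open descriptor and treat "#!" as a script. The
helper name looks_like_script is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the file behind `fd` starts with
 * "#!", 0 if it does not, and -1 if the first bytes cannot be read.
 * The descriptor must be readable, so an O_EXEC descriptor will not do.
 */
int looks_like_script(int fd) {
	char magic[2];
	/* pread leaves the file offset alone, so the fd stays usable. */
	ssize_t n = pread(fd, magic, sizeof(magic), 0);
	if (n < 0)
		return -1;
	return n == 2 && magic[0] == '#' && magic[1] == '!';
}
```

Because the check itself needs read access, it cannot cover a script
that is executable but not readable, which is the limitation described
above.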

I'm not sure what the answer for any of this is.

By way of contrast, Linux solves this problem a bit differently. It does
not have O_EXEC, but it does have O_PATH, which opens a file descriptor
without read, write, OR execute access, simply to keep track of an inode
reference. fexecve on Linux then uses a similar /dev/fd trick, but the
file in /dev/fd has no mode bits set and I'm not sure why it works.