Re: Complicated interactions between O_EXEC, fdescfs, fexecve, and shebangs

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Wed, 03 Nov 2021 13:35:40 UTC

On Wed, Nov 03, 2021 at 12:30:26PM +0100, Drew DeVault wrote:
> Note: I am not subscribed to this list, please use reply-all to keep me
> on the thread. Thanks!
> 
> $ uname -a
> FreeBSD megumin 13.0-RELEASE FreeBSD 13.0-RELEASE #0 releng/13.0-n244733-ea31abc261f: Fri Apr  9 04:24:09 UTC 2021 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
> 
> This problem starts with the following program:
> 
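> #include <fcntl.h>
> #include <stdio.h>
> #include <unistd.h>
> 
> extern char **environ;
> 
> int main(void) {
> 	/* O_EXEC: open for execute only, without read permission. */
> 	int fd = open("./test.sh", O_EXEC);
> 	if (fd == -1) {
> 		perror("open");
> 		return 1;
> 	}
> 	char *argv[] = {
> 		"./test.sh",
> 		NULL
> 	};
> 	fexecve(fd, argv, environ);
> 	/* fexecve returns only on failure. */
> 	perror("fexecve");
> 	return 1;
> }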
> 
> Given this test.sh, which is executable:
> 
> #!/bin/sh
> echo hello world
> 
> This program produces the following error:
> 
> /bin/sh: cannot open /dev/fd/3: Permission denied
> 
> The program works fine with O_RDONLY instead, which makes some sense.
> The way this works is that the kernel rewrites argv to {"/bin/sh",
> "/dev/fd/%d"}, where %d is the file descriptor passed to fexecve. The
> interpreter then has to open this file for reading, so it needs the read
> bit set. fdescfs preserves the permissions of the file descriptor which
> was originally opened, so the read bit is missing with O_EXEC. Q.E.D.
> 
> The fix is to set O_RDONLY and mount fdescfs. If nothing else comes of
> this, I would like to request that FreeBSD consider mounting fdescfs by
> default, so that fexecve can be reliably expected to work correctly with
> interpreters. Otherwise, the value proposition of fexecve is severely
> limited.
> 
> However, a few other problems came up while looking into this.
> 
> The investigation was made more difficult by the fact that open(2) is
> documented in the man page as producing EINVAL when O_EXEC is combined
> with O_RDONLY, but this is not so: no error occurs. This is because
> O_RDONLY is, in fact, not a bit: it is zero. You cannot NOT provide
> O_RDONLY to an open call. RhodiumToad on #freebsd IRC gave a possible
> improvement for the man page:
> 
> > Only one of O_EXEC, O_RDWR and O_WRONLY may be specified.
> 
> The other issue is that this essentially makes O_EXEC useless outside of
> some specific cases, where the user knows for certain that the file
> being executed is not a script. The combination of O_EXEC and fexecve
> cannot generalize to support all use-cases of execve, which is
> frustrating because my code either (A) cannot be TOCTOU-free or (B) needs
> some awful special cases. Even in case (B), it would not generalize to
> the case where I have execute, but not read, permission for a script,
> but the interpreter has both.
> 
> I'm not sure what the answer for any of this is.
> 
> By way of contrast, Linux solves this problem a bit differently. It does
> not have O_EXEC, but it does have O_PATH, which opens a file descriptor
> without read, write, OR execute, but simply to keep track of an inode
> reference. fexecve on Linux then uses a similar /dev/fd trick, but the
> file in /dev/fd has no mode bits set and I'm not sure why it works.

FreeBSD also has O_PATH.  There are two differences from Linux:
- FreeBSD requires O_EXEC to be specified together with O_PATH if the intent
  is to use the resulting file descriptor with fexecve(2).  In fact, this
  requirement can be removed; see https://reviews.freebsd.org/D32821
- The semantics of open(2) on FreeBSD's fdescfs are different.  To get
  behavior similar to Linux, you need to specify the "nodup" mount option;
  see fdescfs(5).