Re: NFS server credentials with cr_ngroups == 0

From: Olivier Certner <olce_at_freebsd.org>
Date: Tue, 15 Oct 2024 10:07:27 UTC
Hello,

> olce@ reported an issue (...)

In addition to that, I've been proposing a way to solve the noticed problems (among others) in some revision series, from D46914 to D46918, which is summarized below.

> Note that these credentials are not POSIX syscall ones.
> They are used specifically by the NFS server for file access.

The current situation can be seen as stemming from conflation between full process credentials and those that are needed solely to perform file access checks (which are a subset of the previous).  This conflation was introduced as soon as 2001 (commit b1fc0ec1a7a49dede, r77183) with as far as I can tell the goal of code simplification (a part of which was some locking avoidance) and consistency of checks, which are undeniably important benefits.  Its drawbacks are unnecessary information passing in some cases, and generally the loss of context on the credentials use (for callers) and provenance (for callees).

> The not so good news is that commit 7f92e57 (Jun 20, 2009)
> broke groupmember() for the case where cr_ngroups == 0,
> assuming there would always be at least one group (cr_groups[0]
> or cr_gid, if you prefer).

Because process credentials are always supposed to have an effective GID (and a real and saved one), this assumption was quite natural.  It is more largely a symptom that kernel code has been for a long time written with the "`cr_ngroups[]` member of 'struct cred' has at least one element" assumption in mind.

This is the reason why I initially chose, in the above-mentioned series of revisions, to fix the current problem of uninitialized effective GID in NFS by enforcing a default value for it when it is not explicitly specified.

Some "natural" value for the fallback group has existed since at least 4.4BSD in the form of the "nogroup" group, which incidentally NFS has been using for very long.  In some subsequent patch series (see in particular D47011), the fallback effective GID is obtained by a lookup of "nogroup" in the group database, and if that fails the constant GID_NOGROUP is used.

The drawback here is that "nogroup" is in fact just a regular group, with the consequence that defaulting to "nogroup" would give access to files having "nogroup" as their group.  This is arguably a strange situation that should barely happen in practice, but it remains true that defaulting to "nogroup" is not exactly the same as supporting true NFS export credentials with no groups.

> So, what should we do about this?
> 
> #1 A simple patch can be applied to groupmember() and a couple
>      of places in the NFS server code, so that cr_ngroups == 0 again
>      works correctly for this case.
> #2 Decide that cr_ngroups == 0 should no longer be supported and
>      patch accordingly.
> OR ???
> 
> Personally, I am thinking that #1 should be done right away and
> MFC'd to stable/14 and stable/13 so that the currently documented
> behaviour is supported for FreeBSD 13 and FreeBSD14.
> (To do otherwise would seem to be a POLA violation to me.)

#1 fixes the immediately discoverable NFS problems, but does it fix all possible ones?  Also, it is only partial, as the effective GID is not the only field to fill in credentials: Real and saved ones should be as well, and these fields are not optional (contrary to the effective GID which occupies a slot in an array whose size can be reduced).  I can buy that these credentials are intended to be used only for file access checks where these fields are supposedly not used, but after having spent a few days tracking the flow of credentials in most of NFS components, I can say this is relatively hard to verify, and almost impossible to prove (given that, e.g., any filesystem is in theory free to implement the access checks it wants, and maybe or maybe not use the effective, real or saved GIDs in the process; and there are no guarantees that changes in NFS or elsewhere would suddenly not cause these credentials to be used in other contexts).

If #1 is "simple" in terms of number of lines of change, I think it is foremost risky given the history of treatment for 'struct ucred'.  In turn, #2 is simple as well, as can be seen in the series D46914 to D46918, and does not have the associated risk.

So, in the immediate, I would rather commit to #2 and the already proposed code, and MFC it to stable/13 and stable/14, to fix the current uninitialized reads triggered by NFS.

I don't think the POLA argument has much wait given that the current problems date back to 2009, and don't think it necessarily hints at #1 either since the use of a fallback group is currently the norm in several places in NFS.

We can later discuss on whether we should pursue supporting credentials with no groups, which #1 is only a small part of.  Doing so would allow credentials without groups in NFS export, but has the drawback of context conflation and associated risk.  I'm not sure if it is worth it, but do not have a strong opinion about it.  However, if we go down this path, I think at the very least we need to add accessors to GIDs in 'struct ucred' implementing sanity checks (were groups actually filled?), removing most direct accesses in the process, and add checks to some critical pieces of code where there is enough context to know that groups must have been filled correctly (e.g., proc_set_cred()), to counter the effect of increased conflation.  I have started to explore this route: It doesn't seem especially difficult but entails several changes in different, unrelated places, an additional reason for me to recommend avoiding it as a stopgap measure.

Thanks and regards.

-- 
Olivier Certner