New in-kernel privilege API: priv(9)

Tue Oct 31 15:05:12 UTC 2006

On Tue, 31 Oct 2006, Pawel Jakub Dawidek wrote:

> Here are few nits I found:

Thanks!  Comments below.

>> Index: sys/fs/hpfs/hpfs_vnops.c
>> ===================================================================
>> RCS file: /zoo/cvsup/FreeBSD-CVS/src/sys/fs/hpfs/hpfs_vnops.c,v
>> retrieving revision 1.68
>> diff -u -r1.68 hpfs_vnops.c
>> --- sys/fs/hpfs/hpfs_vnops.c	17 Jan 2006 17:29:01 -0000	1.68
>> +++ sys/fs/hpfs/hpfs_vnops.c	30 Oct 2006 17:07:55 -0000
>> @@ -501,11 +501,12 @@
>>  	if (vap->va_atime.tv_sec != VNOVAL || vap->va_mtime.tv_sec != VNOVAL) {
>>  		if (vp->v_mount->mnt_flag & MNT_RDONLY)
>>  			return (EROFS);
>> -		if (cred->cr_uid != hp->h_uid &&
>> -		    (error = suser_cred(cred, SUSER_ALLOWJAIL)) &&
>> -		    ((vap->va_vaflags & VA_UTIMES_NULL) == 0 ||
>> -		    (error = VOP_ACCESS(vp, VWRITE, cred, td))))
>> -			return (error);
>> +		if (vap->va_vaflags & VA_UTIMES_NULL) {
>> +			error = VOP_ACCESS(vp, VADMIN, cred, td);
>> +			if (error)
>> +				error = VOP_ACCESS(vp, VWRITE, cred, td);
>> +		} else
>> +			error = VOP_ACCESS(vp, VADMIN, cred, td);
>
> Eliminating privilege check here was intentional? Doesn't it change 
> functionality? I found the same check in few other places, so it's probably 
> intentional, but worth checking still.

Yeah, it took my quite a while to puzzle through what the security check here 
is supposed to accomplish, but once I did, the correct structure became more 
clear.  The key here is that VOP_ACCESS() will also perform a privilege check, 
so the use of privilege in the existing code appears to be redundant.  The 
logic here should essentially read:

- To update the time stamp to the current time (VA_UTIMES_NULL), you must
   either be the owner or have write access.  The correct error value is
   EACCES, so we try the write check only if the owner check fails, so that the
   error from the write check overrides the owner result.

- To update the time stamp to any other time, you must be the owner.  The
   error here is EPERM on failure.

Given this description, do you think the change is right?

>> --- sys/fs/msdosfs/msdosfs_vfsops.c	26 Sep 2006 04:12:45 -0000	1.153
>> +++ sys/fs/msdosfs/msdosfs_vfsops.c	30 Oct 2006 17:07:55 -0000
> [...]
>> @@ -293,17 +294,17 @@
>>  			 * If upgrade to read-write by non-root, then verify
>>  			 * that user has necessary permissions on the device.
>>  			 */
>> -			if (suser(td)) {
>> -				devvp = pmp->pm_devvp;
>> -				vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> -				error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> -						   td->td_ucred, td);
>> -				if (error) {
>> -					VOP_UNLOCK(devvp, 0, td);
>> -					return (error);
>> -				}
>> +			devvp = pmp->pm_devvp;
>> +			vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> +			error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> +			   td->td_ucred, td);
>> +			if (error)
>> +				error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>> +			if (error) {
>>  				VOP_UNLOCK(devvp, 0, td);
>> +				return (error);
>>  			}
>> +			VOP_UNLOCK(devvp, 0, td);
>
> This doesn't seem right to me. If VOP_ACCESS() returns an error if call 
> priv_check(), I think you wanted to call it when it return success, no?
>
> I'd change it to:
>
> 			devvp = pmp->pm_devvp;
> 			vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
> 			error = VOP_ACCESS(devvp, VREAD | VWRITE,
> 			   td->td_ucred, td);
> 			VOP_UNLOCK(devvp, 0, td);
> 			if (!error)
> 				error = priv_check(td, PRIV_VFS_MOUNT_PERM);
> 			if (error)
> 				return (error);

Again, this is a tricky case in the original code.  I believe the intended 
logic here, in English, is:

- In order to mount on a device vnode, you must be able to read and write to
   it.  This is subject to the normal definition of read and write privilege,
   so the superuser can override on that basis (as part of VOP_ACCESS).

- However, if you don't have read and write access, you can also use privilege
   (PRIV_VFS_MOUNT_PERM) to override that.

This is expressed in the original code by first checking for privilege, and if 
that fails, then looking for read/write access.  This doesn't make a lot of 
sense because the read/write check is also exempted by privilege, but was 
probably "a little faster" because suser() was cheaper than VOP_ACCESS().

The reason we call priv_check() in the error case is that we only want to 
check for privilege if the normal access control check fails, in which case we 
override the normal access control error check with the result of the 
privilege check.

Given this description, does the new code still make sense?

>> @@ -353,15 +354,15 @@
>>  	 * If mount by non-root, then verify that user has necessary
>>  	 * permissions on the device.
>>  	 */
>> -	if (suser(td)) {
>> -		accessmode = VREAD;
>> -		if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> -			accessmode |= VWRITE;
>> -		error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> -		if (error) {
>> -			vput(devvp);
>> -			return (error);
>> -		}
>> +	accessmode = VREAD;
>> +	if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> +		accessmode |= VWRITE;
>> +	error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> +	if (error)
>> +		error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>> +	if (error) {
>> +		vput(devvp);
>> +		return (error);
>
> Same here.

Ditto, although the underlying DAC logic here is a little different.  In this 
file system, we only require write access if it's a non-readonbly mount.  If 
VOP_ACCESS() fails, we fall back on privilege.

>> --- sys/fs/umapfs/umap_vfsops.c	26 Sep 2006 04:12:46 -0000	1.65
>> +++ sys/fs/umapfs/umap_vfsops.c	30 Oct 2006 17:07:55 -0000
>> @@ -88,8 +88,9 @@
>>  	/*
>>  	 * Only for root
>>  	 */
>> -	if ((error = suser(td)) != 0)
>> -		return (error);
>> +	error = priv_check(td, PRIV_VFS_MOUNT);
>> +	if (error)
>> +		return (ERROR);
>
> s/ERROR/error/

Err.  That's odd.  That should have been caught in build tests, maybe an edit 
error.  I will fix, thanks!

>
>> --- sys/gnu/fs/ext2fs/ext2_vfsops.c	26 Sep 2006 04:12:47 -0000	1.158
>> +++ sys/gnu/fs/ext2fs/ext2_vfsops.c	30 Oct 2006 17:07:55 -0000
> [...]
>> @@ -197,15 +198,16 @@
>>  			 * If upgrade to read-write by non-root, then verify
>>  			 * that user has necessary permissions on the device.
>>  			 */
>> -			if (suser(td)) {
>> -				vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> -				if ((error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> -				    td->td_ucred, td)) != 0) {
>> -					VOP_UNLOCK(devvp, 0, td);
>> -					return (error);
>> -				}
>> +			vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> +			error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> +			    td->td_ucred, td);
>> +			if (error)
>> +				error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> Another s/if (error)/if (!error)/

I think this is right -- we want to do the privilege check in the event the 
normal check fails, since we're granting extra rights, rather than mask the 
whole thing by result.  Notice that this is not the privilege to mount, it's 
the privilege to override device node protections.

>> @@ -259,15 +261,18 @@
>>  	/*
>>  	 * If mount by non-root, then verify that user has necessary
>>  	 * permissions on the device.
>> +	 *
>> +	 * XXXRW: VOP_ACCESS() enough?
>>  	 */
>> -	if (suser(td)) {
>> -		accessmode = VREAD;
>> -		if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> -			accessmode |= VWRITE;
>> -		if ((error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td)) != 0) {
>> -			vput(devvp);
>> -			return (error);
>> -		}
>> +	accessmode = VREAD;
>> +	if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> +		accessmode |= VWRITE;
>> +	error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> +	if (error)
>> +		error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> And again.

Think this one is right also.  If I'm thinking about this the wrong way, you 
should tell me :-)

>
>> --- sys/gnu/fs/reiserfs/reiserfs_vfsops.c	26 Sep 2006 04:12:47 -0000	1.6
>> +++ sys/gnu/fs/reiserfs/reiserfs_vfsops.c	30 Oct 2006 17:07:55 -0000
>> @@ -125,15 +125,15 @@
>>
>>  	/* If mount by non-root, then verify that user has necessary
>>  	 * permissions on the device. */
>> -	if (suser(td)) {
>> -		accessmode = VREAD;
>> -		if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> -			accessmode |= VWRITE;
>> -		if ((error = VOP_ACCESS(devvp,
>> -		    accessmode, td->td_ucred, td)) != 0) {
>> -			vput(devvp);
>> -			return (error);
>> -		}
>> +	accessmode = VREAD;
>> +	if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> +		accessmode |= VWRITE;
>> +	error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> +	if (error)
>> +		error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> One more.
>
>> --- sys/gnu/fs/xfs/FreeBSD/xfs_super.c	10 Jun 2006 19:02:13 -0000	1.4
>> +++ sys/gnu/fs/xfs/FreeBSD/xfs_super.c	30 Oct 2006 17:07:55 -0000
> [...]
>> @@ -149,14 +151,15 @@
>>  	vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>>
>>  	ronly = ((XFS_MTOVFS(mp)->vfs_flag & VFS_RDONLY) != 0);
>> -	if (suser(td)) {
>> -		accessmode = VREAD;
>> -		if (!ronly)
>> -			accessmode |= VWRITE;
>> -		if ((error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td))!= 0){
>> -			vput(devvp);
>> -			return (error);
>> -		}
>> +	accessmode = VREAD;
>> +	if (!ronly)
>> +		accessmode |= VWRITE;
>> +	error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> +	if (error)
>> +		error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> Another one.
>
>> --- sys/kern/kern_jail.c	22 Oct 2006 11:52:13 -0000	1.53
>> +++ sys/kern/kern_jail.c	30 Oct 2006 17:07:55 -0000
> [...]
>> @@ -523,6 +524,172 @@
>>  	}
>>  }
>>
>> +/*
>> + * Check with permission for a specific privilege is granted within jail.  We
>> + * have a specific list of accepted privileges; the rest are denied.
>> + */
>> +int
>> +prison_priv_check(struct ucred *cred, int priv)
>> +{
>> +
>> +	if (!(jailed(cred)))
>> +		return (0);
>
> Style: if (!jailed(cred))

Fixed, thanks!

>> --- sys/netatm/atm_usrreq.c	21 Jul 2006 17:11:13 -0000	1.27
>> +++ sys/netatm/atm_usrreq.c	30 Oct 2006 17:07:55 -0000
>> -		if (td && (suser(td) != 0))
>> -			ATM_RETERR(EPERM);
>> +		if (td != 0) {
>
> Style: s/0/NULL/

Also now fixed.

>> --- sys/netinet6/udp6_usrreq.c	7 Sep 2006 18:44:54 -0000	1.68
>> +++ sys/netinet6/udp6_usrreq.c	30 Oct 2006 17:07:56 -0000
> [...]
>> @@ -434,7 +435,8 @@
>>  	struct inpcb *inp;
>>  	int error;
>>
>> -	error = suser(req->td);
>> +	error = priv_check_cred(req->td->td_ucred, PRIV_NETINET_GETCRED,
>> +	    SUSER_ALLOWJAIL);
>
> Change from not allowing jailed root to allowing jailed root was
> intentional?

Yes -- I believe that we already perform any necessary checks as part of the 
socket visibility check later on in the function.  I'm not sure duplicating it 
is entirely bad, but I think it's probably better this way, especially if we 
plan to support IPv6 in jail in the future.

>> --- sys/ufs/ffs/ffs_vfsops.c	22 Oct 2006 11:52:19 -0000	1.321
>> +++ sys/ufs/ffs/ffs_vfsops.c	30 Oct 2006 17:07:56 -0000
> [...]
>> @@ -257,15 +258,16 @@
>>  			 * If upgrade to read-write by non-root, then verify
>>  			 * that user has necessary permissions on the device.
>>  			 */
>> -			if (suser(td)) {
>> -				vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> -				if ((error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> -				    td->td_ucred, td)) != 0) {
>> -					VOP_UNLOCK(devvp, 0, td);
>> -					return (error);
>> -				}
>> +			vn_lock(devvp, LK_EXCLUSIVE | LK_RETRY, td);
>> +			error = VOP_ACCESS(devvp, VREAD | VWRITE,
>> +			    td->td_ucred, td);
>> +			if (error)
>> +				error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> s/if (error)/if (!error)/

Another case of my thinking this is correct.  Notice that the logic originally 
there is on the same lines: only if the superuser check fails do we do the 
discretionary check.  Now we always do the discretionary check, and sometimes 
do the superuser check, but the idea is that either succeeding is sufficient 
to allo wit to pass.

>> @@ -364,14 +366,15 @@
>>  	 * If mount by non-root, then verify that user has necessary
>>  	 * permissions on the device.
>>  	 */
>> -	if (suser(td)) {
>> -		accessmode = VREAD;
>> -		if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> -			accessmode |= VWRITE;
>> -		if ((error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td))!= 0){
>> -			vput(devvp);
>> -			return (error);
>> -		}
>> +	accessmode = VREAD;
>> +	if ((mp->mnt_flag & MNT_RDONLY) == 0)
>> +		accessmode |= VWRITE;
>> +	error = VOP_ACCESS(devvp, accessmode, td->td_ucred, td);
>> +	if (error)
>> +		error = priv_check(td, PRIV_VFS_MOUNT_PERM);
>
> s/if (error)/if (!error)/

Ditto.

If you disagree with my reasoning above, do let me know -- I scratched my head 
in a number of places in the file system code trying to decide the intent of 
the checks, and changed the current expression in order to provide a more 
logical expression of the intent in order to change the privilege behavior.

Thanks!

Robert N M Watson
Computer Laboratory
University of Cambridge