git: ddb3eb4efe55 - main - New setcred() system call and associated MAC hooks

From: Olivier Certner <olce_at_FreeBSD.org>
Date: Mon, 16 Dec 2024 14:46:09 UTC
The branch main has been updated by olce:

URL: https://cgit.FreeBSD.org/src/commit/?id=ddb3eb4efe55e57c206f3534263c77b837aff1dc

commit ddb3eb4efe55e57c206f3534263c77b837aff1dc
Author:     Olivier Certner <olce@FreeBSD.org>
AuthorDate: 2024-07-18 20:47:43 +0000
Commit:     Olivier Certner <olce@FreeBSD.org>
CommitDate: 2024-12-16 14:42:39 +0000

    New setcred() system call and associated MAC hooks
    
    This new system call allows setting all necessary credentials of
    a process in one go: the effective, real and saved UIDs, the effective,
    real and saved GIDs, the supplementary groups and the MAC label.  Its
    advantage over standard credential-setting system calls (such as
    setuid(), seteuid(), etc.) is that it enables MAC modules, such as
    MAC/do, to restrict in a fine-grained manner the set of credentials
    a process may gain.
    
    Traditionally, credential changes rely on setuid binaries that call
    multiple credential system calls in a specific order (setuid() must
    come last, so that the process remains root for all the other
    credential-setting calls, which would otherwise fail with insufficient
    privileges).  This piecewise approach causes the process to transiently
    hold credentials that are neither the original nor the final ones.  For
    the kernel to enforce that only certain credential transitions are
    allowed, either these possibly non-compliant transient states have to
    disappear (by setting all relevant attributes in one go), or the kernel
    must delay setting or checking the new credentials.  Delaying setting
    credentials could be done, e.g., with a mode, entered and left through
    a special system call, in which the standard system calls contribute to
    building new credentials without committing them.  Delaying checking
    could mean, e.g., that the kernel only verifies the credential
    transition at the next non-credential-setting system call (we mention
    this possibility only for completeness and certainly do not endorse
    it).
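    
    To make the transient-state problem concrete, the following userland
    model (all names hypothetical, not kernel code) applies a strict policy
    that only lets root switch directly to one target identity.  The
    traditional sequence setgid(1001) then setuid(1001) passes through the
    state (uid 0, gid 1001), which the policy rejects, whereas a single
    combined transition from (0, 0) to (1001, 1001) is accepted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical reduced credentials: effective UID and GID only. */
struct mini_cred {
	uid_t uid;
	gid_t gid;
};

static bool
cred_eq(struct mini_cred a, struct mini_cred b)
{
	return (a.uid == b.uid && a.gid == b.gid);
}

/*
 * Hypothetical strict policy: a no-op is always fine; otherwise only root
 * may change credentials, and only to exactly 'target'.
 */
static bool
policy_allows(struct mini_cred from, struct mini_cred to,
    struct mini_cred target)
{
	if (cred_eq(from, to))
		return (true);
	return (from.uid == 0 && cred_eq(to, target));
}

/* Check every step of a sequence of credential states against the policy. */
static bool
sequence_allowed(const struct mini_cred *states, size_t n,
    struct mini_cred target)
{
	assert(n >= 1);
	for (size_t i = 1; i < n; i++)
		if (!policy_allows(states[i - 1], states[i], target))
			return (false);
	return (true);
}
```

    With 'piecewise' = { {0, 0}, {0, 1001}, {1001, 1001} } the first step
    fails the check, while 'atomic' = { {0, 0}, {1001, 1001} } passes,
    which is why a single system call setting everything at once is needed.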
    
    We chose the simpler approach of a new system call, as we don't expect
    the set of credentials one can set to change often.  It has the
    advantages that the traditional system calls' code doesn't have to be
    changed and that we can establish a special MAC protocol for it, with
    a cleanup function called just before returning (a requirement for
    MAC/do), without disturbing the existing ones.
    
    The mac_cred_check_setcred() hook is passed the flags received by
    setcred() (including the version) and both the old and the new kernel
    'struct ucred', rather than 'struct setcred', as this should make it
    simpler to keep existing hooks working as 'struct setcred' evolves.  The
    mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
    called in pairs around potential calls to mac_cred_check_setcred().
    They allow MAC modules to allocate and free data they may need in their
    mac_cred_check_setcred() hook, as the latter is called under the current
    process' lock, which makes sleepable allocations impossible.  MAC/do is
    going to leverage these in a subsequent commit.  A scheme where
    mac_cred_check_setcred() could return ERESTART was considered but is
    incompatible with proper composition of MAC modules.
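    
    The enter/check/exit protocol can be sketched in userland as follows.
    This is a hypothetical model, not MAC/do's actual code: a boolean stands
    in for the process lock, and since the enter and exit hooks take no
    arguments, the sketch assumes a module keeps its prepared data in
    per-thread storage (here a C11 _Thread_local pointer).  Allocation only
    happens in the enter hook, the check hook merely uses what was prepared,
    and the exit hook releases it after the lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for the process lock; only tracks whether it is held. */
static bool proc_locked;

static void
proc_lock(void)
{
	assert(!proc_locked);
	proc_locked = true;
}

static void
proc_unlock(void)
{
	assert(proc_locked);
	proc_locked = false;
}

/* Hypothetical per-thread data a module prepares for its check. */
struct scratch {
	int checks_done;
};

/* The hooks take no arguments, so per-thread storage carries the data. */
static _Thread_local struct scratch *tls_scratch;

/* Role of mac_cred_setcred_enter(): no lock held, allocation allowed. */
static void
module_setcred_enter(void)
{
	assert(!proc_locked && tls_scratch == NULL);
	tls_scratch = calloc(1, sizeof(*tls_scratch));
}

/* Role of mac_cred_check_setcred(): lock held, no allocation allowed. */
static int
module_check_setcred(unsigned int new_uid)
{
	assert(proc_locked);
	if (tls_scratch == NULL)
		return (-1);	/* Preparation failed: deny. */
	tls_scratch->checks_done++;
	/* Hypothetical policy: only switching to UID 1001 is permitted. */
	return (new_uid == 1001 ? 0 : -1);
}

/* Role of mac_cred_setcred_exit(): no lock held, release the data. */
static void
module_setcred_exit(void)
{
	assert(!proc_locked);
	free(tls_scratch);
	tls_scratch = NULL;
}

/* Hypothetical caller following the hook ordering used by kern_setcred(). */
static int
model_kern_setcred(unsigned int new_uid)
{
	int error;

	module_setcred_enter();
	proc_lock();
	error = module_check_setcred(new_uid);
	proc_unlock();
	module_setcred_exit();
	return (error);
}
```

    The exit hook runs whether or not the check succeeded, mirroring the
    guarantee that each mac_cred_setcred_enter() call is matched by exactly
    one mac_cred_setcred_exit() call.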
    
    While here, add missing includes and declarations so that <sys/ucred.h>
    can be included standalone from both the kernel and userspace (in the
    latter case, this already worked because <bsm/audit.h> includes
    <sys/types.h>).
    
    Reviewed by:    brooks
    Approved by:    markj (mentor)
    Relnotes:       yes
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D47618
---
 lib/libsys/Symbol.sys.map                      |   1 +
 lib/libsys/_libsys.h                           |   3 +
 lib/libsys/syscalls.map                        |   2 +
 sys/bsm/audit_kevents.h                        |   1 +
 sys/compat/freebsd32/freebsd32_misc.c          |   9 +
 sys/compat/freebsd32/freebsd32_proto.h         |   7 +
 sys/compat/freebsd32/freebsd32_syscall.h       |   3 +-
 sys/compat/freebsd32/freebsd32_syscalls.c      |   1 +
 sys/compat/freebsd32/freebsd32_sysent.c        |   1 +
 sys/compat/freebsd32/freebsd32_systrace_args.c |  30 ++
 sys/kern/init_sysent.c                         |   1 +
 sys/kern/kern_jail.c                           |   1 +
 sys/kern/kern_prot.c                           | 373 ++++++++++++++++++++++++-
 sys/kern/syscalls.c                            |   1 +
 sys/kern/syscalls.master                       |   7 +
 sys/kern/systrace_args.c                       |  30 ++
 sys/security/mac/mac_cred.c                    |  47 ++++
 sys/security/mac/mac_framework.h               |   6 +-
 sys/security/mac/mac_policy.h                  |  10 +-
 sys/security/mac_stub/mac_stub.c               |  20 ++
 sys/security/mac_test/mac_test.c               |  29 ++
 sys/sys/priv.h                                 |   3 +-
 sys/sys/syscall.h                              |   3 +-
 sys/sys/syscall.mk                             |   3 +-
 sys/sys/syscallsubr.h                          |   2 +
 sys/sys/sysproto.h                             |   7 +
 sys/sys/ucred.h                                |  80 +++++-
 27 files changed, 663 insertions(+), 18 deletions(-)

diff --git a/lib/libsys/Symbol.sys.map b/lib/libsys/Symbol.sys.map
index 3e2f14497b07..8b5330cbdb87 100644
--- a/lib/libsys/Symbol.sys.map
+++ b/lib/libsys/Symbol.sys.map
@@ -381,6 +381,7 @@ FBSD_1.8 {
 	fchroot;
 	getrlimitusage;
 	kcmp;
+	setcred;
 };
 
 FBSDprivate_1.0 {
diff --git a/lib/libsys/_libsys.h b/lib/libsys/_libsys.h
index 894f49185fbc..033ee27f8a19 100644
--- a/lib/libsys/_libsys.h
+++ b/lib/libsys/_libsys.h
@@ -46,6 +46,7 @@ struct rusage;
 struct sched_param;
 struct sctp_sndrcvinfo;
 struct sembuf;
+struct setcred;
 struct sf_hdtr;
 struct shmid_ds;
 struct sigaction;
@@ -464,6 +465,7 @@ typedef int (__sys_timerfd_settime_t)(int, int, const struct itimerspec *, struc
 typedef int (__sys_kcmp_t)(pid_t, pid_t, int, uintptr_t, uintptr_t);
 typedef int (__sys_getrlimitusage_t)(u_int, int, rlim_t *);
 typedef int (__sys_fchroot_t)(int);
+typedef int (__sys_setcred_t)(u_int, const struct setcred *, size_t);
 
 void __sys_exit(int rval);
 int __sys_fork(void);
@@ -865,6 +867,7 @@ int __sys_timerfd_settime(int fd, int flags, const struct itimerspec * new_value
 int __sys_kcmp(pid_t pid1, pid_t pid2, int type, uintptr_t idx1, uintptr_t idx2);
 int __sys_getrlimitusage(u_int which, int flags, rlim_t * res);
 int __sys_fchroot(int fd);
+int __sys_setcred(u_int flags, const struct setcred * wcred, size_t size);
 __END_DECLS
 
 #endif /* __LIBSYS_H_ */
diff --git a/lib/libsys/syscalls.map b/lib/libsys/syscalls.map
index 9e748c659c46..cad6e3ff4132 100644
--- a/lib/libsys/syscalls.map
+++ b/lib/libsys/syscalls.map
@@ -807,4 +807,6 @@ FBSDprivate_1.0 {
 	__sys_getrlimitusage;
 	_fchroot;
 	__sys_fchroot;
+	_setcred;
+	__sys_setcred;
 };
diff --git a/sys/bsm/audit_kevents.h b/sys/bsm/audit_kevents.h
index d06381837aad..0f110d5f9ddd 100644
--- a/sys/bsm/audit_kevents.h
+++ b/sys/bsm/audit_kevents.h
@@ -662,6 +662,7 @@
 #define	AUE_AIO_READV		43268	/* FreeBSD-specific. */
 #define	AUE_FSPACECTL		43269	/* FreeBSD-specific. */
 #define	AUE_TIMERFD		43270	/* FreeBSD/Linux. */
+#define	AUE_SETCRED		43271	/* FreeBSD-specific. */
 
 /*
  * Darwin BSM uses a number of AUE_O_* definitions, which are aliased to the
diff --git a/sys/compat/freebsd32/freebsd32_misc.c b/sys/compat/freebsd32/freebsd32_misc.c
index 67ebb5d52589..4cd706e16155 100644
--- a/sys/compat/freebsd32/freebsd32_misc.c
+++ b/sys/compat/freebsd32/freebsd32_misc.c
@@ -86,6 +86,7 @@
 #include <sys/timex.h>
 #include <sys/unistd.h>
 #include <sys/ucontext.h>
+#include <sys/ucred.h>
 #include <sys/vnode.h>
 #include <sys/wait.h>
 #include <sys/ipc.h>
@@ -115,6 +116,7 @@
 #endif
 
 #include <security/audit/audit.h>
+#include <security/mac/mac_syscalls.h>
 
 #include <compat/freebsd32/freebsd32_util.h>
 #include <compat/freebsd32/freebsd32.h>
@@ -4174,3 +4176,10 @@ ofreebsd32_sethostid(struct thread *td, struct ofreebsd32_sethostid_args *uap)
 	    sizeof(hostid), NULL, 0));
 }
 #endif
+
+int
+freebsd32_setcred(struct thread *td, struct freebsd32_setcred_args *uap)
+{
+	/* Last argument is 'is_32bit'. */
+	return (user_setcred(td, uap->flags, uap->wcred, uap->size, true));
+}
diff --git a/sys/compat/freebsd32/freebsd32_proto.h b/sys/compat/freebsd32/freebsd32_proto.h
index cbb95f2b835b..ee634943a4f5 100644
--- a/sys/compat/freebsd32/freebsd32_proto.h
+++ b/sys/compat/freebsd32/freebsd32_proto.h
@@ -694,6 +694,11 @@ struct freebsd32_timerfd_settime_args {
 	char new_value_l_[PADL_(const struct itimerspec32 *)]; const struct itimerspec32 * new_value; char new_value_r_[PADR_(const struct itimerspec32 *)];
 	char old_value_l_[PADL_(struct itimerspec32 *)]; struct itimerspec32 * old_value; char old_value_r_[PADR_(struct itimerspec32 *)];
 };
+struct freebsd32_setcred_args {
+	char flags_l_[PADL_(u_int)]; u_int flags; char flags_r_[PADR_(u_int)];
+	char wcred_l_[PADL_(const struct setcred32 *)]; const struct setcred32 * wcred; char wcred_r_[PADR_(const struct setcred32 *)];
+	char size_l_[PADL_(size_t)]; size_t size; char size_r_[PADR_(size_t)];
+};
 int	freebsd32_wait4(struct thread *, struct freebsd32_wait4_args *);
 int	freebsd32_ptrace(struct thread *, struct freebsd32_ptrace_args *);
 int	freebsd32_recvmsg(struct thread *, struct freebsd32_recvmsg_args *);
@@ -811,6 +816,7 @@ int	freebsd32_aio_writev(struct thread *, struct freebsd32_aio_writev_args *);
 int	freebsd32_aio_readv(struct thread *, struct freebsd32_aio_readv_args *);
 int	freebsd32_timerfd_gettime(struct thread *, struct freebsd32_timerfd_gettime_args *);
 int	freebsd32_timerfd_settime(struct thread *, struct freebsd32_timerfd_settime_args *);
+int	freebsd32_setcred(struct thread *, struct freebsd32_setcred_args *);
 
 #ifdef COMPAT_43
 
@@ -1312,6 +1318,7 @@ int	freebsd11_freebsd32_fstatat(struct thread *, struct freebsd11_freebsd32_fsta
 #define	FREEBSD32_SYS_AUE_freebsd32_aio_readv	AUE_AIO_READV
 #define	FREEBSD32_SYS_AUE_freebsd32_timerfd_gettime	AUE_TIMERFD
 #define	FREEBSD32_SYS_AUE_freebsd32_timerfd_settime	AUE_TIMERFD
+#define	FREEBSD32_SYS_AUE_freebsd32_setcred	AUE_SETCRED
 
 #undef PAD_
 #undef PADL_
diff --git a/sys/compat/freebsd32/freebsd32_syscall.h b/sys/compat/freebsd32/freebsd32_syscall.h
index a68154ad9c13..b01ea86551d9 100644
--- a/sys/compat/freebsd32/freebsd32_syscall.h
+++ b/sys/compat/freebsd32/freebsd32_syscall.h
@@ -509,4 +509,5 @@
 #define	FREEBSD32_SYS_kcmp	588
 #define	FREEBSD32_SYS_getrlimitusage	589
 #define	FREEBSD32_SYS_fchroot	590
-#define	FREEBSD32_SYS_MAXSYSCALL	591
+#define	FREEBSD32_SYS_freebsd32_setcred	591
+#define	FREEBSD32_SYS_MAXSYSCALL	592
diff --git a/sys/compat/freebsd32/freebsd32_syscalls.c b/sys/compat/freebsd32/freebsd32_syscalls.c
index daf2e217cf03..cf5d42eefb10 100644
--- a/sys/compat/freebsd32/freebsd32_syscalls.c
+++ b/sys/compat/freebsd32/freebsd32_syscalls.c
@@ -596,4 +596,5 @@ const char *freebsd32_syscallnames[] = {
 	"kcmp",			/* 588 = kcmp */
 	"getrlimitusage",			/* 589 = getrlimitusage */
 	"fchroot",			/* 590 = fchroot */
+	"freebsd32_setcred",			/* 591 = freebsd32_setcred */
 };
diff --git a/sys/compat/freebsd32/freebsd32_sysent.c b/sys/compat/freebsd32/freebsd32_sysent.c
index d7fe60a8c5f4..a54744d3b260 100644
--- a/sys/compat/freebsd32/freebsd32_sysent.c
+++ b/sys/compat/freebsd32/freebsd32_sysent.c
@@ -658,4 +658,5 @@ struct sysent freebsd32_sysent[] = {
 	{ .sy_narg = AS(kcmp_args), .sy_call = (sy_call_t *)sys_kcmp, .sy_auevent = AUE_NULL, .sy_flags = 0, .sy_thrcnt = SY_THR_STATIC },	/* 588 = kcmp */
 	{ .sy_narg = AS(getrlimitusage_args), .sy_call = (sy_call_t *)sys_getrlimitusage, .sy_auevent = AUE_NULL, .sy_flags = SYF_CAPENABLED, .sy_thrcnt = SY_THR_STATIC },	/* 589 = getrlimitusage */
 	{ .sy_narg = AS(fchroot_args), .sy_call = (sy_call_t *)sys_fchroot, .sy_auevent = AUE_NULL, .sy_flags = 0, .sy_thrcnt = SY_THR_STATIC },	/* 590 = fchroot */
+	{ .sy_narg = AS(freebsd32_setcred_args), .sy_call = (sy_call_t *)freebsd32_setcred, .sy_auevent = AUE_SETCRED, .sy_flags = SYF_CAPENABLED, .sy_thrcnt = SY_THR_STATIC },	/* 591 = freebsd32_setcred */
 };
diff --git a/sys/compat/freebsd32/freebsd32_systrace_args.c b/sys/compat/freebsd32/freebsd32_systrace_args.c
index dd82d0f44f6a..39b93074e5be 100644
--- a/sys/compat/freebsd32/freebsd32_systrace_args.c
+++ b/sys/compat/freebsd32/freebsd32_systrace_args.c
@@ -3385,6 +3385,15 @@ systrace_args(int sysnum, void *params, uint64_t *uarg, int *n_args)
 		*n_args = 1;
 		break;
 	}
+	/* freebsd32_setcred */
+	case 591: {
+		struct freebsd32_setcred_args *p = params;
+		uarg[a++] = p->flags; /* u_int */
+		uarg[a++] = (intptr_t)p->wcred; /* const struct setcred32 * */
+		uarg[a++] = p->size; /* size_t */
+		*n_args = 3;
+		break;
+	}
 	default:
 		*n_args = 0;
 		break;
@@ -9143,6 +9152,22 @@ systrace_entry_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)
 			break;
 		};
 		break;
+	/* freebsd32_setcred */
+	case 591:
+		switch (ndx) {
+		case 0:
+			p = "u_int";
+			break;
+		case 1:
+			p = "userland const struct setcred32 *";
+			break;
+		case 2:
+			p = "size_t";
+			break;
+		default:
+			break;
+		};
+		break;
 	default:
 		break;
 	};
@@ -11036,6 +11061,11 @@ systrace_return_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)
 		if (ndx == 0 || ndx == 1)
 			p = "int";
 		break;
+	/* freebsd32_setcred */
+	case 591:
+		if (ndx == 0 || ndx == 1)
+			p = "int";
+		break;
 	default:
 		break;
 	};
diff --git a/sys/kern/init_sysent.c b/sys/kern/init_sysent.c
index 21860f697940..30cf30b8ed29 100644
--- a/sys/kern/init_sysent.c
+++ b/sys/kern/init_sysent.c
@@ -657,4 +657,5 @@ struct sysent sysent[] = {
 	{ .sy_narg = AS(kcmp_args), .sy_call = (sy_call_t *)sys_kcmp, .sy_auevent = AUE_NULL, .sy_flags = 0, .sy_thrcnt = SY_THR_STATIC },	/* 588 = kcmp */
 	{ .sy_narg = AS(getrlimitusage_args), .sy_call = (sy_call_t *)sys_getrlimitusage, .sy_auevent = AUE_NULL, .sy_flags = SYF_CAPENABLED, .sy_thrcnt = SY_THR_STATIC },	/* 589 = getrlimitusage */
 	{ .sy_narg = AS(fchroot_args), .sy_call = (sy_call_t *)sys_fchroot, .sy_auevent = AUE_NULL, .sy_flags = 0, .sy_thrcnt = SY_THR_STATIC },	/* 590 = fchroot */
+	{ .sy_narg = AS(setcred_args), .sy_call = (sy_call_t *)sys_setcred, .sy_auevent = AUE_SETCRED, .sy_flags = SYF_CAPENABLED, .sy_thrcnt = SY_THR_STATIC },	/* 591 = setcred */
 };
diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c
index 80503570b776..d5651f671110 100644
--- a/sys/kern/kern_jail.c
+++ b/sys/kern/kern_jail.c
@@ -3955,6 +3955,7 @@ prison_priv_check(struct ucred *cred, int priv)
 		 * Allow jailed processes to manipulate process UNIX
 		 * credentials in any way they see fit.
 		 */
+	case PRIV_CRED_SETCRED:
 	case PRIV_CRED_SETUID:
 	case PRIV_CRED_SETEUID:
 	case PRIV_CRED_SETGID:
diff --git a/sys/kern/kern_prot.c b/sys/kern/kern_prot.c
index c51210a2b29b..8edbb7f18f1a 100644
--- a/sys/kern/kern_prot.c
+++ b/sys/kern/kern_prot.c
@@ -47,6 +47,7 @@
 
 #include <sys/param.h>
 #include <sys/systm.h>
+#include <sys/abi_compat.h>
 #include <sys/acct.h>
 #include <sys/kdb.h>
 #include <sys/kernel.h>
@@ -73,6 +74,10 @@
 #include <sys/syscallsubr.h>
 #include <sys/sysctl.h>
 
+#ifdef MAC
+#include <security/mac/mac_syscalls.h>
+#endif
+
 #include <vm/uma.h>
 
 #ifdef REGRESSION
@@ -484,6 +489,365 @@ done:
 	return (error);
 }
 
+static int
+gidp_cmp(const void *p1, const void *p2)
+{
+	const gid_t g1 = *(const gid_t *)p1;
+	const gid_t g2 = *(const gid_t *)p2;
+
+	return ((g1 > g2) - (g1 < g2));
+}
+
+/*
+ * Final storage for groups (including the effective GID) will be returned via
+ * 'groups'.  '*groups' must be NULL on input, and if not equal to 'smallgroups'
+ * on output, must be freed (M_TEMP) *even if* an error is returned.
+ */
+static int
+kern_setcred_copyin_supp_groups(struct setcred *const wcred,
+    const u_int flags, gid_t *const smallgroups, gid_t **const groups)
+{
+	MPASS(*groups == NULL);
+
+	if (flags & SETCREDF_SUPP_GROUPS) {
+		int error;
+
+		/*
+		 * Check for the limit for number of groups right now in order
+		 * to limit the amount of bytes to copy.
+		 */
+		if (wcred->sc_supp_groups_nb > ngroups_max)
+			return (EINVAL);
+
+		/*
+		 * Since we are going to be copying the supplementary groups
+		 * from userland, make room also for the effective GID right
+		 * now, to avoid having to allocate and copy again the
+		 * supplementary groups.
+		 */
+		*groups = wcred->sc_supp_groups_nb < CRED_SMALLGROUPS_NB ?
+		    smallgroups : malloc((wcred->sc_supp_groups_nb + 1) *
+		    sizeof(*groups), M_TEMP, M_WAITOK);
+
+		error = copyin(wcred->sc_supp_groups, *groups + 1,
+		    wcred->sc_supp_groups_nb * sizeof(*groups));
+		if (error != 0)
+			return (error);
+		wcred->sc_supp_groups = *groups + 1;
+	} else {
+		wcred->sc_supp_groups_nb = 0;
+		wcred->sc_supp_groups = NULL;
+	}
+
+	return (0);
+}
+
+int
+user_setcred(struct thread *td, const u_int flags,
+    const void *const uwcred, const size_t size, bool is_32bit)
+{
+	struct setcred wcred;
+#ifdef MAC
+	struct mac mac;
+	/* Pointer to 'struct mac' or 'struct mac32'. */
+	void *umac;
+#endif
+	gid_t smallgroups[CRED_SMALLGROUPS_NB];
+	gid_t *groups = NULL;
+	int error;
+
+	/*
+	 * As the only point of this wrapper function is to copyin() from
+	 * userland, we only interpret the data pieces we need to perform this
+	 * operation and defer further sanity checks to kern_setcred(), except
+	 * that we redundantly check here that no unknown flags have been
+	 * passed.
+	 */
+	if ((flags & ~SETCREDF_MASK) != 0)
+		return (EINVAL);
+
+#ifdef COMPAT_FREEBSD32
+	if (is_32bit) {
+		struct setcred32 wcred32;
+
+		if (size != sizeof(wcred32))
+			return (EINVAL);
+		error = copyin(uwcred, &wcred32, sizeof(wcred32));
+		if (error != 0)
+			return (error);
+		/* These fields have exactly the same sizes and positions. */
+		memcpy(&wcred, &wcred32, &wcred32.setcred32_copy_end -
+		    &wcred32.setcred32_copy_start);
+		/* Remaining fields are pointers and need PTRIN*(). */
+		PTRIN_CP(wcred32, wcred, sc_supp_groups);
+		PTRIN_CP(wcred32, wcred, sc_label);
+	} else
+#endif /* COMPAT_FREEBSD32 */
+	{
+		if (size != sizeof(wcred))
+			return (EINVAL);
+		error = copyin(uwcred, &wcred, sizeof(wcred));
+		if (error != 0)
+			return (error);
+	}
+#ifdef MAC
+	umac = wcred.sc_label;
+#endif
+	/* Also done on !MAC as a defensive measure. */
+	wcred.sc_label = NULL;
+
+	/*
+	 * Copy supplementary groups as needed.  There is no specific
+	 * alternative for 32-bit compatibility as 'gid_t' has the same size
+	 * everywhere.
+	 */
+	error = kern_setcred_copyin_supp_groups(&wcred, flags, smallgroups,
+	    &groups);
+	if (error != 0)
+		goto free_groups;
+
+#ifdef MAC
+	if ((flags & SETCREDF_MAC_LABEL) != 0) {
+#ifdef COMPAT_FREEBSD32
+		if (is_32bit)
+			error = mac_label_copyin32(umac, &mac, NULL);
+		else
+#endif
+			error = mac_label_copyin(umac, &mac, NULL);
+		if (error != 0)
+			goto free_groups;
+		wcred.sc_label = &mac;
+	}
+#endif
+
+	error = kern_setcred(td, flags, &wcred, groups);
+
+#ifdef MAC
+	if (wcred.sc_label != NULL)
+		free_copied_label(wcred.sc_label);
+#endif
+
+free_groups:
+	if (groups != smallgroups)
+		free(groups, M_TEMP);
+
+	return (error);
+}
+
+#ifndef _SYS_SYSPROTO_H_
+struct setcred_args {
+	u_int			 flags;	/* Flags. */
+	const struct setcred	*wcred;
+	size_t			 size;	/* Passed 'setcred' structure length. */
+};
+#endif
+/* ARGSUSED */
+int
+sys_setcred(struct thread *td, struct setcred_args *uap)
+{
+	return (user_setcred(td, uap->flags, uap->wcred, uap->size, false));
+}
+
+/*
+ * CAUTION: This function normalizes groups in 'wcred'.
+ *
+ * If 'preallocated_groups' is non-NULL, it must be an already allocated array
+ * of size 'wcred->sc_supp_groups_nb + 1', with the supplementary groups
+ * starting at index 1, and 'wcred->sc_supp_groups' then must point to the first
+ * supplementary group.
+ */
+int
+kern_setcred(struct thread *const td, const u_int flags,
+    struct setcred *const wcred, gid_t *preallocated_groups)
+{
+	struct proc *const p = td->td_proc;
+	struct ucred *new_cred, *old_cred, *to_free_cred;
+	struct uidinfo *uip = NULL, *ruip = NULL;
+#ifdef MAC
+	void *mac_set_proc_data = NULL;
+	bool proc_label_set = false;
+#endif
+	gid_t *groups = NULL;
+	gid_t smallgroups[CRED_SMALLGROUPS_NB];
+	int error;
+	bool cred_set;
+
+	/* Bail out on unrecognized flags. */
+	if (flags & ~SETCREDF_MASK)
+		return (EINVAL);
+
+	/*
+	 * Part 1: We allocate and perform preparatory operations with no locks.
+	 */
+
+	if (flags & SETCREDF_SUPP_GROUPS) {
+		if (wcred->sc_supp_groups_nb > ngroups_max)
+			return (EINVAL);
+		if (preallocated_groups != NULL) {
+			groups = preallocated_groups;
+			MPASS(preallocated_groups + 1 == wcred->sc_supp_groups);
+		} else {
+			groups = wcred->sc_supp_groups_nb < CRED_SMALLGROUPS_NB ?
+			    smallgroups :
+			    malloc((wcred->sc_supp_groups_nb + 1) *
+			    sizeof(*groups), M_TEMP, M_WAITOK);
+			memcpy(groups + 1, wcred->sc_supp_groups,
+			    wcred->sc_supp_groups_nb * sizeof(*groups));
+		}
+	}
+
+	if (flags & SETCREDF_MAC_LABEL) {
+#ifdef MAC
+		error = mac_set_proc_prepare(td, wcred->sc_label,
+		    &mac_set_proc_data);
+		if (error != 0)
+			goto free_groups;
+#else
+		error = ENOTSUP;
+		goto free_groups;
+#endif
+	}
+
+	if (flags & SETCREDF_UID) {
+		AUDIT_ARG_EUID(wcred->sc_uid);
+		uip = uifind(wcred->sc_uid);
+	}
+	if (flags & SETCREDF_RUID) {
+		AUDIT_ARG_RUID(wcred->sc_ruid);
+		ruip = uifind(wcred->sc_ruid);
+	}
+	if (flags & SETCREDF_SVUID)
+		AUDIT_ARG_SUID(wcred->sc_svuid);
+
+	if (flags & SETCREDF_GID)
+		AUDIT_ARG_EGID(wcred->sc_gid);
+	if (flags & SETCREDF_RGID)
+		AUDIT_ARG_RGID(wcred->sc_rgid);
+	if (flags & SETCREDF_SVGID)
+		AUDIT_ARG_SGID(wcred->sc_svgid);
+	if (flags & SETCREDF_SUPP_GROUPS) {
+		int ngrp = wcred->sc_supp_groups_nb;
+
+		/*
+		 * Output the raw supplementary groups array for better
+		 * traceability.
+		 */
+		AUDIT_ARG_GROUPSET(groups + 1, ngrp);
+		++ngrp;
+		groups_normalize(&ngrp, groups);
+		wcred->sc_supp_groups_nb = ngrp - 1;
+	}
+
+	/*
+	 * We first completely build the new credentials and only then pass them
+	 * to MAC along with the old ones so that modules can check whether the
+	 * requested transition is allowed.
+	 */
+	new_cred = crget();
+	to_free_cred = new_cred;
+	if (flags & SETCREDF_SUPP_GROUPS)
+		crextend(new_cred, wcred->sc_supp_groups_nb + 1);
+
+#ifdef MAC
+	mac_cred_setcred_enter();
+#endif
+
+	/*
+	 * Part 2: We grab the process lock as to have a stable view of its
+	 * current credentials, and prepare a copy of them with the requested
+	 * changes applied under that lock.
+	 */
+
+	PROC_LOCK(p);
+	old_cred = crcopysafe(p, new_cred);
+
+	/*
+	 * Change user IDs.
+	 */
+	if (flags & SETCREDF_UID)
+		change_euid(new_cred, uip);
+	if (flags & SETCREDF_RUID)
+		change_ruid(new_cred, ruip);
+	if (flags & SETCREDF_SVUID)
+		change_svuid(new_cred, wcred->sc_svuid);
+
+	/*
+	 * Change groups.
+	 *
+	 * crsetgroups_internal() changes both the effective and supplementary
+	 * ones.
+	 */
+	if (flags & SETCREDF_SUPP_GROUPS) {
+		groups[0] = flags & SETCREDF_GID ? wcred->sc_gid :
+		    new_cred->cr_gid;
+		crsetgroups_internal(new_cred, wcred->sc_supp_groups_nb + 1,
+		    groups);
+	} else if (flags & SETCREDF_GID)
+		change_egid(new_cred, wcred->sc_gid);
+	if (flags & SETCREDF_RGID)
+		change_rgid(new_cred, wcred->sc_rgid);
+	if (flags & SETCREDF_SVGID)
+		change_svgid(new_cred, wcred->sc_svgid);
+
+#ifdef MAC
+	/*
+	 * Change the MAC label.
+	 */
+	if (flags & SETCREDF_MAC_LABEL) {
+		error = mac_set_proc_core(td, new_cred, mac_set_proc_data);
+		if (error != 0)
+			goto unlock_finish;
+		proc_label_set = true;
+	}
+
+	/*
+	 * MAC security modules checks.
+	 */
+	error = mac_cred_check_setcred(flags, old_cred, new_cred);
+	if (error != 0)
+		goto unlock_finish;
+#endif
+	/*
+	 * Privilege check.
+	 */
+	error = priv_check_cred(old_cred, PRIV_CRED_SETCRED);
+	if (error != 0)
+		goto unlock_finish;
+
+	/*
+	 * Set the new credentials, noting that they have changed.
+	 */
+	cred_set = proc_set_cred_enforce_proc_lim(p, new_cred);
+	if (cred_set) {
+		setsugid(p);
+		to_free_cred = old_cred;
+		MPASS(error == 0);
+	} else
+		error = EAGAIN;
+
+unlock_finish:
+	PROC_UNLOCK(p);
+	/*
+	 * Part 3: After releasing the process lock, we perform cleanups and
+	 * finishing operations.
+	 */
+
+#ifdef MAC
+	if (mac_set_proc_data != NULL)
+		mac_set_proc_finish(td, proc_label_set, mac_set_proc_data);
+	mac_cred_setcred_exit();
+#endif
+	crfree(to_free_cred);
+	if (uip != NULL)
+		uifree(uip);
+	if (ruip != NULL)
+		uifree(ruip);
+free_groups:
+	if (groups != preallocated_groups && groups != smallgroups)
+		free(groups, M_TEMP); /* Deals with 'groups' being NULL. */
+	return (error);
+}
+
 /*
  * Use the clause in B.4.2.2 that allows setuid/setgid to be 4.2/4.3BSD
  * compatible.  It says that setting the uid/gid to euid/egid is a special
@@ -859,15 +1223,6 @@ sys_setgroups(struct thread *td, struct setgroups_args *uap)
 	return (error);
 }
 
-static int
-gidp_cmp(const void *p1, const void *p2)
-{
-	const gid_t g1 = *(const gid_t *)p1;
-	const gid_t g2 = *(const gid_t *)p2;
-
-	return ((g1 > g2) - (g1 < g2));
-}
-
 /*
  * CAUTION: This function normalizes 'groups', possibly also changing the value
  * of '*ngrpp' as a consequence.
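
kern_setcred() and kern_setcred_copyin_supp_groups() above use a single
gid_t array in which slot 0 is reserved for the effective GID and the
supplementary groups start at index 1, so that the effective GID can be
filled in later under the process lock without reallocating or copying the
groups again.  The userland sketch below builds such an array and
normalizes it.  It assumes that normalization means sorting the
supplementary groups and removing duplicates while leaving slot 0 alone,
which is consistent with gidp_cmp() and with '*ngrpp' possibly changing, but
build_groups() and normalize_groups() are hypothetical names, not the kernel
functions.  Byte counts are computed with sizeof(gid_t) so that they match
the element type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Same three-way comparison as gidp_cmp() in the diff (no overflow). */
static int
gid_cmp(const void *p1, const void *p2)
{
	const gid_t g1 = *(const gid_t *)p1;
	const gid_t g2 = *(const gid_t *)p2;

	return ((g1 > g2) - (g1 < g2));
}

/*
 * Hypothetical helper: allocate room for the effective GID plus 'nsupp'
 * supplementary groups, the latter being copied starting at index 1.
 */
static gid_t *
build_groups(gid_t egid, const gid_t *supp, size_t nsupp)
{
	gid_t *groups = malloc((nsupp + 1) * sizeof(gid_t));

	if (groups == NULL)
		return (NULL);
	groups[0] = egid;
	if (nsupp > 0)
		memcpy(groups + 1, supp, nsupp * sizeof(gid_t));
	return (groups);
}

/*
 * Hypothetical normalization: sort groups[1..ngrp-1] and drop duplicates,
 * leaving the effective GID in groups[0] untouched; updates '*ngrpp'.
 */
static void
normalize_groups(int *ngrpp, gid_t *groups)
{
	int ngrp = *ngrpp, i, j;

	assert(ngrp >= 1);
	qsort(groups + 1, (size_t)(ngrp - 1), sizeof(*groups), gid_cmp);
	/* 'j' is the next write index; never compare against groups[0]. */
	for (i = 1, j = 1; i < ngrp; i++)
		if (j == 1 || groups[j - 1] != groups[i])
			groups[j++] = groups[i];
	*ngrpp = j;
}
```

For example, an effective GID of 1001 with supplementary groups
{ 20, 5, 20, 0, 5 } yields a 4-element array { 1001, 0, 5, 20 }, which is
why kern_setcred() stores 'ngrp - 1' back into 'sc_supp_groups_nb' after
normalizing.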
diff --git a/sys/kern/syscalls.c b/sys/kern/syscalls.c
index 414edab93e33..142350ade770 100644
--- a/sys/kern/syscalls.c
+++ b/sys/kern/syscalls.c
@@ -596,4 +596,5 @@ const char *syscallnames[] = {
 	"kcmp",			/* 588 = kcmp */
 	"getrlimitusage",			/* 589 = getrlimitusage */
 	"fchroot",			/* 590 = fchroot */
+	"setcred",			/* 591 = setcred */
 };
diff --git a/sys/kern/syscalls.master b/sys/kern/syscalls.master
index e7f577d48426..d3c4f2c64231 100644
--- a/sys/kern/syscalls.master
+++ b/sys/kern/syscalls.master
@@ -3346,5 +3346,12 @@
 		    int fd
 		);
 	}
+591	AUE_SETCRED	STD|CAPENABLED {
+		int setcred(
+		    u_int flags,
+		    _In_reads_bytes_(size) _Contains_ptr_ const struct setcred *wcred,
+		    size_t size
+		);
+	}
 
 ; vim: syntax=off
diff --git a/sys/kern/systrace_args.c b/sys/kern/systrace_args.c
index 63c26f605e88..2b4be1065425 100644
--- a/sys/kern/systrace_args.c
+++ b/sys/kern/systrace_args.c
@@ -3472,6 +3472,15 @@ systrace_args(int sysnum, void *params, uint64_t *uarg, int *n_args)
 		*n_args = 1;
 		break;
 	}
+	/* setcred */
+	case 591: {
+		struct setcred_args *p = params;
+		uarg[a++] = p->flags; /* u_int */
+		uarg[a++] = (intptr_t)p->wcred; /* const struct setcred * */
+		uarg[a++] = p->size; /* size_t */
+		*n_args = 3;
+		break;
+	}
 	default:
 		*n_args = 0;
 		break;
@@ -9288,6 +9297,22 @@ systrace_entry_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)
 			break;
 		};
 		break;
+	/* setcred */
+	case 591:
+		switch (ndx) {
+		case 0:
+			p = "u_int";
+			break;
+		case 1:
+			p = "userland const struct setcred *";
+			break;
+		case 2:
+			p = "size_t";
+			break;
+		default:
+			break;
+		};
+		break;
 	default:
 		break;
 	};
@@ -11271,6 +11296,11 @@ systrace_return_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)
 		if (ndx == 0 || ndx == 1)
 			p = "int";
 		break;
+	/* setcred */
+	case 591:
+		if (ndx == 0 || ndx == 1)
+			p = "int";
+		break;
 	default:
 		break;
 	};
diff --git a/sys/security/mac/mac_cred.c b/sys/security/mac/mac_cred.c
index 304265b783f1..5066de277176 100644
--- a/sys/security/mac/mac_cred.c
+++ b/sys/security/mac/mac_cred.c
@@ -209,6 +209,53 @@ mac_cred_check_relabel(struct ucred *cred, struct label *newlabel)
 	return (error);
 }
 
+/*
+ * Entry hook for setcred().
+ *
+ * Called with no lock held by setcred() so that MAC modules may allocate memory
+ * in preparation for checking privileges.  A call to this hook is always
+ * followed by a matching call to mac_cred_setcred_exit().  Between these two,
+ * setcred() may or may not call mac_cred_check_setcred().
+ */
+void
+mac_cred_setcred_enter(void)
+{
+	MAC_POLICY_PERFORM_NOSLEEP(cred_setcred_enter);
+}
+
+MAC_CHECK_PROBE_DEFINE3(cred_check_setcred, "unsigned int", "struct ucred *",
+    "struct ucred *");
+
+/*
+ * Check hook for setcred().
+ *
+ * When called, the current process' lock is held.  It thus cannot perform
+ * memory allocations, which must be done in advance in
+ * mac_cred_setcred_enter().  It *MUST NOT* tamper with the process' lock.
+ */
+int
+mac_cred_check_setcred(u_int flags, const struct ucred *old_cred,
+    struct ucred *new_cred)
+{
+	int error;
+
+	MAC_POLICY_CHECK_NOSLEEP(cred_check_setcred, flags, old_cred, new_cred);
+	MAC_CHECK_PROBE3(cred_check_setcred, error, flags, old_cred, new_cred);
+
+	return (error);
+}
+
+/*
+ * Exit hook for setcred().
+ *
+ * Called with no lock held, exactly once per call to mac_cred_setcred_enter().
+ */
+void
+mac_cred_setcred_exit(void)
+{
+	MAC_POLICY_PERFORM_NOSLEEP(cred_setcred_exit);
+}
+
 MAC_CHECK_PROBE_DEFINE2(cred_check_setuid, "struct ucred *", "uid_t");
 
 int
diff --git a/sys/security/mac/mac_framework.h b/sys/security/mac/mac_framework.h
index c69b9cd64454..8e43f267f368 100644
--- a/sys/security/mac/mac_framework.h
+++ b/sys/security/mac/mac_framework.h
@@ -72,6 +72,7 @@ struct mbuf;
 struct mount;
 struct msg;
 struct msqid_kernel;
+struct pipepair;
 struct proc;
 struct semid_kernel;
 struct shmfd;
@@ -80,7 +81,6 @@ struct sockaddr;
 struct socket;
 struct sysctl_oid;
 struct sysctl_req;
-struct pipepair;
 struct thread;
 struct timespec;
 struct ucred;
@@ -115,6 +115,10 @@ int	mac_cred_check_setaudit(struct ucred *cred, struct auditinfo *ai);
 int	mac_cred_check_setaudit_addr(struct ucred *cred,
 	    struct auditinfo_addr *aia);
 int	mac_cred_check_setauid(struct ucred *cred, uid_t auid);
+void	mac_cred_setcred_enter(void);
+int	mac_cred_check_setcred(u_int flags, const struct ucred *old_cred,
+	    struct ucred *new_cred);
+void	mac_cred_setcred_exit(void);
 int	mac_cred_check_setegid(struct ucred *cred, gid_t egid);
 int	mac_cred_check_seteuid(struct ucred *cred, uid_t euid);
 int	mac_cred_check_setgid(struct ucred *cred, gid_t gid);
diff --git a/sys/security/mac/mac_policy.h b/sys/security/mac/mac_policy.h
index 084684e57497..66e489060804 100644
--- a/sys/security/mac/mac_policy.h
+++ b/sys/security/mac/mac_policy.h
@@ -144,6 +144,10 @@ typedef int	(*mpo_cred_check_setaudit_t)(struct ucred *cred,
 typedef int	(*mpo_cred_check_setaudit_addr_t)(struct ucred *cred,
 		    struct auditinfo_addr *aia);
 typedef int	(*mpo_cred_check_setauid_t)(struct ucred *cred, uid_t auid);
+typedef void	(*mpo_cred_setcred_enter_t)(void);
+typedef int	(*mpo_cred_check_setcred_t)(u_int flags,
+		    const struct ucred *old_cred, struct ucred *new_cred);
+typedef void	(*mpo_cred_setcred_exit_t)(void);
 typedef int	(*mpo_cred_check_setegid_t)(struct ucred *cred, gid_t egid);
 typedef int	(*mpo_cred_check_seteuid_t)(struct ucred *cred, uid_t euid);
 typedef int	(*mpo_cred_check_setgid_t)(struct ucred *cred, gid_t gid);
@@ -720,6 +724,9 @@ struct mac_policy_ops {
 	mpo_cred_check_setaudit_t		mpo_cred_check_setaudit;
 	mpo_cred_check_setaudit_addr_t		mpo_cred_check_setaudit_addr;
 	mpo_cred_check_setauid_t		mpo_cred_check_setauid;
+	mpo_cred_setcred_enter_t		mpo_cred_setcred_enter;
+	mpo_cred_check_setcred_t		mpo_cred_check_setcred;
+	mpo_cred_setcred_exit_t			mpo_cred_setcred_exit;
 	mpo_cred_check_setuid_t			mpo_cred_check_setuid;
 	mpo_cred_check_seteuid_t		mpo_cred_check_seteuid;
 	mpo_cred_check_setgid_t			mpo_cred_check_setgid;
@@ -1033,8 +1040,9 @@ struct mac_policy_conf {
  *   3                       7.x
  *   4                       8.x
  *   5                       14.x
+ *   6                       15.x
  */
-#define	MAC_VERSION	5
+#define	MAC_VERSION	6
 
 #define	MAC_POLICY_SET(mpops, mpname, mpfullname, mpflags, privdata_wanted) \
 	static struct mac_policy_conf mpname##_mac_policy_conf = {	\
diff --git a/sys/security/mac_stub/mac_stub.c b/sys/security/mac_stub/mac_stub.c
index c602c639ec95..a3b0dd01a76b 100644
--- a/sys/security/mac_stub/mac_stub.c
+++ b/sys/security/mac_stub/mac_stub.c
@@ -222,6 +222,23 @@ stub_cred_check_setauid(struct ucred *cred, uid_t auid)
 	return (0);
 }
 
+static void
+stub_cred_setcred_enter(void)
+{
+}
+
+static int
+stub_cred_check_setcred(u_int flags, const struct ucred *old_cred,
+    struct ucred *new_cred)
+{
+	return (0);
+}
+
+static void
+stub_cred_setcred_exit(void)
+{
+}
+
 static int
 stub_cred_check_setegid(struct ucred *cred, gid_t egid)
 {
@@ -1688,6 +1705,9 @@ static struct mac_policy_ops stub_ops =
 	.mpo_cred_check_setaudit = stub_cred_check_setaudit,
 	.mpo_cred_check_setaudit_addr = stub_cred_check_setaudit_addr,
 	.mpo_cred_check_setauid = stub_cred_check_setauid,
+	.mpo_cred_setcred_enter = stub_cred_setcred_enter,
+	.mpo_cred_check_setcred = stub_cred_check_setcred,
+	.mpo_cred_setcred_exit = stub_cred_setcred_exit,
 	.mpo_cred_check_setegid = stub_cred_check_setegid,
 	.mpo_cred_check_seteuid = stub_cred_check_seteuid,
 	.mpo_cred_check_setgid = stub_cred_check_setgid,
diff --git a/sys/security/mac_test/mac_test.c b/sys/security/mac_test/mac_test.c
index 7a6a76ce23cc..890b8328055e 100644
--- a/sys/security/mac_test/mac_test.c
+++ b/sys/security/mac_test/mac_test.c
@@ -257,6 +257,32 @@ test_cred_check_setauid(struct ucred *cred, uid_t auid)
*** 245 LINES SKIPPED ***