Re: git: a4e4132fa3bf - main - swapoff(2): replace special device name argument with a structure

From: Konstantin Belousov <kostikbel_at_gmail.com>
Date: Sun, 05 Dec 2021 18:51:57 UTC
On Sun, Dec 05, 2021 at 05:14:54PM +0000, Jessica Clarke wrote:
> On 5 Dec 2021, at 13:22, Konstantin Belousov <kostikbel@gmail.com> wrote:
> > 
> > On Sun, Dec 05, 2021 at 03:03:26AM +0000, Jessica Clarke wrote:
> >> On 4 Dec 2021, at 22:21, Konstantin Belousov <kib@FreeBSD.org> wrote:
> >>> 
> >>> The branch main has been updated by kib:
> >>> 
> >>> URL: https://cgit.FreeBSD.org/src/commit/?id=a4e4132fa3bfadb6047fc0fa5f399f4640460300
> >>> 
> >>> commit a4e4132fa3bfadb6047fc0fa5f399f4640460300
> >>> Author:     Konstantin Belousov <kib@FreeBSD.org>
> >>> AuthorDate: 2021-11-29 16:26:31 +0000
> >>> Commit:     Konstantin Belousov <kib@FreeBSD.org>
> >>> CommitDate: 2021-12-04 22:20:58 +0000
> >>> 
> >>>   swapoff(2): replace special device name argument with a structure
> >>> 
> >>>   For compatibility, add a placeholder pointer to the start of the
> >>>   added struct swapoff_new_args, and use it to distinguish old vs. new
> >>>   style of syscall invocation.
> >>> 
> >>>   Reviewed by:    markj
> >>>   Discussed with: alc
> >>>   Sponsored by:   The FreeBSD Foundation
> >>>   MFC after:      1 week
> >>>   Differential revision:  https://reviews.freebsd.org/D33165
> >>> ---
> >>> sys/vm/swap_pager.c | 27 +++++++++++++++++++++++++--
> >>> sys/vm/swap_pager.h |  8 ++++++++
> >>> 2 files changed, 33 insertions(+), 2 deletions(-)
> >>> 
> >>> diff --git a/sys/vm/swap_pager.c b/sys/vm/swap_pager.c
> >>> index 165373d1b527..dc1df79f4fcd 100644
> >>> --- a/sys/vm/swap_pager.c
> >>> +++ b/sys/vm/swap_pager.c
> >>> @@ -2491,15 +2491,38 @@ sys_swapoff(struct thread *td, struct swapoff_args *uap)
> >>> 	struct vnode *vp;
> >>> 	struct nameidata nd;
> >>> 	struct swdevt *sp;
> >>> -	int error;
> >>> +	struct swapoff_new_args sa;
> >>> +	int error, probe_byte;
> >>> 
> >>> 	error = priv_check(td, PRIV_SWAPOFF);
> >>> 	if (error)
> >>> 		return (error);
> >>> 
> >>> +	/*
> >>> +	 * Detect old vs. new-style swapoff(2) syscall.  The first
> >>> +	 * pointer in the memory pointed to by uap->name is NULL for
> >>> +	 * the new variant.
> >>> +	 */
> >>> +	probe_byte = fubyte(uap->name);
> >>> +	switch (probe_byte) {
> >>> +	case -1:
> >>> +		return (EFAULT);
> >>> +	case 0:
> >>> +		error = copyin(uap->name, &sa, sizeof(sa));
> >>> +		if (error != 0)
> >>> +			return (error);
> >>> +		if (sa.flags != 0)
> >>> +			return (EINVAL);
> >>> +		break;
> >>> +	default:
> >>> +		bzero(&sa, sizeof(sa));
> >>> +		sa.name = uap->name;
> >>> +		break;
> >>> +	}
> >> 
> >> Doesn’t this change the semantics of swapoff("")?
> >> 
> >> Previously it would fail deterministically, presumably with ENOENT or
> >> something, but now it reinterprets whatever follows that string in
> >> memory as the new argument structure. It probably doesn’t matter, but
> >> this approach is ugly. Can we not just define a new syscall rather than
> >> this kind of bodge?
> > 
> > Having two swapoff() syscalls is worse, and having them only differ in
> > semantic by single flag is kind of crime.
> > 
> > I do not see swapoff("") as problematic, we are changing a minor semantic of
> > the management syscall.  I only wanted to avoid flag day for swapoff binaries.
> > 
> > BTW, I considered requiring proper alignment for uap->name, and then checking
> > the whole uap->name_old_syscall for NULL, but then decided that this is
> > overkill.  If you think that swapoff("") that important, I can add that
> > additional verification.
> 
> Why’s it worse? It’s just a syscall number, you deprecate the old one
> and move on, we do that for things relatively regularly. This is really
> not a good solution; harder to use as a caller since the prototype is
> wrong, impossible to ensure you preserve the semantics for the existing
> interface in all cases, and ugly to implement. You don’t need a flag
> day for a new syscall, either, you can continue to only use the new
> method for -f for a release and then switch over to the new syscall
> entirely. Or switch over to the new syscall entirely now and fall back
> on the old syscall if -f isn’t passed. Defining a new syscall also lets
> you not need the name_old_syscall member in the struct, and gives you a
> clean, fully-extensible syscall to which future features can be added
> in a backwards-compatible way, rather than forever keeping around this
> legacy mess.

I disagree, it is not just a syscall number, it is whole user/kernel
interface that bloats, which means cognitive efforts from anybody using
this interfaces, and for which we must maintain ABI compatibility.

New syscall allocation should be done only as a last resort, when existing
interfaces cannot be adopted for new functionality.

Good (or rather, bad) example of the uglyness that is backed by the attitude
that syscalls are free, is whole *at() mess, or specific stat*() mess (old,
other bsds, pre-ino64, ino64, at, then stat vs fstat, then Linux statx which
probably fixes the interface ultimately).