From nobody Thu Aug 17 19:58:39 2023
Date: Thu, 17 Aug 2023 19:58:39 GMT
Message-Id: <202308171958.37HJwdLn080427@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Dmitry Chagin
Subject: git: bb66c5975383 - main - linux(4): Add sendfile fallback for non-socket fds
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
X-Git-Committer: dchagin
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: bb66c59753836cd8abb596fe316dcdb77ea66999
Auto-Submitted: auto-generated

The branch main has been updated by dchagin:

URL: https://cgit.FreeBSD.org/src/commit/?id=bb66c59753836cd8abb596fe316dcdb77ea66999

commit bb66c59753836cd8abb596fe316dcdb77ea66999
Author: James McLaughlin
AuthorDate: 2023-08-17 19:57:17 +0000
Commit: Dmitry Chagin
CommitDate: 2023-08-17 19:57:17 +0000

    linux(4): Add sendfile fallback for non-socket fds

    Before Linux 2.6.33, out_fd must refer to a socket. Since Linux 2.6.33
    it can be any file.
    The patch was originally provided by James McLaughlin and adapted by
    me for copy_file_range.

    PR: 262535
    Differential revision: https://reviews.freebsd.org/D34555
    MFC after: 1 month
---
 sys/compat/linux/linux_socket.c | 209 ++++++++++++++++++++++++++++++++++------
 1 file changed, 177 insertions(+), 32 deletions(-)

diff --git a/sys/compat/linux/linux_socket.c b/sys/compat/linux/linux_socket.c
index 45b94cb2f994..f768392be546 100644
--- a/sys/compat/linux/linux_socket.c
+++ b/sys/compat/linux/linux_socket.c
@@ -36,10 +36,12 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
+#include
 #include
 #include
@@ -2374,57 +2376,200 @@ out:
 	return (error);
 }
 
+/*
+ * Based on sendfile_getsock from kern_sendfile.c
+ * Determines whether an fd is a stream socket that can be used
+ * with FreeBSD sendfile.
+ */
+static bool
+is_stream_socket(struct file *fp)
+{
+	struct socket *so;
+
+	/*
+	 * The socket must be a stream socket and connected.
+	 */
+	if (fp->f_type != DTYPE_SOCKET)
+		return (false);
+	so = fp->f_data;
+	if (so->so_type != SOCK_STREAM)
+		return (false);
+	/*
+	 * SCTP one-to-one style sockets currently don't work with
+	 * sendfile().
+	 */
+	if (so->so_proto->pr_protocol == IPPROTO_SCTP)
+		return (false);
+	return (!SOLISTENING(so));
+}
+
+static bool
+is_regular_file(struct file *fp)
+{
+
+	return (fp->f_type == DTYPE_VNODE && fp->f_vnode != NULL &&
+	    fp->f_vnode->v_type == VREG);
+}
+
 static int
-linux_sendfile_common(struct thread *td, l_int out, l_int in,
-    off_t *offset, l_size_t count)
+sendfile_fallback(struct thread *td, struct file *fp, l_int out,
+    off_t *offset, l_size_t count, off_t *sbytes)
 {
-	off_t bytes_read;
-	int error;
-	l_loff_t current_offset;
-	struct file *fp;
+	off_t current_offset, out_offset, to_send;
+	l_size_t bytes_sent, n_read;
+	struct file *ofp;
+	struct iovec aiov;
+	struct uio auio;
+	bool seekable;
+	size_t bufsz;
+	void *buf;
+	int flags, error;
 
-	AUDIT_ARG_FD(in);
-	error = fget_read(td, in, &cap_pread_rights, &fp);
+	if (offset == NULL) {
+		if ((error = fo_seek(fp, 0, SEEK_CUR, td)) != 0)
+			return (error);
+		current_offset = td->td_uretoff.tdu_off;
+	} else {
+		if ((fp->f_ops->fo_flags & DFLAG_SEEKABLE) == 0)
+			return (ESPIPE);
+		current_offset = *offset;
+	}
+	error = fget_write(td, out, &cap_pwrite_rights, &ofp);
 	if (error != 0)
 		return (error);
-
-	if (offset != NULL) {
-		current_offset = *offset;
-	} else {
-		error = (fp->f_ops->fo_flags & DFLAG_SEEKABLE) != 0 ?
-		    fo_seek(fp, 0, SEEK_CUR, td) : ESPIPE;
-		if (error != 0)
+	seekable = (ofp->f_ops->fo_flags & DFLAG_SEEKABLE) != 0;
+	if (seekable) {
+		if ((error = fo_seek(ofp, 0, SEEK_CUR, td)) != 0)
 			goto drop;
+		out_offset = td->td_uretoff.tdu_off;
+	} else
+		out_offset = 0;
+
+	flags = FOF_OFFSET | FOF_NOUPDATE;
+	bufsz = min(count, MAXPHYS);
+	buf = malloc(bufsz, M_LINUX, M_WAITOK);
+	bytes_sent = 0;
+	while (bytes_sent < count) {
+		to_send = min(count - bytes_sent, bufsz);
+		aiov.iov_base = buf;
+		aiov.iov_len = bufsz;
+		auio.uio_iov = &aiov;
+		auio.uio_iovcnt = 1;
+		auio.uio_segflg = UIO_SYSSPACE;
+		auio.uio_td = td;
+		auio.uio_rw = UIO_READ;
+		auio.uio_offset = current_offset;
+		auio.uio_resid = to_send;
+		error = fo_read(fp, &auio, fp->f_cred, flags, td);
+		if (error != 0)
+			break;
+		n_read = to_send - auio.uio_resid;
+		if (n_read == 0)
+			break;
+		aiov.iov_base = buf;
+		aiov.iov_len = bufsz;
+		auio.uio_iov = &aiov;
+		auio.uio_iovcnt = 1;
+		auio.uio_segflg = UIO_SYSSPACE;
+		auio.uio_td = td;
+		auio.uio_rw = UIO_WRITE;
+		auio.uio_offset = (seekable) ? out_offset : 0;
+		auio.uio_resid = n_read;
+		error = fo_write(ofp, &auio, ofp->f_cred, flags, td);
+		if (error != 0)
+			break;
+		bytes_sent += n_read;
+		current_offset += n_read;
+		out_offset += n_read;
+	}
+	free(buf, M_LINUX);
+
+	if (error == 0) {
+		*sbytes = bytes_sent;
+		if (offset != NULL)
+			*offset = current_offset;
+		else
+			error = fo_seek(fp, current_offset, SEEK_SET, td);
+	}
+	if (error == 0 && seekable)
+		error = fo_seek(ofp, out_offset, SEEK_SET, td);
+
+drop:
+	fdrop(ofp, td);
+	return (error);
+}
+
+static int
+sendfile_sendfile(struct thread *td, struct file *fp, l_int out,
+    off_t *offset, l_size_t count, off_t *sbytes)
+{
+	off_t current_offset;
+	int error;
+
+	if (offset == NULL) {
+		if ((fp->f_ops->fo_flags & DFLAG_SEEKABLE) == 0)
+			return (ESPIPE);
+		if ((error = fo_seek(fp, 0, SEEK_CUR, td)) != 0)
+			return (error);
 		current_offset = td->td_uretoff.tdu_off;
+	} else
+		current_offset = *offset;
+	error = fo_sendfile(fp, out, NULL, NULL, current_offset, count,
+	    sbytes, 0, td);
+	if (error == 0) {
+		current_offset += *sbytes;
+		if (offset != NULL)
+			*offset = current_offset;
+		else
+			error = fo_seek(fp, current_offset, SEEK_SET, td);
 	}
+	return (error);
+}
 
-	bytes_read = 0;
+static int
+linux_sendfile_common(struct thread *td, l_int out, l_int in,
+    off_t *offset, l_size_t count)
+{
+	struct file *fp, *ofp;
+	off_t sbytes;
+	int error;
 
 	/* Linux cannot have 0 count. */
-	if (count <= 0 || current_offset < 0) {
+	if (count <= 0 || (offset != NULL && *offset < 0))
+		return (EINVAL);
+
+	AUDIT_ARG_FD(in);
+	error = fget_read(td, in, &cap_pread_rights, &fp);
+	if (error != 0)
+		return (error);
+	if ((fp->f_type != DTYPE_SHM && fp->f_type != DTYPE_VNODE) ||
+	    (fp->f_type == DTYPE_VNODE &&
+	    (fp->f_vnode == NULL || fp->f_vnode->v_type != VREG))) {
 		error = EINVAL;
 		goto drop;
 	}
-
-	error = fo_sendfile(fp, out, NULL, NULL, current_offset, count,
-	    &bytes_read, 0, td);
+	error = fget_unlocked(td, out, &cap_no_rights, &ofp);
 	if (error != 0)
 		goto drop;
-	current_offset += bytes_read;
 
-	if (offset != NULL) {
-		*offset = current_offset;
+	if (is_regular_file(fp) && is_regular_file(ofp)) {
+		error = kern_copy_file_range(td, in, offset, out, NULL, count,
+		    0);
 	} else {
-		error = fo_seek(fp, current_offset, SEEK_SET, td);
-		if (error != 0)
-			goto drop;
+		sbytes = 0;
+		if (is_stream_socket(ofp))
+			error = sendfile_sendfile(td, fp, out, offset, count,
+			    &sbytes);
+		else
+			error = sendfile_fallback(td, fp, out, offset, count,
+			    &sbytes);
+		if (error == 0)
+			td->td_retval[0] = sbytes;
 	}
+	fdrop(ofp, td);
 
-	td->td_retval[0] = (ssize_t)bytes_read;
 drop:
 	fdrop(fp, td);
-	if (error == ENOTSOCK)
-		error = EINVAL;
 	return (error);
 }
 
@@ -2434,10 +2579,10 @@ linux_sendfile(struct thread *td, struct linux_sendfile_args *arg)
 	/*
 	 * Differences between FreeBSD and Linux sendfile:
 	 * - Linux doesn't send anything when count is 0 (FreeBSD uses 0 to
-	 *   mean send the whole file.) In linux_sendfile given fds are still
-	 *   checked for validity when the count is 0.
+	 *   mean send the whole file).
 	 * - Linux can send to any fd whereas FreeBSD only supports sockets.
-	 *   The same restriction follows for linux_sendfile.
+	 *   We therefore use FreeBSD sendfile where possible for performance,
+	 *   but fall back on a manual copy (sendfile_fallback).
 	 * - Linux doesn't have an equivalent for FreeBSD's flags and sf_hdtr.
 	 * - Linux takes an offset pointer and updates it to the read location.
 	 *   FreeBSD takes in an offset and a 'bytes read' parameter which is