List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
Sender: owner-dev-commits-src-main@freebsd.org
MIME-Version: 1.0
References: <202302021720.312HKDQG099212@gitrepo.freebsd.org>
In-Reply-To: <202302021720.312HKDQG099212@gitrepo.freebsd.org>
From: Maxim Sobolev
Date: Fri, 3 Feb 2023 19:16:22 +0100
Subject: Re: git: 69d94f4c7608 - main - Add tarfs, a filesystem backed by tarballs.
To: Dag-Erling Smørgrav
Cc: src-committers, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
Content-Type: multipart/alternative; boundary="000000000000931c1e05f3cfae44"

--000000000000931c1e05f3cfae44
Content-Type: 
text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Wow, cool, thank you so much! It feels like Christmas again. >:-)
I see this being immediately useful for anyone building a custom system,
as a replacement for the uzip+ufs combo or similar methods of creating
read-only compressed storage containers!

Just curious: have you done any performance testing? Something like
"worldstone", but with /usr/src mounted off a tar archive vs. "normal"
UFS, would be interesting to see.

Also, has any security audit, even a cursory one, been done on the tar
processing routines? Of course, with the functionality being opt-in, the
onus is on the user to make sure that only tars obtained from trusted
sources are used, and in a way that protects the tar file content from
modification by unprivileged users. However, that won't protect us from
FreeBSD looking bad in the public eye if some high-profile institutional
user of FreeBSD is breached through a vulnerability in this code a few
years down the line, when it hits a RELENG branch.

At the very least, a big, fat warning could be added to the man page to
notify the user that the code is somewhat fresh and not on par
quality-wise with something like UFS or ZFS, plus some tips on best
practices for reducing exposure when tarfs is used (nosuid mount, proper
tar file permissions, trusted sources, etc.).

This is of course all hypothetical, but given the history of
buffer/integer overflows etc. in handling user-supplied data in simple
syscalls operating on structures 1-2 orders of magnitude smaller and
less complex, I find it unlikely that fresh-off-the-mill tar code won't
have any. Perhaps some automated fuzzing approach could be employed to
see if the kernel can be crashed by giving it a slightly corrupted but
otherwise valid tar file?

If Juniper sponsored the development of this feature, I suspect they are
hardly the least interested party when it comes to making sure that
using it won't compromise the security of their products.
Pure speculation on my part, of course, but pretty reasonable at that.

Anyhow, just my few Canadian cents on the topic, while it's fresh.
Thanks again to everyone involved in making this available. I look
forward to getting my hands on it as soon as I get back from FOSDEM, if
not sooner.

-Max

On Thu, Feb 2, 2023, 6:20 PM Dag-Erling Smørgrav wrote:

> The branch main has been updated by des:
>
> URL:
> https://cgit.FreeBSD.org/src/commit/?id=69d94f4c7608e41505996559367450706e91fbb8
>
> commit 69d94f4c7608e41505996559367450706e91fbb8
> Author:     Dag-Erling Smørgrav
> AuthorDate: 2023-02-02 17:18:41 +0000
> Commit:     Dag-Erling Smørgrav
> CommitDate: 2023-02-02 17:19:29 +0000
>
>     Add tarfs, a filesystem backed by tarballs.
>
>     Sponsored by:   Juniper Networks, Inc.
>     Sponsored by:   Klara, Inc.
>     Reviewed by:    pauamma, imp
>     Differential Revision:  https://reviews.freebsd.org/D37753
> ---
>  etc/mtree/BSD.tests.dist         |    2 +
>  share/man/man5/Makefile          |    1 +
>  share/man/man5/tarfs.5           |  103 ++++
>  sys/conf/files                   |    4 +
>  sys/conf/options                 |    4 +
>  sys/fs/tarfs/tarfs.h             |  254 +++++++++
>  sys/fs/tarfs/tarfs_dbg.h         |   65 +++
>  sys/fs/tarfs/tarfs_io.c          |  727 +++++++++++++++++++++++
>  sys/fs/tarfs/tarfs_subr.c        |  603 ++++++++++++++++++++
>  sys/fs/tarfs/tarfs_vfsops.c      | 1173 ++++++++++++++++++++++++++++++++++++++
>  sys/fs/tarfs/tarfs_vnops.c       |  642 +++++++++++++++++++++
>  sys/kern/subr_witness.c          |    6 +
>  sys/modules/Makefile             |    1 +
>  sys/modules/tarfs/Makefile       |   23 +
>  tests/sys/fs/Makefile            |    1 +
>  tests/sys/fs/tarfs/Makefile      |   10 +
>  tests/sys/fs/tarfs/mktar.c       |  238 ++++++++
>  tests/sys/fs/tarfs/tarfs_test.sh |   54 ++
>  18 files changed, 3911 insertions(+)
>
> diff --git a/etc/mtree/BSD.tests.dist b/etc/mtree/BSD.tests.dist
> index 0d05ecaf06fc..b4b18997b7f9 100644
> --- a/etc/mtree/BSD.tests.dist
> +++ b/etc/mtree/BSD.tests.dist
> @@ -757,6 +757,8 @@
>          fs
>              fusefs
>              ..
> +            tarfs
> +            ..
>              tmpfs
>              ..
>          ..
> diff --git a/share/man/man5/Makefile b/share/man/man5/Makefile > index 2d49d981c2f9..f6e91e4ed00b 100644 > --- a/share/man/man5/Makefile > +++ b/share/man/man5/Makefile > @@ -70,6 +70,7 @@ MAN=3D acct.5 \ > style.Makefile.5 \ > style.mdoc.5 \ > sysctl.conf.5 \ > + tarfs.5 \ > tmpfs.5 \ > unionfs.5 > > diff --git a/share/man/man5/tarfs.5 b/share/man/man5/tarfs.5 > new file mode 100644 > index 000000000000..b25131c323c1 > --- /dev/null > +++ b/share/man/man5/tarfs.5 > @@ -0,0 +1,103 @@ > +.\"- > +.\" SPDX-License-Identifier: BSD-2-Clause > +.\" > +.\" Copyright (c) 2022 Klara, Inc. > +.\" > +.\" Redistribution and use in source and binary forms, with or without > +.\" modification, are permitted provided that the following conditions > +.\" are met: > +.\" 1. Redistributions of source code must retain the above copyright > +.\" notice, this list of conditions and the following disclaimer. > +.\" 2. Redistributions in binary form must reproduce the above copyright > +.\" notice, this list of conditions and the following disclaimer in t= he > +.\" documentation and/or other materials provided with the > distribution. > +.\" > +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' A= ND > +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, TH= E > +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE > LIABLE > +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE > GOODS > +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION= ) > +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN AN= Y > WAY > +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY = OF > +.\" SUCH DAMAGE. 
> +.\" > +.Dd February 2, 2023 > +.Dt TARFS 5 > +.Os > +.Sh NAME > +.Nm tarfs > +.Nd tarball filesystem > +.Sh SYNOPSIS > +To compile this driver into the kernel, place the following line in > +your kernel configuration file: > +.Bd -ragged -offset indent > +.Cd "options TARFS" > +.Ed > +.Pp > +Alternatively, to load the driver as a module at boot time, place the > +following line in > +.Xr loader.conf 5 : > +.Bd -literal -offset indent > +tarfs_load=3D"YES" > +.Ed > +.Sh DESCRIPTION > +The > +.Nm > +driver implementes a read-only filesystem backed by a > +.Xr tar 5 > +file. > +Currently, only POSIX archives, optionally compressed with > +.Xr zstd 1 , > +are supported. > +.Pp > +The preferred I/O size for > +.Nm > +filesystems can be adjusted using the > +.Va vfs.tarfs.ioshift > +sysctl setting and tunable. > +Setting it to 0 will reset it to its default value. > +Note that changes to this setting only apply to filesystems mounted > +after the change. > +.Sh DIAGNOSTICS > +If enabled by the > +.Dv TARFS_DEBUG > +kernel option, the > +.Va vfs.tarfs.debug > +sysctl setting can be used to control debugging output from the > +.Nm > +driver. > +Debugging output for individual sections of the driver can be enabled > +by adding together the relevant values from the table below. > +.Bl -column Value Description > +.It 0x01 Ta Memory allocations > +.It 0x02 Ta Checksum calculations > +.It 0x04 Ta Filesystem operations (vfsops) > +.It 0x08 Ta Path lookups > +.It 0x10 Ta File operations (vnops) > +.It 0x20 Ta General I/O > +.It 0x40 Ta Decompression > +.It 0x80 Ta Decompression index > +.It 0x100 Ta Sparse file mapping > +.El > +.Sh SEE ALSO > +.Xr tar 1 , > +.Xr zstd 1 , > +.Xr fstab 5 , > +.Xr tar 5 , > +.Xr mount 8 , > +.Xr sysctl 8 > +.Sh HISTORY > +.An -nosplit > +The > +.Nm > +driver was developed by > +.An Stephen J. Kiernan Aq Mt stevek@FreeBSD.org > +and > +.An Dag-Erling Sm=C3=B8rgrav Aq Mt des@FreeBSD.org > +for Juniper Networks and Klara Systems. 
> +This manual page was written by > +.An Dag-Erling Sm=C3=B8rgrav Aq Mt des@FreeBSD.org > +for Juniper Networks and Klara Systems. > diff --git a/sys/conf/files b/sys/conf/files > index 6cb4abcd9223..08966a9b46e4 100644 > --- a/sys/conf/files > +++ b/sys/conf/files > @@ -3615,6 +3615,10 @@ fs/smbfs/smbfs_smb.c optional smbfs > fs/smbfs/smbfs_subr.c optional smbfs > fs/smbfs/smbfs_vfsops.c optional smbfs > fs/smbfs/smbfs_vnops.c optional smbfs > +fs/tarfs/tarfs_io.c optional tarfs compile-with "${NORMAL_C} > -I$S/contrib/zstd/lib/freebsd" > +fs/tarfs/tarfs_subr.c optional tarfs > +fs/tarfs/tarfs_vfsops.c optional tarfs > +fs/tarfs/tarfs_vnops.c optional tarfs > fs/udf/osta.c optional udf > fs/udf/udf_iconv.c optional udf_iconv > fs/udf/udf_vfsops.c optional udf > diff --git a/sys/conf/options b/sys/conf/options > index 1f5003507539..3b2be66ba602 100644 > --- a/sys/conf/options > +++ b/sys/conf/options > @@ -265,6 +265,7 @@ NULLFS opt_dontuse.h > PROCFS opt_dontuse.h > PSEUDOFS opt_dontuse.h > SMBFS opt_dontuse.h > +TARFS opt_dontuse.h > TMPFS opt_dontuse.h > UDF opt_dontuse.h > UNIONFS opt_dontuse.h > @@ -273,6 +274,9 @@ ZFS opt_dontuse.h > # Pseudofs debugging > PSEUDOFS_TRACE opt_pseudofs.h > > +# Tarfs debugging > +TARFS_DEBUG opt_tarfs.h > + > # In-kernel GSS-API > KGSSAPI opt_kgssapi.h > KGSSAPI_DEBUG opt_kgssapi.h > diff --git a/sys/fs/tarfs/tarfs.h b/sys/fs/tarfs/tarfs.h > new file mode 100644 > index 000000000000..dffd60ee6d8a > --- /dev/null > +++ b/sys/fs/tarfs/tarfs.h > @@ -0,0 +1,254 @@ > +/*- > + * SPDX-License-Identifier: BSD-2-Clause > + * > + * Copyright (c) 2013 Juniper Networks, Inc. > + * Copyright (c) 2022-2023 Klara, Inc. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. 
Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in th= e > + * documentation and/or other materials provided with the distributio= n. > + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AN= D > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIAB= LE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOO= DS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY > WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY O= F > + * SUCH DAMAGE. > + */ > + > +#ifndef _FS_TARFS_TARFS_H_ > +#define _FS_TARFS_TARFS_H_ > + > +#ifndef _KERNEL > +#error Should only be included by kernel > +#endif > + > +MALLOC_DECLARE(M_TARFSMNT); > +MALLOC_DECLARE(M_TARFSNODE); > +MALLOC_DECLARE(M_TARFSNAME); > + > +#ifdef SYSCTL_DECL > +SYSCTL_DECL(_vfs_tarfs); > +#endif > + > +struct componentname; > +struct mount; > +struct vnode; > + > +/* > + * Internal representation of a tarfs file system node. 
> + */ > +struct tarfs_node { > + TAILQ_ENTRY(tarfs_node) entries; > + TAILQ_ENTRY(tarfs_node) dirents; > + > + struct mtx lock; > + > + struct vnode *vnode; > + struct tarfs_mount *tmp; > + enum vtype type; > + ino_t ino; > + off_t offset; > + size_t size; > + size_t physize; > + char *name; > + size_t namelen; > + > + /* Node attributes */ > + uid_t uid; > + gid_t gid; > + mode_t mode; > + unsigned int flags; > + nlink_t nlink; > + struct timespec atime; > + struct timespec mtime; > + struct timespec ctime; > + struct timespec birthtime; > + unsigned long gen; > + > + /* Block map */ > + size_t nblk; > + struct tarfs_blk *blk; > + > + struct tarfs_node *parent; > + union { > + /* VDIR */ > + struct { > + TAILQ_HEAD(, tarfs_node) dirhead; > + off_t lastcookie; > + struct tarfs_node *lastnode; > + } dir; > + > + /* VLNK */ > + struct { > + char *name; > + size_t namelen; > + } link; > + > + /* VBLK or VCHR */ > + dev_t rdev; > + > + /* VREG */ > + struct tarfs_node *other; > + }; > +}; > + > +/* > + * Entry in sparse file block map. > + */ > +struct tarfs_blk { > + off_t i; /* input (physical) offset */ > + off_t o; /* output (logical) offset */ > + size_t l; /* length */ > +}; > + > +/* > + * Decompression buffer. > + */ > +#define TARFS_ZBUF_SIZE 1048576 > +struct tarfs_zbuf { > + u_char buf[TARFS_ZBUF_SIZE]; > + size_t off; /* offset of contents */ > + size_t len; /* length of contents */ > +}; > + > +/* > + * Internal representation of a tarfs mount point. 
> + */ > +struct tarfs_mount { > + TAILQ_HEAD(, tarfs_node) allnodes; > + struct mtx allnode_lock; > + > + struct tarfs_node *root; > + struct vnode *vp; > + struct mount *vfs; > + ino_t ino; > + struct unrhdr *ino_unr; > + size_t iosize; > + size_t nblocks; > + size_t nfiles; > + time_t mtime; /* default mtime for directories = */ > + > + struct tarfs_zio *zio; > + struct vnode *znode; > +}; > + > +struct tarfs_zio { > + struct tarfs_mount *tmp; > + > + /* decompression state */ > +#ifdef ZSTDIO > + struct tarfs_zstd *zstd; /* decompression state (zstd) */ > +#endif > + off_t ipos; /* current input position */ > + off_t opos; /* current output position */ > + > + /* index of compression frames */ > + unsigned int curidx; /* current index position*/ > + unsigned int nidx; /* number of index entries */ > + unsigned int szidx; /* index capacity */ > + struct tarfs_idx { off_t i, o; } *idx; > +}; > + > +struct tarfs_fid { > + u_short len; /* length of data in bytes */ > + u_short data0; /* force alignment */ > + ino_t ino; > + unsigned long gen; > +}; > + > +#define TARFS_NODE_LOCK(tnp) \ > + mtx_lock(&(tnp)->lock) > +#define TARFS_NODE_UNLOCK(tnp) \ > + mtx_unlock(&(tnp)->lock) > +#define TARFS_ALLNODES_LOCK(tnp) \ > + mtx_lock(&(tmp)->allnode_lock) > +#define TARFS_ALLNODES_UNLOCK(tnp) \ > + mtx_unlock(&(tmp)->allnode_lock) > + > +/* > + * Data and metadata within tar files are aligned on 512-byte boundaries= , > + * to match the block size of the magnetic tapes they were originally > + * intended for. > + */ > +#define TARFS_BSHIFT 9 > +#define TARFS_BLOCKSIZE (size_t)(1U << TARFS_BSHIFT) > +#define TARFS_BLKOFF(l) ((l) % TARFS_BLOCKSIZE) > +#define TARFS_BLKNUM(l) ((l) >> TARFS_BSHIFT) > +#define TARFS_SZ2BLKS(sz) (((sz) + TARFS_BLOCKSIZE - 1) / > TARFS_BLOCKSIZE) > + > +/* > + * Our preferred I/O size. 
> + */ > +extern unsigned int tarfs_ioshift; > +#define TARFS_IOSHIFT_MIN TARFS_BSHIFT > +#define TARFS_IOSHIFT_DEFAULT PAGE_SHIFT > +#define TARFS_IOSHIFT_MAX PAGE_SHIFT > + > +#define TARFS_ROOTINO ((ino_t)3) > +#define TARFS_ZIOINO ((ino_t)4) > +#define TARFS_MININO ((ino_t)65535) > + > +#define TARFS_COOKIE_DOT 0 > +#define TARFS_COOKIE_DOTDOT 1 > +#define TARFS_COOKIE_EOF OFF_MAX > + > +#define TARFS_ZIO_NAME ".tar" > +#define TARFS_ZIO_NAMELEN (sizeof(TARFS_ZIO_NAME) - 1) > + > +extern struct vop_vector tarfs_vnodeops; > + > +static inline > +struct tarfs_mount * > +MP_TO_TARFS_MOUNT(struct mount *mp) > +{ > + > + MPASS(mp !=3D NULL && mp->mnt_data !=3D NULL); > + return (mp->mnt_data); > +} > + > +static inline > +struct tarfs_node * > +VP_TO_TARFS_NODE(struct vnode *vp) > +{ > + > + MPASS(vp !=3D NULL && vp->v_data !=3D NULL); > + return (vp->v_data); > +} > + > +int tarfs_alloc_node(struct tarfs_mount *tmp, const char *name, > + size_t namelen, enum vtype type, off_t off, size_t sz, > + time_t mtime, uid_t uid, gid_t gid, mode_t mode, > + unsigned int flags, const char *linkname, dev_t rdev, > + struct tarfs_node *parent, struct tarfs_node **node); > +int tarfs_load_blockmap(struct tarfs_node *tnp, size_t realsize); > +void tarfs_dump_tree(struct tarfs_node *tnp); > +void tarfs_free_node(struct tarfs_node *tnp); > +struct tarfs_node * > + tarfs_lookup_dir(struct tarfs_node *tnp, off_t cookie); > +struct tarfs_node * > + tarfs_lookup_node(struct tarfs_node *tnp, struct tarfs_node *f, > + struct componentname *cnp); > +void tarfs_print_node(struct tarfs_node *tnp); > +int tarfs_read_file(struct tarfs_node *tnp, size_t len, struct uio > *uiop); > + > +int tarfs_io_init(struct tarfs_mount *tmp); > +int tarfs_io_fini(struct tarfs_mount *tmp); > +int tarfs_io_read(struct tarfs_mount *tmp, bool raw, > + struct uio *uiop); > +ssize_t tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw, > + void *buf, off_t off, size_t len); > +unsigned int > + 
tarfs_strtofflags(const char *str, char **end); > + > +#endif /* _FS_TARFS_TARFS_H_ */ > diff --git a/sys/fs/tarfs/tarfs_dbg.h b/sys/fs/tarfs/tarfs_dbg.h > new file mode 100644 > index 000000000000..45d11d679719 > --- /dev/null > +++ b/sys/fs/tarfs/tarfs_dbg.h > @@ -0,0 +1,65 @@ > +/*- > + * SPDX-License-Identifier: BSD-2-Clause > + * > + * Copyright (c) 2013 Juniper Networks, Inc. > + * Copyright (c) 2022 Klara, Inc. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in th= e > + * documentation and/or other materials provided with the distributio= n. > + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AN= D > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIAB= LE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOO= DS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY > WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY O= F > + * SUCH DAMAGE. 
> + */ > + > +#ifndef _FS_TARFS_TARFS_DBG_H_ > +#define _FS_TARFS_TARFS_DBG_H_ > + > +#ifndef _KERNEL > +#error Should only be included by kernel > +#endif > + > +#ifdef TARFS_DEBUG > +extern int tarfs_debug; > + > +#define TARFS_DEBUG_ALLOC 0x01 > +#define TARFS_DEBUG_CHECKSUM 0x02 > +#define TARFS_DEBUG_FS 0x04 > +#define TARFS_DEBUG_LOOKUP 0x08 > +#define TARFS_DEBUG_VNODE 0x10 > +#define TARFS_DEBUG_IO 0x20 > +#define TARFS_DEBUG_ZIO 0x40 > +#define TARFS_DEBUG_ZIDX 0x80 > +#define TARFS_DEBUG_MAP 0x100 > + > +#define TARFS_DPF(category, fmt, ...) > \ > + do { \ > + if ((tarfs_debug & TARFS_DEBUG_##category) !=3D 0) = \ > + printf(fmt, ## __VA_ARGS__); \ > + } while (0) > +#define TARFS_DPF_IFF(category, cond, fmt, ...) > \ > + do { \ > + if ((cond) \ > + && (tarfs_debug & TARFS_DEBUG_##category) !=3D 0) = \ > + printf(fmt, ## __VA_ARGS__); \ > + } while (0) > +#else > +#define TARFS_DPF(category, fmt, ...) > +#define TARFS_DPF_IFF(category, cond, fmt, ...) > +#endif > + > +#endif /* _FS_TARFS_TARFS_DBG_H_ */ > diff --git a/sys/fs/tarfs/tarfs_io.c b/sys/fs/tarfs/tarfs_io.c > new file mode 100644 > index 000000000000..b957ac11ff51 > --- /dev/null > +++ b/sys/fs/tarfs/tarfs_io.c > @@ -0,0 +1,727 @@ > +/*- > + * SPDX-License-Identifier: BSD-2-Clause > + * > + * Copyright (c) 2013 Juniper Networks, Inc. > + * Copyright (c) 2022-2023 Klara, Inc. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * 1. Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * 2. Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in th= e > + * documentation and/or other materials provided with the distributio= n. 
> + * > + * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AN= D > + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR > PURPOSE > + * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIAB= LE > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR > CONSEQUENTIAL > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOO= DS > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, > STRICT > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY > WAY > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY O= F > + * SUCH DAMAGE. > + */ > + > +#include "opt_tarfs.h" > +#include "opt_zstdio.h" > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#ifdef ZSTDIO > +#define ZSTD_STATIC_LINKING_ONLY > +#include > +#endif > + > +#include > +#include > + > +#ifdef TARFS_DEBUG > +SYSCTL_NODE(_vfs_tarfs, OID_AUTO, zio, CTLFLAG_RD, 0, > + "Tar filesystem decompression layer"); > +COUNTER_U64_DEFINE_EARLY(tarfs_zio_inflated); > +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, inflated, CTLFLAG_RD, > + &tarfs_zio_inflated, "Amount of compressed data inflated."); > +COUNTER_U64_DEFINE_EARLY(tarfs_zio_consumed); > +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, consumed, CTLFLAG_RD, > + &tarfs_zio_consumed, "Amount of compressed data consumed."); > +COUNTER_U64_DEFINE_EARLY(tarfs_zio_bounced); > +SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, bounced, CTLFLAG_RD, > + &tarfs_zio_bounced, "Amount of decompressed data bounced."); > + > +static int > +tarfs_sysctl_handle_zio_reset(SYSCTL_HANDLER_ARGS) > +{ > + unsigned int tmp; > + int error; > + > + tmp =3D 0; > + if ((error =3D SYSCTL_OUT(req, &tmp, sizeof(tmp))) !=3D 0) > + 
return (error); > + if (req->newptr !=3D NULL) { > + if ((error =3D SYSCTL_IN(req, &tmp, sizeof(tmp))) !=3D 0) > + return (error); > + counter_u64_zero(tarfs_zio_inflated); > + counter_u64_zero(tarfs_zio_consumed); > + counter_u64_zero(tarfs_zio_bounced); > + } > + return (0); > +} > + > +SYSCTL_PROC(_vfs_tarfs_zio, OID_AUTO, reset, > + CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW, > + NULL, 0, tarfs_sysctl_handle_zio_reset, "IU", > + "Reset compression counters."); > +#endif > + > +MALLOC_DEFINE(M_TARFSZSTATE, "tarfs zstate", "tarfs decompression state"= ); > +MALLOC_DEFINE(M_TARFSZBUF, "tarfs zbuf", "tarfs decompression buffers"); > + > +#define XZ_MAGIC (uint8_t[]){ 0xfd, 0x37, 0x7a, 0x58, 0x5a= } > +#define ZLIB_MAGIC (uint8_t[]){ 0x1f, 0x8b, 0x08 } > +#define ZSTD_MAGIC (uint8_t[]){ 0x28, 0xb5, 0x2f, 0xfd } > + > +#ifdef ZSTDIO > +struct tarfs_zstd { > + ZSTD_DStream *zds; > +}; > +#endif > + > +/* XXX review use of curthread / uio_td / td_cred */ > + > +/* > + * Reads from the tar file according to the provided uio. If the archiv= e > + * is compressed and raw is false, reads the decompressed stream; > + * otherwise, reads directly from the original file. Returns 0 on succe= ss > + * and a positive errno value on failure. 
> + */ > +int > +tarfs_io_read(struct tarfs_mount *tmp, bool raw, struct uio *uiop) > +{ > + void *rl =3D NULL; > + off_t off =3D uiop->uio_offset; > + size_t len =3D uiop->uio_resid; > + int error; > + > + if (raw || tmp->znode =3D=3D NULL) { > + rl =3D vn_rangelock_rlock(tmp->vp, off, off + len); > + error =3D vn_lock(tmp->vp, LK_SHARED); > + if (error =3D=3D 0) { > + error =3D VOP_READ(tmp->vp, uiop, > + IO_DIRECT|IO_NODELOCKED, > + uiop->uio_td->td_ucred); > + VOP_UNLOCK(tmp->vp); > + } > + vn_rangelock_unlock(tmp->vp, rl); > + } else { > + error =3D vn_lock(tmp->znode, LK_EXCLUSIVE); > + if (error =3D=3D 0) { > + error =3D VOP_READ(tmp->znode, uiop, > + IO_DIRECT | IO_NODELOCKED, > + uiop->uio_td->td_ucred); > + VOP_UNLOCK(tmp->znode); > + } > + } > + TARFS_DPF(IO, "%s(%zu, %zu) =3D %d (resid %zd)\n", __func__, > + (size_t)off, len, error, uiop->uio_resid); > + return (error); > +} > + > +/* > + * Reads from the tar file into the provided buffer. If the archive is > + * compressed and raw is false, reads the decompressed stream; otherwise= , > + * reads directly from the original file. Returns the number of bytes > + * read on success, 0 on EOF, and a negative errno value on failure. 
> + */ > +ssize_t > +tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw, > + void *buf, off_t off, size_t len) > +{ > + struct uio auio; > + struct iovec aiov; > + ssize_t res; > + int error; > + > + if (len =3D=3D 0) { > + TARFS_DPF(IO, "%s(%zu, %zu) null\n", __func__, > + (size_t)off, len); > + return (0); > + } > + aiov.iov_base =3D buf; > + aiov.iov_len =3D len; > + auio.uio_iov =3D &aiov; > + auio.uio_iovcnt =3D 1; > + auio.uio_offset =3D off; > + auio.uio_segflg =3D UIO_SYSSPACE; > + auio.uio_rw =3D UIO_READ; > + auio.uio_resid =3D len; > + auio.uio_td =3D curthread; > + error =3D tarfs_io_read(tmp, raw, &auio); > + if (error !=3D 0) { > + TARFS_DPF(IO, "%s(%zu, %zu) error %d\n", __func__, > + (size_t)off, len, error); > + return (-error); > + } > + res =3D len - auio.uio_resid; > + if (res =3D=3D 0 && len !=3D 0) { > + TARFS_DPF(IO, "%s(%zu, %zu) eof\n", __func__, > + (size_t)off, len); > + } else { > + TARFS_DPF(IO, "%s(%zu, %zu) read %zd | %*D\n", __func__, > + (size_t)off, len, res, > + (int)(res > 8 ? 8 : res), (uint8_t *)buf, " "); > + } > + return (res); > +} > + > +#ifdef ZSTDIO > +static void * > +tarfs_zstate_alloc(void *opaque, size_t size) > +{ > + > + (void)opaque; > + return (malloc(size, M_TARFSZSTATE, M_WAITOK)); > +} > +#endif > + > +#ifdef ZSTDIO > +static void > +tarfs_zstate_free(void *opaque, void *address) > +{ > + > + (void)opaque; > + free(address, M_TARFSZSTATE); > +} > +#endif > + > +#ifdef ZSTDIO > +static ZSTD_customMem tarfs_zstd_mem =3D { > + tarfs_zstate_alloc, > + tarfs_zstate_free, > + NULL, > +}; > +#endif > + > +/* > + * Updates the decompression frame index, recording the current input an= d > + * output offsets in a new index entry, and growing the index if > + * necessary. 
> + */
> +static void
> +tarfs_zio_update_index(struct tarfs_zio *zio, off_t i, off_t o)
> +{
> +
> +	if (++zio->curidx >= zio->nidx) {
> +		if (++zio->nidx > zio->szidx) {
> +			zio->szidx *= 2;
> +			zio->idx = realloc(zio->idx,
> +			    zio->szidx * sizeof(*zio->idx),
> +			    M_TARFSZSTATE, M_ZERO | M_WAITOK);
> +			TARFS_DPF(ALLOC, "%s: resized zio index\n", __func__);
> +		}
> +		zio->idx[zio->curidx].i = i;
> +		zio->idx[zio->curidx].o = o;
> +		TARFS_DPF(ZIDX, "%s: index %u = i %zu o %zu\n", __func__,
> +		    zio->curidx, (size_t)zio->idx[zio->curidx].i,
> +		    (size_t)zio->idx[zio->curidx].o);
> +	}
> +	MPASS(zio->idx[zio->curidx].i == i);
> +	MPASS(zio->idx[zio->curidx].o == o);
> +}
> +
> +/*
> + * VOP_ACCESS for zio node.
> + */
> +static int
> +tarfs_zaccess(struct vop_access_args *ap)
> +{
> +	struct vnode *vp = ap->a_vp;
> +	struct tarfs_zio *zio = vp->v_data;
> +	struct tarfs_mount *tmp = zio->tmp;
> +	accmode_t accmode = ap->a_accmode;
> +	int error = EPERM;
> +
> +	if (accmode == VREAD) {
> +		error = vn_lock(tmp->vp, LK_SHARED);
> +		if (error == 0) {
> +			error = VOP_ACCESS(tmp->vp, accmode, ap->a_cred, ap->a_td);
> +			VOP_UNLOCK(tmp->vp);
> +		}
> +	}
> +	TARFS_DPF(ZIO, "%s(%d) = %d\n", __func__, accmode, error);
> +	return (error);
> +}
> +
> +/*
> + * VOP_GETATTR for zio node.
> + */
> +static int
> +tarfs_zgetattr(struct vop_getattr_args *ap)
> +{
> +	struct vattr va;
> +	struct vnode *vp = ap->a_vp;
> +	struct tarfs_zio *zio = vp->v_data;
> +	struct tarfs_mount *tmp = zio->tmp;
> +	struct vattr *vap = ap->a_vap;
> +	int error = 0;
> +
> +	VATTR_NULL(vap);
> +	error = vn_lock(tmp->vp, LK_SHARED);
> +	if (error == 0) {
> +		error = VOP_GETATTR(tmp->vp, &va, ap->a_cred);
> +		VOP_UNLOCK(tmp->vp);
> +		if (error == 0) {
> +			vap->va_type = VREG;
> +			vap->va_mode = va.va_mode;
> +			vap->va_nlink = 1;
> +			vap->va_gid = va.va_gid;
> +			vap->va_uid = va.va_uid;
> +			vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
> +			vap->va_fileid = TARFS_ZIOINO;
> +			vap->va_size = zio->idx[zio->nidx - 1].o;
> +			vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
> +			vap->va_atime = va.va_atime;
> +			vap->va_ctime = va.va_ctime;
> +			vap->va_mtime = va.va_mtime;
> +			vap->va_birthtime = tmp->root->birthtime;
> +			vap->va_bytes = va.va_bytes;
> +		}
> +	}
> +	TARFS_DPF(ZIO, "%s() = %d\n", __func__, error);
> +	return (error);
> +}
> +
> +#ifdef ZSTDIO
> +/*
> + * VOP_READ for zio node, zstd edition.
> + */
> +static int
> +tarfs_zread_zstd(struct tarfs_zio *zio, struct uio *uiop)
> +{
> +	void *ibuf = NULL, *obuf = NULL, *rl = NULL;
> +	struct uio auio;
> +	struct iovec aiov;
> +	struct tarfs_mount *tmp = zio->tmp;
> +	struct tarfs_zstd *zstd = zio->zstd;
> +	struct thread *td = curthread;
> +	ZSTD_inBuffer zib;
> +	ZSTD_outBuffer zob;
> +	off_t zsize;
> +	off_t ipos, opos;
> +	size_t ilen, olen;
> +	size_t zerror;
> +	off_t off = uiop->uio_offset;
> +	size_t len = uiop->uio_resid;
> +	size_t resid = uiop->uio_resid;
> +	size_t bsize;
> +	int error;
> +	bool reset = false;
> +
> +	/* do we have to rewind? */
> +	if (off < zio->opos) {
> +		while (zio->curidx > 0 && off < zio->idx[zio->curidx].o)
> +			zio->curidx--;
> +		reset = true;
> +	}
> +	/* advance to the nearest index entry */
> +	if (off > zio->opos) {
> +		// XXX maybe do a binary search instead
> +		while (zio->curidx < zio->nidx - 1 &&
> +		    off >= zio->idx[zio->curidx + 1].o) {
> +			zio->curidx++;
> +			reset = true;
> +		}
> +	}
> +	/* reset the decompression stream if needed */
> +	if (reset) {
> +		zio->ipos = zio->idx[zio->curidx].i;
> +		zio->opos = zio->idx[zio->curidx].o;
> +		ZSTD_resetDStream(zstd->zds);
> +		TARFS_DPF(ZIDX, "%s: skipping to index %u = i %zu o %zu\n", __func__,
> +		    zio->curidx, (size_t)zio->ipos, (size_t)zio->opos);
> +	} else {
> +		TARFS_DPF(ZIDX, "%s: continuing at i %zu o %zu\n", __func__,
> +		    (size_t)zio->ipos, (size_t)zio->opos);
> +	}
> +
> +	/*
> +	 * Set up a temporary buffer for compressed data. Use the size
> +	 * recommended by the zstd library; this is usually 128 kB, but
> +	 * just in case, make sure it's a multiple of the page size and no
> +	 * larger than MAXBSIZE.
> +	 */
> +	bsize = roundup(ZSTD_CStreamOutSize(), PAGE_SIZE);
> +	if (bsize > MAXBSIZE)
> +		bsize = MAXBSIZE;
> +	ibuf = malloc(bsize, M_TEMP, M_WAITOK);
> +	zib.src = NULL;
> +	zib.size = 0;
> +	zib.pos = 0;
> +
> +	/*
> +	 * Set up the decompression buffer. If the target is not in
> +	 * kernel space, we will have to set up a bounce buffer.
> +	 *
> +	 * TODO: to avoid using a bounce buffer, map destination pages
> +	 * using vm_fault_quick_hold_pages().
> +	 */
> +	MPASS(zio->opos <= off);
> +	MPASS(uiop->uio_iovcnt == 1);
> +	MPASS(uiop->uio_iov->iov_len >= len);
> +	if (uiop->uio_segflg == UIO_SYSSPACE) {
> +		zob.dst = uiop->uio_iov->iov_base;
> +	} else {
> +		TARFS_DPF(ALLOC, "%s: allocating %zu-byte bounce buffer\n",
> +		    __func__, len);
> +		zob.dst = obuf = malloc(len, M_TEMP, M_WAITOK);
> +	}
> +	zob.size = len;
> +	zob.pos = 0;
> +
> +	/* lock tarball */
> +	rl = vn_rangelock_rlock(tmp->vp, zio->ipos, OFF_MAX);
> +	error = vn_lock(tmp->vp, LK_SHARED);
> +	if (error != 0) {
> +		goto fail_unlocked;
> +	}
> +	/* check size */
> +	error = vn_getsize_locked(tmp->vp, &zsize, td->td_ucred);
> +	if (error != 0) {
> +		goto fail;
> +	}
> +	if (zio->ipos >= zsize) {
> +		/* beyond EOF */
> +		goto fail;
> +	}
> +
> +	while (resid > 0) {
> +		if (zib.pos == zib.size) {
> +			/* request data from the underlying file */
> +			aiov.iov_base = ibuf;
> +			aiov.iov_len = bsize;
> +			auio.uio_iov = &aiov;
> +			auio.uio_iovcnt = 1;
> +			auio.uio_offset = zio->ipos;
> +			auio.uio_segflg = UIO_SYSSPACE;
> +			auio.uio_rw = UIO_READ;
> +			auio.uio_resid = aiov.iov_len;
> +			auio.uio_td = td;
> +			error = VOP_READ(tmp->vp, &auio,
> +			    IO_DIRECT | IO_NODELOCKED,
> +			    td->td_ucred);
> +			if (error != 0)
> +				goto fail;
> +			TARFS_DPF(ZIO, "%s: req %zu+%zu got %zu+%zu\n", __func__,
> +			    (size_t)zio->ipos, bsize,
> +			    (size_t)zio->ipos, bsize - auio.uio_resid);
> +			zib.src = ibuf;
> +			zib.size = bsize - auio.uio_resid;
> +			zib.pos = 0;
> +		}
> +		MPASS(zib.pos <= zib.size);
> +		if (zib.pos == zib.size) {
> +			TARFS_DPF(ZIO, "%s: end of file after i %zu o %zu\n", __func__,
> +			    (size_t)zio->ipos, (size_t)zio->opos);
> +			goto fail;
> +		}
> +		if (zio->opos < off) {
> +			/* to be discarded */
> +			zob.size = min(off - zio->opos, len);
> +			zob.pos = 0;
> *** 3111 LINES SKIPPED ***
>
>

--000000000000931c1e05f3cfae44
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Wow, cool, thank you so much! It feels like Christmas again. >:-) I see this being immediately useful for anyone building a custom system to replace the uzip+ufs combo, or other similar methods for creating read-only compressed storage containers!

Just curious, have you done any performance testing? Something like "worldstone", but with /usr/src mounted off a tar archive vs. "normal" UFS, would be interesting to see.

Also, has any, even cursory, security audit been done on the tar processing routines? Of course, with the functionality being opt-in, the onus is on the user to make sure that only tars obtained from trusted sources are used, and in a way that protects the tar file content from modification by unprivileged users. However, that won't protect FreeBSD from looking bad in the public eye if some high-profile institutional user of FreeBSD is breached through a vulnerability in this code a few years down the line, when it hits a RELENG branch.

At the very least, a big, fat warning could be added to the man page to notify the user that the code is somewhat fresh and not on par quality-wise with something like UFS or ZFS, plus some tips on best practices for reducing exposure when tarfs is used (nosuid mount, proper tar file permissions, trusted sources, etc.).

This is of course all hypothetical, but given the history of buffer/integer overflows etc. in handling user-supplied data in simple syscalls operating on structures 1-2 orders of magnitude smaller in size and lower in complexity, I find it unlikely that fresh-off-the-mill tar code won't have any. Perhaps some automated fuzzing approach could be employed to see if the kernel can be crashed by giving it a slightly corrupted but otherwise valid tar file? If Juniper sponsored the development of this feature, I suspect they may not be the ones least interested in making sure that using it won't compromise the security of their products. Pure speculation on my part, of course, but pretty reasonable at that.
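
To make the "slightly corrupted but otherwise valid" idea concrete, here is a rough userland sketch in C of the building block such a fuzzer might need: it fills in a minimal ustar header, lets a caller overwrite one byte, and then recomputes the header checksum so the mutated header still passes checksum validation and reaches the rest of the parser. None of this comes from the commit; the function names (tar_make_header, tar_mutate, and so on) are made up for illustration, and only the field offsets follow the POSIX ustar layout described in tar(5).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK	512	/* tar header size in bytes */
#define CHKSUM_OFF	148	/* offset of the checksum field */
#define CHKSUM_LEN	8	/* length of the checksum field */

/* Sum all header bytes, counting the checksum field as ASCII spaces. */
static unsigned int
tar_checksum(const unsigned char *hdr)
{
	unsigned int sum = 0;

	for (size_t i = 0; i < TAR_BLOCK; i++) {
		if (i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN)
			sum += ' ';
		else
			sum += hdr[i];
	}
	return (sum);
}

/* Store the checksum as six octal digits, a NUL and a space. */
static void
tar_store_checksum(unsigned char *hdr)
{
	char buf[CHKSUM_LEN];

	snprintf(buf, sizeof(buf), "%06o", tar_checksum(hdr));
	memcpy(hdr + CHKSUM_OFF, buf, 7);
	hdr[CHKSUM_OFF + 7] = ' ';
}

/* Return nonzero if the stored checksum matches the header contents. */
static int
tar_checksum_ok(const unsigned char *hdr)
{
	unsigned int stored = 0;

	for (size_t i = CHKSUM_OFF; i < CHKSUM_OFF + CHKSUM_LEN; i++) {
		if (hdr[i] < '0' || hdr[i] > '7')
			break;
		stored = stored * 8 + (unsigned int)(hdr[i] - '0');
	}
	return (stored == tar_checksum(hdr));
}

/* Fill in a minimal ustar header for a regular file (hypothetical helper). */
static void
tar_make_header(unsigned char *hdr, const char *name, unsigned long size)
{
	memset(hdr, 0, TAR_BLOCK);
	snprintf((char *)hdr, 100, "%s", name);			/* name */
	snprintf((char *)hdr + 100, 8, "%07o", 0644u);		/* mode */
	snprintf((char *)hdr + 108, 8, "%07o", 0u);		/* uid */
	snprintf((char *)hdr + 116, 8, "%07o", 0u);		/* gid */
	snprintf((char *)hdr + 124, 12, "%011lo", size);	/* size */
	snprintf((char *)hdr + 136, 12, "%011o", 0u);		/* mtime */
	hdr[156] = '0';						/* regular file */
	memcpy(hdr + 257, "ustar", 6);				/* magic + NUL */
	memcpy(hdr + 263, "00", 2);				/* version */
	tar_store_checksum(hdr);
}

/*
 * Overwrite one byte outside the checksum field, then fix up the
 * checksum so that the header remains formally valid.
 */
static void
tar_mutate(unsigned char *hdr, size_t off, unsigned char val)
{
	if (off >= TAR_BLOCK ||
	    (off >= CHKSUM_OFF && off < CHKSUM_OFF + CHKSUM_LEN))
		return;
	hdr[off] = val;
	tar_store_checksum(hdr);
}
```

A driver would presumably loop over offsets and byte values, write each mutated header (followed by data blocks and two zero blocks) to a file, and try mounting it on a disposable test machine; that part is left out here because it depends on the test setup.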

Anyhow, just my few Canadian cents on the topic, while it's fresh. Thanks again to everyone involved in making this available. I look forward to getting my hands on it as soon as I get back from FOSDEM, if not sooner.

-Max

On Thu, Feb 2, 2023, 6:20 PM Dag-Erling Smørgrav <des@freebsd.org> wrote:
The branch main has been updated by des:

URL: https://cgit.FreeBSD.org/src/commit/?id=69d94f4c7608e41505996559367450706e91fbb8

commit 69d94f4c7608e41505996559367450706e91fbb8
Author:     Dag-Erling Smørgrav <des@FreeBSD.org>
AuthorDate: 2023-02-02 17:18:41 +0000
Commit:     Dag-Erling Smørgrav <des@FreeBSD.org>
CommitDate: 2023-02-02 17:19:29 +0000

    Add tarfs, a filesystem backed by tarballs.

    Sponsored by:   Juniper Networks, Inc.
    Sponsored by:   Klara, Inc.
    Reviewed by:    pauamma, imp
    Differential Revision:  https://reviews.freebsd.org/D37753
---
 etc/mtree/BSD.tests.dist         |    2 +
 share/man/man5/Makefile          |    1 +
 share/man/man5/tarfs.5           |  103 ++++
 sys/conf/files                   |    4 +
 sys/conf/options                 |    4 +
 sys/fs/tarfs/tarfs.h             |  254 +++++++++
 sys/fs/tarfs/tarfs_dbg.h         |   65 +++
 sys/fs/tarfs/tarfs_io.c          |  727 +++++++++++++++++++++++
 sys/fs/tarfs/tarfs_subr.c        |  603 ++++++++++++++++++++
 sys/fs/tarfs/tarfs_vfsops.c      | 1173 ++++++++++++++++++++++++++++++++++++++
 sys/fs/tarfs/tarfs_vnops.c       |  642 +++++++++++++++++++++
 sys/kern/subr_witness.c          |    6 +
 sys/modules/Makefile             |    1 +
 sys/modules/tarfs/Makefile       |   23 +
 tests/sys/fs/Makefile            |    1 +
 tests/sys/fs/tarfs/Makefile      |   10 +
 tests/sys/fs/tarfs/mktar.c       |  238 ++++++++
 tests/sys/fs/tarfs/tarfs_test.sh |   54 ++
 18 files changed, 3911 insertions(+)

diff --git a/etc/mtree/BSD.tests.dist b/etc/mtree/BSD.tests.dist
index 0d05ecaf06fc..b4b18997b7f9 100644
--- a/etc/mtree/BSD.tests.dist
+++ b/etc/mtree/BSD.tests.dist
@@ -757,6 +757,8 @@
         fs
             fusefs
             ..
+            tarfs
+            ..
             tmpfs
             ..
         ..
diff --git a/share/man/man5/Makefile b/share/man/man5/Makefile
index 2d49d981c2f9..f6e91e4ed00b 100644
--- a/share/man/man5/Makefile
+++ b/share/man/man5/Makefile
@@ -70,6 +70,7 @@ MAN=	acct.5 \
 	style.Makefile.5 \
 	style.mdoc.5 \
 	sysctl.conf.5 \
+	tarfs.5 \
 	tmpfs.5 \
 	unionfs.5

diff --git a/share/man/man5/tarfs.5 b/share/man/man5/tarfs.5
new file mode 100644
index 000000000000..b25131c323c1
--- /dev/null
+++ b/share/man/man5/tarfs.5
@@ -0,0 +1,103 @@
+.\"-
+.\" SPDX-License-Identifier: BSD-2-Clause
+.\"
+.\" Copyright (c) 2022 Klara, Inc.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\"    notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\"    notice, this list of conditions and the following disclaimer in the
+.\"    documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.Dd February 2, 2023
+.Dt TARFS 5
+.Os
+.Sh NAME
+.Nm tarfs
+.Nd tarball filesystem
+.Sh SYNOPSIS
+To compile this driver into the kernel, place the following line in
+your kernel configuration file:
+.Bd -ragged -offset indent
+.Cd "options TARFS"
+.Ed
+.Pp
+Alternatively, to load the driver as a module at boot time, place the
+following line in
+.Xr loader.conf 5 :
+.Bd -literal -offset indent
+tarfs_load="YES"
+.Ed
+.Sh DESCRIPTION
+The
+.Nm
+driver implementes a read-only filesystem backed by a
+.Xr tar 5
+file.
+Currently, only POSIX archives, optionally compressed with
+.Xr zstd 1 ,
+are supported.
+.Pp
+The preferred I/O size for
+.Nm
+filesystems can be adjusted using the
+.Va vfs.tarfs.ioshift
+sysctl setting and tunable.
+Setting it to 0 will reset it to its default value.
+Note that changes to this setting only apply to filesystems mounted
+after the change.
+.Sh DIAGNOSTICS
+If enabled by the
+.Dv TARFS_DEBUG
+kernel option, the
+.Va vfs.tarfs.debug
+sysctl setting can be used to control debugging output from the
+.Nm
+driver.
+Debugging output for individual sections of the driver can be enabled
+by adding together the relevant values from the table below.
+.Bl -column Value Description
+.It 0x01 Ta Memory allocations
+.It 0x02 Ta Checksum calculations
+.It 0x04 Ta Filesystem operations (vfsops)
+.It 0x08 Ta Path lookups
+.It 0x10 Ta File operations (vnops)
+.It 0x20 Ta General I/O
+.It 0x40 Ta Decompression
+.It 0x80 Ta Decompression index
+.It 0x100 Ta Sparse file mapping
+.El
+.Sh SEE ALSO
+.Xr tar 1 ,
+.Xr zstd 1 ,
+.Xr fstab 5 ,
+.Xr tar 5 ,
+.Xr mount 8 ,
+.Xr sysctl 8
+.Sh HISTORY
+.An -nosplit
+The
+.Nm
+driver was developed by
+.An Stephen J. Kiernan Aq Mt stevek@FreeBSD.org
+and
+.An Dag-Erling Smørgrav Aq Mt des@FreeBSD.org
+for Juniper Networks and Klara Systems.
+This manual page was written by
+.An Dag-Erling Smørgrav Aq Mt des@FreeBSD.org
+for Juniper Networks and Klara Systems.
diff --git a/sys/conf/files b/sys/conf/files
index 6cb4abcd9223..08966a9b46e4 100644
--- a/sys/conf/files
+++ b/sys/conf/files
@@ -3615,6 +3615,10 @@ fs/smbfs/smbfs_smb.c		optional smbfs
 fs/smbfs/smbfs_subr.c		optional smbfs
 fs/smbfs/smbfs_vfsops.c		optional smbfs
 fs/smbfs/smbfs_vnops.c		optional smbfs
+fs/tarfs/tarfs_io.c		optional tarfs compile-with "${NORMAL_C} -I$S/contrib/zstd/lib/freebsd"
+fs/tarfs/tarfs_subr.c		optional tarfs
+fs/tarfs/tarfs_vfsops.c		optional tarfs
+fs/tarfs/tarfs_vnops.c		optional tarfs
 fs/udf/osta.c			optional udf
 fs/udf/udf_iconv.c		optional udf_iconv
 fs/udf/udf_vfsops.c		optional udf
diff --git a/sys/conf/options b/sys/conf/options
index 1f5003507539..3b2be66ba602 100644
--- a/sys/conf/options
+++ b/sys/conf/options
@@ -265,6 +265,7 @@ NULLFS		opt_dontuse.h
 PROCFS		opt_dontuse.h
 PSEUDOFS	opt_dontuse.h
 SMBFS		opt_dontuse.h
+TARFS		opt_dontuse.h
 TMPFS		opt_dontuse.h
 UDF		opt_dontuse.h
 UNIONFS		opt_dontuse.h
@@ -273,6 +274,9 @@ ZFS		opt_dontuse.h
 # Pseudofs debugging
 PSEUDOFS_TRACE	opt_pseudofs.h

+# Tarfs debugging
+TARFS_DEBUG	opt_tarfs.h
+
 # In-kernel GSS-API
 KGSSAPI		opt_kgssapi.h
 KGSSAPI_DEBUG	opt_kgssapi.h
diff --git a/sys/fs/tarfs/tarfs.h b/sys/fs/tarfs/tarfs.h
new file mode 100644
index 000000000000..dffd60ee6d8a
--- /dev/null
+++ b/sys/fs/tarfs/tarfs.h
@@ -0,0 +1,254 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2013 Juniper Networks, Inc.
+ * Copyright (c) 2022-2023 Klara, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifndef	_FS_TARFS_TARFS_H_
+#define	_FS_TARFS_TARFS_H_
+
+#ifndef _KERNEL
+#error Should only be included by kernel
+#endif
+
+MALLOC_DECLARE(M_TARFSMNT);
+MALLOC_DECLARE(M_TARFSNODE);
+MALLOC_DECLARE(M_TARFSNAME);
+
+#ifdef SYSCTL_DECL
+SYSCTL_DECL(_vfs_tarfs);
+#endif
+
+struct componentname;
+struct mount;
+struct vnode;
+
+/*
+ * Internal representation of a tarfs file system node.
+ */
+struct tarfs_node {
+	TAILQ_ENTRY(tarfs_node) entries;
+	TAILQ_ENTRY(tarfs_node) dirents;
+
+	struct mtx		 lock;
+
+	struct vnode		*vnode;
+	struct tarfs_mount	*tmp;
+	enum vtype		 type;
+	ino_t			 ino;
+	off_t			 offset;
+	size_t			 size;
+	size_t			 physize;
+	char			*name;
+	size_t			 namelen;
+
+	/* Node attributes */
+	uid_t			 uid;
+	gid_t			 gid;
+	mode_t			 mode;
+	unsigned int		 flags;
+	nlink_t			 nlink;
+	struct timespec		 atime;
+	struct timespec		 mtime;
+	struct timespec		 ctime;
+	struct timespec		 birthtime;
+	unsigned long		 gen;
+
+	/* Block map */
+	size_t			 nblk;
+	struct tarfs_blk	*blk;
+
+	struct tarfs_node	*parent;
+	union {
+		/* VDIR */
+		struct {
+			TAILQ_HEAD(, tarfs_node) dirhead;
+			off_t			 lastcookie;
+			struct tarfs_node	*lastnode;
+		} dir;
+
+		/* VLNK */
+		struct {
+			char			*name;
+			size_t			 namelen;
+		} link;
+
+		/* VBLK or VCHR */
+		dev_t			 rdev;
+
+		/* VREG */
+		struct tarfs_node	*other;
+	};
+};
+
+/*
+ * Entry in sparse file block map.
+ */
+struct tarfs_blk {
+	off_t	 i;		/* input (physical) offset */
+	off_t	 o;		/* output (logical) offset */
+	size_t	 l;		/* length */
+};
+
+/*
+ * Decompression buffer.
+ */
+#define TARFS_ZBUF_SIZE 1048576
+struct tarfs_zbuf {
+	u_char		 buf[TARFS_ZBUF_SIZE];
+	size_t		 off; /* offset of contents */
+	size_t		 len; /* length of contents */
+};
+
+/*
+ * Internal representation of a tarfs mount point.
+ */
+struct tarfs_mount {
+	TAILQ_HEAD(, tarfs_node) allnodes;
+	struct mtx		 allnode_lock;
+
+	struct tarfs_node	*root;
+	struct vnode		*vp;
+	struct mount		*vfs;
+	ino_t			 ino;
+	struct unrhdr		*ino_unr;
+	size_t			 iosize;
+	size_t			 nblocks;
+	size_t			 nfiles;
+	time_t			 mtime; /* default mtime for directories */
+
+	struct tarfs_zio	*zio;
+	struct vnode		*znode;
+};
+
+struct tarfs_zio {
+	struct tarfs_mount	*tmp;
+
+	/* decompression state */
+#ifdef ZSTDIO
+	struct tarfs_zstd	*zstd; /* decompression state (zstd) */
+#endif
+	off_t			 ipos; /* current input position */
+	off_t			 opos; /* current output position */
+
+	/* index of compression frames */
+	unsigned int		 curidx; /* current index position*/
+	unsigned int		 nidx; /* number of index entries */
+	unsigned int		 szidx; /* index capacity */
+	struct tarfs_idx { off_t i, o; } *idx;
+};
+
+struct tarfs_fid {
+	u_short			 len;	/* length of data in bytes */
+	u_short			 data0; /* force alignment */
+	ino_t			 ino;
+	unsigned long		 gen;
+};
+
+#define	TARFS_NODE_LOCK(tnp) \
+	mtx_lock(&(tnp)->lock)
+#define	TARFS_NODE_UNLOCK(tnp) \
+	mtx_unlock(&(tnp)->lock)
+#define	TARFS_ALLNODES_LOCK(tnp) \
+	mtx_lock(&(tmp)->allnode_lock)
+#define	TARFS_ALLNODES_UNLOCK(tnp) \
+	mtx_unlock(&(tmp)->allnode_lock)
+
+/*
+ * Data and metadata within tar files are aligned on 512-byte boundaries,
+ * to match the block size of the magnetic tapes they were originally
+ * intended for.
+ */
+#define	TARFS_BSHIFT		9
+#define	TARFS_BLOCKSIZE		(size_t)(1U << TARFS_BSHIFT)
+#define	TARFS_BLKOFF(l)		((l) % TARFS_BLOCKSIZE)
+#define	TARFS_BLKNUM(l)		((l) >> TARFS_BSHIFT)
+#define	TARFS_SZ2BLKS(sz)	(((sz) + TARFS_BLOCKSIZE - 1) / TARFS_BLOCKSIZE)
+
+/*
+ * Our preferred I/O size.
+ */
+extern unsigned int tarfs_ioshift;
+#define	TARFS_IOSHIFT_MIN	TARFS_BSHIFT
+#define	TARFS_IOSHIFT_DEFAULT	PAGE_SHIFT
+#define	TARFS_IOSHIFT_MAX	PAGE_SHIFT
+
+#define	TARFS_ROOTINO		((ino_t)3)
+#define	TARFS_ZIOINO		((ino_t)4)
+#define	TARFS_MININO		((ino_t)65535)
+
+#define	TARFS_COOKIE_DOT	0
+#define	TARFS_COOKIE_DOTDOT	1
+#define	TARFS_COOKIE_EOF	OFF_MAX
+
+#define	TARFS_ZIO_NAME		".tar"
+#define	TARFS_ZIO_NAMELEN	(sizeof(TARFS_ZIO_NAME) - 1)
+
+extern struct vop_vector tarfs_vnodeops;
+
+static inline
+struct tarfs_mount *
+MP_TO_TARFS_MOUNT(struct mount *mp)
+{
+
+	MPASS(mp != NULL && mp->mnt_data != NULL);
+	return (mp->mnt_data);
+}
+
+static inline
+struct tarfs_node *
+VP_TO_TARFS_NODE(struct vnode *vp)
+{
+
+	MPASS(vp != NULL && vp->v_data != NULL);
+	return (vp->v_data);
+}
+
+int	tarfs_alloc_node(struct tarfs_mount *tmp, const char *name,
+	    size_t namelen, enum vtype type, off_t off, size_t sz,
+	    time_t mtime, uid_t uid, gid_t gid, mode_t mode,
+	    unsigned int flags, const char *linkname, dev_t rdev,
+	    struct tarfs_node *parent, struct tarfs_node **node);
+int	tarfs_load_blockmap(struct tarfs_node *tnp, size_t realsize);
+void	tarfs_dump_tree(struct tarfs_node *tnp);
+void	tarfs_free_node(struct tarfs_node *tnp);
+struct tarfs_node *
+	tarfs_lookup_dir(struct tarfs_node *tnp, off_t cookie);
+struct tarfs_node *
+	tarfs_lookup_node(struct tarfs_node *tnp, struct tarfs_node *f,
+	    struct componentname *cnp);
+void	tarfs_print_node(struct tarfs_node *tnp);
+int	tarfs_read_file(struct tarfs_node *tnp, size_t len, struct uio *uiop);
+
+int	tarfs_io_init(struct tarfs_mount *tmp);
+int	tarfs_io_fini(struct tarfs_mount *tmp);
+int	tarfs_io_read(struct tarfs_mount *tmp, bool raw,
+    struct uio *uiop);
+ssize_t	tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw,
+    void *buf, off_t off, size_t len);
+unsigned int
+	tarfs_strtofflags(const char *str, char **end);
+
+#endif /* _FS_TARFS_TARFS_H_ */
diff --git a/sys/fs/tarfs/tarfs_dbg.h b/sys/fs/tarfs/tarfs_dbg.h
new file mode 100644
index 000000000000..45d11d679719
--- /dev/null
+++ b/sys/fs/tarfs/tarfs_dbg.h
@@ -0,0 +1,65 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2013 Juniper Networks, Inc.
+ * Copyright (c) 2022 Klara, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#ifndef	_FS_TARFS_TARFS_DBG_H_
+#define	_FS_TARFS_TARFS_DBG_H_
+
+#ifndef _KERNEL
+#error Should only be included by kernel
+#endif
+
+#ifdef TARFS_DEBUG
+extern int tarfs_debug;
+
+#define	TARFS_DEBUG_ALLOC	0x01
+#define	TARFS_DEBUG_CHECKSUM	0x02
+#define	TARFS_DEBUG_FS		0x04
+#define	TARFS_DEBUG_LOOKUP	0x08
+#define	TARFS_DEBUG_VNODE	0x10
+#define	TARFS_DEBUG_IO		0x20
+#define	TARFS_DEBUG_ZIO		0x40
+#define	TARFS_DEBUG_ZIDX	0x80
+#define	TARFS_DEBUG_MAP		0x100
+
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF(category, fmt, ...)=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0do {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((tarfs_debug &a= mp; TARFS_DEBUG_##category) !=3D 0)=C2=A0 =C2=A0 =C2=A0 =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0printf(fmt, ## __VA_ARGS__);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} while (0)
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF_IFF(category, cond, fmt, ...)= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0do {=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0if ((cond)=C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&= & (tarfs_debug & TARFS_DEBUG_##category) !=3D 0)=C2=A0 =C2=A0 =C2= =A0\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0printf(fmt, ## __VA_ARGS__);=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0} while (0)
+#else
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF(category, fmt, ...)
+#define=C2=A0 =C2=A0 =C2=A0 =C2=A0 TARFS_DPF_IFF(category, cond, fmt, ...)=
+#endif
+
+#endif /* _FS_TARFS_TARFS_DBG_H_ */
diff --git a/sys/fs/tarfs/tarfs_io.c b/sys/fs/tarfs/tarfs_io.c
new file mode 100644
index 000000000000..b957ac11ff51
--- /dev/null
+++ b/sys/fs/tarfs/tarfs_io.c
@@ -0,0 +1,727 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2013 Juniper Networks, Inc.
+ * Copyright (c) 2022-2023 Klara, Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include "opt_tarfs.h"
+#include "opt_zstdio.h"
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/counter.h>
+#include <sys/bio.h>
+#include <sys/buf.h>
+#include <sys/malloc.h>
+#include <sys/mount.h>
+#include <sys/sysctl.h>
+#include <sys/uio.h>
+#include <sys/vnode.h>
+
+#ifdef ZSTDIO
+#define ZSTD_STATIC_LINKING_ONLY
+#include <contrib/zstd/lib/zstd.h>
+#endif
+
+#include <fs/tarfs/tarfs.h>
+#include <fs/tarfs/tarfs_dbg.h>
+
+#ifdef TARFS_DEBUG
+SYSCTL_NODE(_vfs_tarfs, OID_AUTO, zio, CTLFLAG_RD, 0,
+    "Tar filesystem decompression layer");
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_inflated);
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, inflated, CTLFLAG_RD,
+    &tarfs_zio_inflated, "Amount of compressed data inflated.");
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_consumed);
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, consumed, CTLFLAG_RD,
+    &tarfs_zio_consumed, "Amount of compressed data consumed.");
+COUNTER_U64_DEFINE_EARLY(tarfs_zio_bounced);
+SYSCTL_COUNTER_U64(_vfs_tarfs_zio, OID_AUTO, bounced, CTLFLAG_RD,
+    &tarfs_zio_bounced, "Amount of decompressed data bounced.");
+
+static int
+tarfs_sysctl_handle_zio_reset(SYSCTL_HANDLER_ARGS)
+{
+       unsigned int tmp;
+       int error;
+
+       tmp = 0;
+       if ((error = SYSCTL_OUT(req, &tmp, sizeof(tmp))) != 0)
+               return (error);
+       if (req->newptr != NULL) {
+               if ((error = SYSCTL_IN(req, &tmp, sizeof(tmp))) != 0)
+                       return (error);
+               counter_u64_zero(tarfs_zio_inflated);
+               counter_u64_zero(tarfs_zio_consumed);
+               counter_u64_zero(tarfs_zio_bounced);
+       }
+       return (0);
+}
+
+SYSCTL_PROC(_vfs_tarfs_zio, OID_AUTO, reset,
+    CTLTYPE_INT | CTLFLAG_MPSAFE | CTLFLAG_RW,
+    NULL, 0, tarfs_sysctl_handle_zio_reset, "IU",
+    "Reset compression counters.");
+#endif
+
+MALLOC_DEFINE(M_TARFSZSTATE, "tarfs zstate", "tarfs decompression state");
+MALLOC_DEFINE(M_TARFSZBUF, "tarfs zbuf", "tarfs decompression buffers");
+
+#define XZ_MAGIC               (uint8_t[]){ 0xfd, 0x37, 0x7a, 0x58, 0x5a }
+#define ZLIB_MAGIC             (uint8_t[]){ 0x1f, 0x8b, 0x08 }
+#define ZSTD_MAGIC             (uint8_t[]){ 0x28, 0xb5, 0x2f, 0xfd }
+
+#ifdef ZSTDIO
+struct tarfs_zstd {
+       ZSTD_DStream *zds;
+};
+#endif
+
+/* XXX review use of curthread / uio_td / td_cred */
+
+/*
+ * Reads from the tar file according to the provided uio.  If the archive
+ * is compressed and raw is false, reads the decompressed stream;
+ * otherwise, reads directly from the original file.  Returns 0 on success
+ * and a positive errno value on failure.
+ */
+int
+tarfs_io_read(struct tarfs_mount *tmp, bool raw, struct uio *uiop)
+{
+       void *rl = NULL;
+       off_t off = uiop->uio_offset;
+       size_t len = uiop->uio_resid;
+       int error;
+
+       if (raw || tmp->znode == NULL) {
+               rl = vn_rangelock_rlock(tmp->vp, off, off + len);
+               error = vn_lock(tmp->vp, LK_SHARED);
+               if (error == 0) {
+                       error = VOP_READ(tmp->vp, uiop,
+                           IO_DIRECT|IO_NODELOCKED,
+                           uiop->uio_td->td_ucred);
+                       VOP_UNLOCK(tmp->vp);
+               }
+               vn_rangelock_unlock(tmp->vp, rl);
+       } else {
+               error = vn_lock(tmp->znode, LK_EXCLUSIVE);
+               if (error == 0) {
+                       error = VOP_READ(tmp->znode, uiop,
+                           IO_DIRECT | IO_NODELOCKED,
+                           uiop->uio_td->td_ucred);
+                       VOP_UNLOCK(tmp->znode);
+               }
+       }
+       TARFS_DPF(IO, "%s(%zu, %zu) = %d (resid %zd)\n", __func__,
+           (size_t)off, len, error, uiop->uio_resid);
+       return (error);
+}
+
+/*
+ * Reads from the tar file into the provided buffer.  If the archive is
+ * compressed and raw is false, reads the decompressed stream; otherwise,
+ * reads directly from the original file.  Returns the number of bytes
+ * read on success, 0 on EOF, and a negative errno value on failure.
+ */
+ssize_t
+tarfs_io_read_buf(struct tarfs_mount *tmp, bool raw,
+    void *buf, off_t off, size_t len)
+{
+       struct uio auio;
+       struct iovec aiov;
+       ssize_t res;
+       int error;
+
+       if (len == 0) {
+               TARFS_DPF(IO, "%s(%zu, %zu) null\n", __func__,
+                   (size_t)off, len);
+               return (0);
+       }
+       aiov.iov_base = buf;
+       aiov.iov_len = len;
+       auio.uio_iov = &aiov;
+       auio.uio_iovcnt = 1;
+       auio.uio_offset = off;
+       auio.uio_segflg = UIO_SYSSPACE;
+       auio.uio_rw = UIO_READ;
+       auio.uio_resid = len;
+       auio.uio_td = curthread;
+       error = tarfs_io_read(tmp, raw, &auio);
+       if (error != 0) {
+               TARFS_DPF(IO, "%s(%zu, %zu) error %d\n", __func__,
+                   (size_t)off, len, error);
+               return (-error);
+       }
+       res = len - auio.uio_resid;
+       if (res == 0 && len != 0) {
+               TARFS_DPF(IO, "%s(%zu, %zu) eof\n", __func__,
+                   (size_t)off, len);
+       } else {
+               TARFS_DPF(IO, "%s(%zu, %zu) read %zd | %*D\n", __func__,
+                   (size_t)off, len, res,
+                   (int)(res > 8 ? 8 : res), (uint8_t *)buf, " ");
+       }
+       return (res);
+}
+
+#ifdef ZSTDIO
+static void *
+tarfs_zstate_alloc(void *opaque, size_t size)
+{
+
+       (void)opaque;
+       return (malloc(size, M_TARFSZSTATE, M_WAITOK));
+}
+#endif
+
+#ifdef ZSTDIO
+static void
+tarfs_zstate_free(void *opaque, void *address)
+{
+
+       (void)opaque;
+       free(address, M_TARFSZSTATE);
+}
+#endif
+
+#ifdef ZSTDIO
+static ZSTD_customMem tarfs_zstd_mem = {
+       tarfs_zstate_alloc,
+       tarfs_zstate_free,
+       NULL,
+};
+#endif
+
+/*
+ * Updates the decompression frame index, recording the current input and
+ * output offsets in a new index entry, and growing the index if
+ * necessary.
+ */
+static void
+tarfs_zio_update_index(struct tarfs_zio *zio, off_t i, off_t o)
+{
+
+       if (++zio->curidx >= zio->nidx) {
+               if (++zio->nidx > zio->szidx) {
+                       zio->szidx *= 2;
+                       zio->idx = realloc(zio->idx,
+                           zio->szidx * sizeof(*zio->idx),
+                           M_TARFSZSTATE, M_ZERO | M_WAITOK);
+                       TARFS_DPF(ALLOC, "%s: resized zio index\n", __func__);
+               }
+               zio->idx[zio->curidx].i = i;
+               zio->idx[zio->curidx].o = o;
+               TARFS_DPF(ZIDX, "%s: index %u = i %zu o %zu\n", __func__,
+                   zio->curidx, (size_t)zio->idx[zio->curidx].i,
+                   (size_t)zio->idx[zio->curidx].o);
+       }
+       MPASS(zio->idx[zio->curidx].i == i);
+       MPASS(zio->idx[zio->curidx].o == o);
+}
+
+/*
+ * VOP_ACCESS for zio node.
+ */
+static int
+tarfs_zaccess(struct vop_access_args *ap)
+{
+       struct vnode *vp = ap->a_vp;
+       struct tarfs_zio *zio = vp->v_data;
+       struct tarfs_mount *tmp = zio->tmp;
+       accmode_t accmode = ap->a_accmode;
+       int error = EPERM;
+
+       if (accmode == VREAD) {
+               error = vn_lock(tmp->vp, LK_SHARED);
+               if (error == 0) {
+                       error = VOP_ACCESS(tmp->vp, accmode, ap->a_cred, ap->a_td);
+                       VOP_UNLOCK(tmp->vp);
+               }
+       }
+       TARFS_DPF(ZIO, "%s(%d) = %d\n", __func__, accmode, error);
+       return (error);
+}
+
+/*
+ * VOP_GETATTR for zio node.
+ */
+static int
+tarfs_zgetattr(struct vop_getattr_args *ap)
+{
+       struct vattr va;
+       struct vnode *vp = ap->a_vp;
+       struct tarfs_zio *zio = vp->v_data;
+       struct tarfs_mount *tmp = zio->tmp;
+       struct vattr *vap = ap->a_vap;
+       int error = 0;
+
+       VATTR_NULL(vap);
+       error = vn_lock(tmp->vp, LK_SHARED);
+       if (error == 0) {
+               error = VOP_GETATTR(tmp->vp, &va, ap->a_cred);
+               VOP_UNLOCK(tmp->vp);
+               if (error == 0) {
+                       vap->va_type = VREG;
+                       vap->va_mode = va.va_mode;
+                       vap->va_nlink = 1;
+                       vap->va_gid = va.va_gid;
+                       vap->va_uid = va.va_uid;
+                       vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
+                       vap->va_fileid = TARFS_ZIOINO;
+                       vap->va_size = zio->idx[zio->nidx - 1].o;
+                       vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
+                       vap->va_atime = va.va_atime;
+                       vap->va_ctime = va.va_ctime;
+                       vap->va_mtime = va.va_mtime;
+                       vap->va_birthtime = tmp->root->birthtime;
+                       vap->va_bytes = va.va_bytes;
+               }
+       }
+       TARFS_DPF(ZIO, "%s() = %d\n", __func__, error);
+       return (error);
+}
+
+#ifdef ZSTDIO
+/*
+ * VOP_READ for zio node, zstd edition.
+ */
+static int
+tarfs_zread_zstd(struct tarfs_zio *zio, struct uio *uiop)
+{
+       void *ibuf = NULL, *obuf = NULL, *rl = NULL;
+       struct uio auio;
+       struct iovec aiov;
+       struct tarfs_mount *tmp = zio->tmp;
+       struct tarfs_zstd *zstd = zio->zstd;
+       struct thread *td = curthread;
+       ZSTD_inBuffer zib;
+       ZSTD_outBuffer zob;
+       off_t zsize;
+       off_t ipos, opos;
+       size_t ilen, olen;
+       size_t zerror;
+       off_t off = uiop->uio_offset;
+       size_t len = uiop->uio_resid;
+       size_t resid = uiop->uio_resid;
+       size_t bsize;
+       int error;
+       bool reset = false;
+
+       /* do we have to rewind? */
+       if (off < zio->opos) {
+               while (zio->curidx > 0 && off < zio->idx[zio->curidx].o)
+                       zio->curidx--;
+               reset = true;
+       }
+       /* advance to the nearest index entry */
+       if (off > zio->opos) {
+               // XXX maybe do a binary search instead
+               while (zio->curidx < zio->nidx - 1 &&
+                   off >= zio->idx[zio->curidx + 1].o) {
+                       zio->curidx++;
+                       reset = true;
+               }
+       }
+       /* reset the decompression stream if needed */
+       if (reset) {
+               zio->ipos = zio->idx[zio->curidx].i;
+               zio->opos = zio->idx[zio->curidx].o;
+               ZSTD_resetDStream(zstd->zds);
+               TARFS_DPF(ZIDX, "%s: skipping to index %u = i %zu o %zu\n", __func__,
+                   zio->curidx, (size_t)zio->ipos, (size_t)zio->opos);
+       } else {
+               TARFS_DPF(ZIDX, "%s: continuing at i %zu o %zu\n", __func__,
+                   (size_t)zio->ipos, (size_t)zio->opos);
+       }
+
+       /*
+        * Set up a temporary buffer for compressed data.  Use the size
+        * recommended by the zstd library; this is usually 128 kB, but
+        * just in case, make sure it's a multiple of the page size and no
+        * larger than MAXBSIZE.
+        */
+       bsize = roundup(ZSTD_CStreamOutSize(), PAGE_SIZE);
+       if (bsize > MAXBSIZE)
+               bsize = MAXBSIZE;
+       ibuf = malloc(bsize, M_TEMP, M_WAITOK);
+       zib.src = NULL;
+       zib.size = 0;
+       zib.pos = 0;
+
+       /*
+        * Set up the decompression buffer.  If the target is not in
+        * kernel space, we will have to set up a bounce buffer.
+        *
+        * TODO: to avoid using a bounce buffer, map destination pages
+        * using vm_fault_quick_hold_pages().
+        */
+       MPASS(zio->opos <= off);
+       MPASS(uiop->uio_iovcnt == 1);
+       MPASS(uiop->uio_iov->iov_len >= len);
+       if (uiop->uio_segflg == UIO_SYSSPACE) {
+               zob.dst = uiop->uio_iov->iov_base;
+       } else {
+               TARFS_DPF(ALLOC, "%s: allocating %zu-byte bounce buffer\n",
+                   __func__, len);
+               zob.dst = obuf = malloc(len, M_TEMP, M_WAITOK);
+       }
+       zob.size = len;
+       zob.pos = 0;
+
+       /* lock tarball */
+       rl = vn_rangelock_rlock(tmp->vp, zio->ipos, OFF_MAX);
+       error = vn_lock(tmp->vp, LK_SHARED);
+       if (error != 0) {
+               goto fail_unlocked;
+       }
+       /* check size */
+       error = vn_getsize_locked(tmp->vp, &zsize, td->td_ucred);
+       if (error != 0) {
+               goto fail;
+       }
+       if (zio->ipos >= zsize) {
+               /* beyond EOF */
+               goto fail;
+       }
+
+       while (resid > 0) {
+               if (zib.pos == zib.size) {
+                       /* request data from the underlying file */
+                       aiov.iov_base = ibuf;
+                       aiov.iov_len = bsize;
+                       auio.uio_iov = &aiov;
+                       auio.uio_iovcnt = 1;
+                       auio.uio_offset = zio->ipos;
+                       auio.uio_segflg = UIO_SYSSPACE;
+                       auio.uio_rw = UIO_READ;
+                       auio.uio_resid = aiov.iov_len;
+                       auio.uio_td = td;
+                       error = VOP_READ(tmp->vp, &auio,
+                           IO_DIRECT | IO_NODELOCKED,
+                           td->td_ucred);
+                       if (error != 0)
+                               goto fail;
+                       TARFS_DPF(ZIO, "%s: req %zu+%zu got %zu+%zu\n", __func__,
+                           (size_t)zio->ipos, bsize,
+                           (size_t)zio->ipos, bsize - auio.uio_resid);
+                       zib.src = ibuf;
+                       zib.size = bsize - auio.uio_resid;
+                       zib.pos = 0;
+               }
+               MPASS(zib.pos <= zib.size);
+               if (zib.pos == zib.size) {
+                       TARFS_DPF(ZIO, "%s: end of file after i %zu o %zu\n", __func__,
+                           (size_t)zio->ipos, (size_t)zio->opos);
+                       goto fail;
+               }
+               if (zio->opos < off) {
+                       /* to be discarded */
+                       zob.size = min(off - zio->opos, len);
+                       zob.pos = 0;
*** 3111 LINES SKIPPED ***

--000000000000931c1e05f3cfae44--