Re: zfs support in makefs

From: Mark Johnston <markj_at_freebsd.org>
Date: Fri, 20 May 2022 17:35:56 UTC
On Fri, May 20, 2022 at 09:37:01PM +0900, Tomoaki AOKI wrote:
> On Thu, 19 May 2022 18:25:32 +0000
> Brooks Davis <brooks@freebsd.org> wrote:
> 
> > On Thu, May 19, 2022 at 01:36:25PM -0400, Allan Jude wrote:
> > > On 5/18/2022 7:04 PM, Brooks Davis wrote:
> > > > On Wed, May 18, 2022 at 03:03:17PM -0400, Mark Johnston wrote:
> > > >> Hi,
> > > >>
> > > >> For the past little while I've been working on ZFS support in makefs(8).
> > > >> At this point I'm able to create a bootable FreeBSD VM image, using the
> > > >> standard FreeBSD ZFS layout, and run through the regression test suite
> > > >> in bhyve.  I've also been able to create and boot an EC2 AMI.
> > > > 
> > > > Very cool!
> > > > 
> > > >> === Interface ===
> > > >>
> > > >> Creating a pool with a single dataset is easy:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test ./zfs.img /path/to/input
> > > >>
> > > >> Upon importing such a pool, you'll get a dataset named "test" mounted at
> > > >> /test containing everything under /path/to/input.
> > > >>
> > > >> It's possible to set properties on the root dataset:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test -o fs=test:setuid=off:atime=on ./zfs.img /path/to/input
> > > >>
> > > >> It's also possible to create additional datasets:
> > > >>
> > > >> $ makefs -t zfs -s 10g -o poolname=test -o fs=test/ds1:mountpoint=/test/dir1 ./zfs.img /path/to/input
> > > >>
> > > >> The parameter syntax is
> > > >> "-o fs=<dataset name>[:<prop1>=<val1>[:<prop2>=<val2>[:...]]]".  Only a
> > > >> few properties are supported, at least for now.
> > > >>
> > > >> Dataset mountpoints behave the same as they would if created with the
> > > >> standard ZFS tools.  So by default the root dataset's mountpoint is
> > > > >> /test, test/ds1's mountpoint is /test/ds1, etc.  If a dataset overrides
> > > >> its default mountpoint, its children inherit that mountpoint.
> > > >>
> > > >> makefs builds the output filesystem using a single input directory tree.
> > > > >> Thus, makefs -t zfs requires that at least one of the datasets'
> > > >> mountpoints map to /path/to/input; that is, there is a "root" mount
> > > >> point.
> > > >>
> > > >> The -o rootpath parameter defines this root mount point.  By default it's
> > > >> "/<poolname>".  All datasets in the pool must have their mountpoints
> > > >> under this path, and one dataset's mountpoint must be equal to this
> > > >> path.  To build bootable images, one sets -o rootpath=/.
> > > >>
> > > > >> Putting it all together, one can build an image using the standard layout
> > > >> with an invocation like this:
> > > >>
> > > >> makefs -t zfs -o poolname=zroot -s 20g -o rootpath=/ -o bootfs=zroot/ROOT/default \
> > > >>      -o fs=zroot:canmount=off:mountpoint=none \
> > > >>      -o fs=zroot/ROOT:mountpoint=none \
> > > >>      -o fs=zroot/ROOT/default:mountpoint=/ \
> > > >>      -o fs=zroot/tmp:mountpoint=/tmp:exec=on:setuid=off \
> > > >>      -o fs=zroot/usr:mountpoint=/usr:canmount=off \
> > > >>      -o fs=zroot/usr/home \
> > > >>      -o fs=zroot/usr/ports:setuid=off \
> > > >>      -o fs=zroot/usr/src \
> > > >>      -o fs=zroot/usr/obj \
> > > >>      -o fs=zroot/var:mountpoint=/var:canmount=off \
> > > >>      -o fs=zroot/var/audit:setuid=off:exec=off \
> > > >>      -o fs=zroot/var/crash:setuid=off:exec=off \
> > > >>      -o fs=zroot/var/log:setuid=off:exec=off \
> > > >>      -o fs=zroot/var/mail:atime=on \
> > > >>      -o fs=zroot/var/tmp:setuid=off \
> > > >>      ${HOME}/tmp/zfs.img ${HOME}/tmp/world
> > > >>
> > > >> I'll admit this is somewhat clunky, but it doesn't seem worse than what
> > > >> we have to do otherwise, see poudriere-image for example:
> > > >> https://github.com/freebsd/poudriere/blob/master/src/share/poudriere/image_zfs.sh#L79
> > > >>
> > > >> What do folks think of this interface?  Is there anything missing, or
> > > >> anything that doesn't make sense?
> > > > 
> > > > I find it slightly confusing that -o options have a default namespace of
> > > > pool options unless they have an fs=*: prefix, but making users type
> > > > "pool:" for other options doesn't seem to make sense so this is probably
> > > > the best solution.
> > > > 
> > > > The density of data in the filesystem specification does suggest that
> > > > someone might want to create a UCL config file format eventually, but
> > > > what's here already seems entirely workable.
> > > > 
> > > > -- Brooks
> > > 
> > > In normal `zpool create` they use -o for pool properties, and -O for 
> > > dataset properties for the root dataset. I wonder if we might also want 
> > > -o poolprop=value and -O zroot/var:mountpoint=/var:canmount=off
> > > 
> > > just to avoid the conceptual collision of those 2 different items.
> > 
> > Sadly -O is taken in makefs.
> > 
> > > One other possible issue: dataset properties can have a : in them, for 
> > > user-defined properties. Do we maybe want to use a , to separate them 
> > > instead? Although values can contain ,'s (the sharenfs property often 
> > > does), so that probably doesn't work either.
> > 
> > One solution would be to allow the same fs=foo: to be specified multiple
> > times (I've not checked if the current code allows this) to add options
> > instead of having a separator.  That does make the command line even more
> > clunky though.
> > 
> > -- Brooks
> 
> Just an idea, what about moving partitioning (create pool)
> functionality to sbin/gpart, keeping relatively common functionality
> for datasets on /usr/sbin/makefs as primary proposal, and create,
> for example, /usr/sbin/makefs_zfs for complicated, ZFS-only
> functionalities.

I think splitting ZFS pool creation into a separate tool would introduce
some challenges; makefs would have to learn to read pool/vdev metadata
and respect whatever properties are set.  Putting everything in one
tool is simpler.

gpart also doesn't seem like the right place, since one would typically
use mkimg(1) to build a GPT.
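
As a rough illustration of that split, the sketch below only prints the
mkimg(1) command that might wrap a makefs-built pool image in a GPT; it
does not run anything.  The helper name gpt_cmd, the BIOS-style layout,
and the /boot/pmbr and /boot/gptzfsboot paths are assumptions for the
example, not something the prototype requires.

```shell
#!/bin/sh
# Sketch only: print (do not execute) an mkimg(1) invocation that places
# a ZFS pool image inside a GPT.  The partition layout and boot file
# paths are assumptions (a BIOS-booting amd64 setup).
gpt_cmd() {
    pool_img=$1    # pool image produced by "makefs -t zfs"
    disk_img=$2    # final disk image to be written by mkimg
    printf 'mkimg -s gpt -b /boot/pmbr -p freebsd-boot:=/boot/gptzfsboot -p freebsd-zfs:=%s -o %s\n' \
        "$pool_img" "$disk_img"
}

gpt_cmd zfs.img disk.img
```

An EFI or EC2 image would presumably need different partitions, so treat
the layout as a placeholder.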

> It would look like gpart / mount / mount_* on other supported fs.
> And keeps common makefs simpler.
> 
> IIRC, some fs-specific mount_* have extended functionality, that
> `mount -t (fstype)` does not support.

I like the idea of having a makefs_zfs since that would give us a new
option namespace.
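
In the meantime, a small wrapper can approximate a separate namespace by
translating per-dataset arguments into the existing "-o fs=" syntax.
This is a minimal sketch, not part of the prototype: fs_opt is a
hypothetical helper, and it makes no attempt to handle property names or
values that themselves contain ':' (the user-property problem raised
earlier in the thread).

```shell
#!/bin/sh
# Hypothetical helper: join a dataset name and zero or more prop=value
# pairs into one "-o fs=<dataset>[:<prop>=<val>...]" argument string.
# Assumes no property name or value contains ':'.
fs_opt() {
    spec="fs=$1"
    shift
    for prop in "$@"; do
        spec="$spec:$prop"
    done
    printf '%s\n' "-o $spec"
}

fs_opt zroot/var mountpoint=/var canmount=off
fs_opt zroot/usr/home
```

The printed strings could then be appended to a makefs -t zfs command
line like the one quoted above.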