RFC: root mount enhancement (round 2)

Marcel Moolenaar xcllnt at mac.com
Wed Aug 25 23:47:57 UTC 2010


On Aug 25, 2010, at 3:36 PM, M. Warner Losh wrote:

> Hey Marcel,
> 
> The more I talk about this, the more that I think it might be useful
> in some ways.

Ok. I'll start prototyping something so that we can see if
it can live up to its promise or not.

> : I prefer mounting the root file system as soon as the device appears
> : and enhancing the fstab mounting to deal with the device not being
> : there yet.
> : 
> : Consequently: the bug is with root_mount_hold() and root_mount_rel()
> : as a means to do the right thing...
> 
> We don't need to enhance fstab to cope with / not being there.

I'm sorry. I worded it too sloppily. The enhancement relates to all
other file systems that we want to mount during boot, but which may
not have been discovered yet. If we proceed with the boot as soon
as we have the desired root file system, we create a serialization
problem downstream that we don't have when we wait for all devices
first.

> I've hacked mountroot() to wait up to a given amount of time for
> new devices to appear that contain the root file system before giving
> up.  That way, if you know you've got the root file system, you can go
> right away, but otherwise you do something more intelligent than
> 'nothing' or 'prompt' when it isn't there.  This meshes well with the
> .wait directive and your thinking too.  The part I didn't like about
> this was the arbitrary upper time limit on it.  I'd like to wait until
> *ALL* devices are done to fail and accept a '.wait 5' as an ugly
> alternative to knowing that all boot devices are there.

I may be able to implement this by changing the .wait directive to
take a flag:

	.wait <seconds> <for-what>

The <for-what> can be "next", meaning that we wait up to X seconds
before we give up on the next mount directive. The <for-what> could
also be "all", which slightly changes the meaning: it becomes the
maximum number of seconds to wait for "new device" events before
trying the next (or subsequent) mount directives. Put differently:
the number of seconds to idle *after* the last device arrival
before trying the first or next mount. Every time a new device is
announced, you restart the clock.
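
To make this concrete (device name made up, and the exact spelling
of the directive is of course still open), the two variants could
look something like this:

	.wait 10 next
	ufs:/dev/da0s1a

versus:

	.wait 5 all
	ufs:/dev/da0s1a

The first gives up on /dev/da0s1a if it hasn't appeared within 10
seconds; the second keeps the clock running as long as "new device"
events keep arriving and only moves on to the mount once 5 seconds
have passed without one.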

Would this do what you described?

> I've also thought about having it drop to a prompt, but noticing that
> new devices show up.  You could automatically proceed, or at the very
> least be able to type the new device in once it is there.  This would
> let the normal boot proceed, kick you to the prompt if, say, the usb
> drive fell out and still let you plug it back in and have the system
> pick back up again.

Interesting. I like this. Let me see if this is doable without
inviting complexity.


> So, if your approach could have some hook for these types of
> enhancements (or used to implement them), that would be a compelling
> reason to support it.  Of course, it would still require knowing when
> you are done with your initial scans of the device tree, which is at
> present an unsolved problem....

Technically speaking we're never done. If I plug in a disk any
time after booting up, then we didn't wait long enough before
mounting root :-)

Seriously: hot plug implies that you can never truly wait for
all devices, because they can come and go during the entire
uptime of the machine. Proceeding with the boot based on some
reasonable heuristic (e.g. nothing new was found in the last
X seconds, so it's unlikely we'll get a new disk) is probably
the best we can do....

> : >  But in that case, you're better off going through
> : > /boot/loader for this stuff, which leads me to my next question: Would
> : > any md device passed by the boot loader (or compiled into the kernel)
> : > effectively be the second one, so you'd not need any .md
> : > directives at all?
> : 
> : You can start off with a preloaded or compiled-in ramdisk, and then
> : recursively mount root, including from vnode-backed md devices, so
> : the .md directive is not rendered useless by preloading or compiling
> : in. You can even end the root mount recursion with the preloaded
> : ramdisk last -- this gives you premounted file systems under /.mount
> : without having to run /etc/rc (if you want to)...
> 
> Is the .md directive globally destructive, or just destructive to the
> local level of recursion?  If it is just the local level, how do you
> specify the unit number?  Maybe a better approach would be to
> encourage people to mount root based on how file systems are labelled,
> rather than what unit they happen to be taking up...  Would that help
> any here?

The .md directive (as envisioned so far) uses dynamic unit numbers
and is only locally destructive. This allows nested mounting of
root file systems that are all vnode-backed (don't ask me for a
real-life use case now :-)

The proposal uses '#' as the placeholder for the unit number. To be
precise: the '#' is literal and appears in the configuration file
to denote the md unit number created by the last .md directive. As
such, you don't actually need to know it.
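
A hypothetical fragment (the argument syntax of .md is not spelled
out here, so treat this purely as a sketch) that sets up and mounts
a vnode-backed image found on the file system mounted so far:

	.md /images/root.img
	ufs:/dev/md#

The '#' in the mount directive stands for whatever unit number the
preceding .md directive ended up creating, so the configuration
never has to hardcode md0, md1, and so on.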

Too klugy?
Too limited?

> : > 
> : > How is init handled at each stage?  forked after the last one, I assume?
> : 
> : No, init is only spawned after the root mount recursion ends. The .init
> : directive is there to override defaults. This is envisioned to be useful
> : for rescue images where you want to spawn /rescue/init or installation
> : images where you may want to spawn sysinstall. It eliminates having to
> : hardcode the possibilities in the kernel.
> 
> Right now through the boot loader you can set init_path; why would you
> need to add the ability to spawn a different one to the scripts?

No particular reason. I just tossed it in. If it's over the top, then
I'll remove it. It was just one of those ideas...

> : In a sense it gives you more freedom in how you want to call your initial
> : process without the pitfalls when the root mount recursion ends early due
> : to a problem.
> : 
> : As a concrete example, consider having a single file system on a writable
> : medium (say /dev/da0) and software images are ISO images stored in it.
> : You can install some recovery procedure on /dev/da0 that gets run when
> : none of the ISO images can be mounted. The ISO images have /sbin/init
> : as init as usual, but you can select to run /sbin/recovery from /dev/da0.
> : This allows for a single init executable that performs the right functions
> : based on the program name for example...
> 
> I think this is a bit of a convoluted example.  The ISO images would
> fail to mount only if they were all damaged in a way that would make
> them unmountable, true?  If the backup ISO is AFU, then what's to say
> that /sbin/recovery isn't also AFU?  When would you need this?

The images could also have been, euh .. misplaced :-)
What to do when the ISO images aren't there? A panic may not be the
most user-friendly response...
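
Sticking with that example, and with the caveat that the exact
failure handling is precisely the part that is still open, the
.mount.conf on the writable medium might look something like:

	.md /images/install.iso
	cd9660:/dev/md#
	.init /sbin/recovery

the idea being that if setting up or mounting the ISO fails and the
recursion therefore ends on /dev/da0, init is spawned as
/sbin/recovery; if the ISO mounts fine, its own /sbin/init runs as
usual.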

> I presume the default script would be something like (ignoring the
> hard coding of device names):
> 
> ufs:/dev/da0s1a
> .wait 5
> .onfail ask

Roughly. devfs will synthesize the .mount.conf contents based on
tunables and kernel options, i.e. the same options we now have
hardcoded. Without recursion this means that the root mount will
not be any different from what it is now.

> which would mount /dev/da0s1a when it became available, waiting up to
> 5 seconds and asking the user afterwards if that failed, right?

Yes, but I like the feedback I got from Matthew, who said that
the .wait applies to the mount directive following it. So the
.wait will precede the mount.

Also, the proposal has an .ask directive, rather than asking on
failure. I see asking as a mount directive for which the FS
and device are provided by the user.
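
Putting those two points together, Warner's example (device name
still hardcoded purely for illustration, and using the plain .wait
form; the <for-what> flag discussed above would slot into the same
place) would come out roughly as:

	.wait 5
	ufs:/dev/da0s1a
	.ask

that is: wait up to 5 seconds for the device of the following mount
directive, attempt the mount, and if that fails fall through to
.ask, which prompts the user for a file system and device.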

-- 
Marcel Moolenaar
xcllnt at mac.com




