login.conf.db, /sbin/init, separate /etc, and configs around "thin provisioning" WAS: Re: nuOS
Chad J. Milios
freebsd-list at nuos.org
Tue Jul 9 07:38:01 UTC 2013
On 07/08/13 22:12, Teske, Devin wrote:
>
> We also had to put one file into the etc directory on the / "beneath" the /etc mount so that /sbin/init can read it before /etc is mounted. There were two or three ways we could do that and each has a tradeoff.
>
> I've been bitten by that.
>
> Getting access to that file that's "beneath" once you've booted the system can be ... less than easy.
Yeah, I prefer resorting to trickery or "magic" as little as possible,
only as a last resort, and I try to clutter up the standard tree of
files as little as possible. In this case we only needed the one file,
just a symlink actually. The "under" /etc has only the following:
lrwxr-xr-x 1 root wheel 25 Jun 25 17:59 /login.conf.db@ -> ../boot/etc/login.conf.db
and in the "over" /etc we still place an identical symlink so that the
real file is in /boot/etc/. cap_mkdb doesn't clobber the symlink; it
writes through to /boot/etc/login.conf.db for you. So in the usual
case, a user edits login.conf and runs cap_mkdb like they're supposed
to and everything is fine. It's only if they roll back or restore a
backup to /etc that things can potentially end up out of sync.
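
A minimal sketch of that layout, built in a scratch directory instead of
a real root filesystem. The $root variable is hypothetical, and a plain
shell redirect stands in for cap_mkdb here, so this only shows that an
ordinary open-and-write follows the link and leaves it in place:

```sh
#!/bin/sh
# Sketch only: $root is a hypothetical stand-in for /, nothing real is touched.
set -eu
root=$(mktemp -d)
mkdir -p "$root/boot/etc" "$root/etc"

# The real database lives under /boot/etc.
printf 'db-v1\n' > "$root/boot/etc/login.conf.db"

# The same relative symlink goes in both the "under" and the "over" /etc.
ln -s ../boot/etc/login.conf.db "$root/etc/login.conf.db"

# A write through the symlink (standing in for cap_mkdb rewriting the
# database) lands in /boot/etc and does not replace the link itself.
printf 'db-v2\n' > "$root/etc/login.conf.db"

ls -l "$root/etc/login.conf.db"
cat "$root/boot/etc/login.conf.db"
```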
I don't want anyone to get confused by me talking about jails in the
same email. The above snag we are working around involves /sbin/init
ONLY WHEN booting the host FreeBSD. Our jailed customers don't have to
worry about this because /etc is already in the right spot by the time
jail runs /etc/rc. /sbin/init isn't even involved in a jail, is it? Not
even in some "hooked-in" way? At any rate, we don't have to do anything
special for a separate /etc dataset for jails.
We could just forgo the /etc dataset on the host, but I am glad that we
can manage our bare metal customers using the same methods and tools.
Handling this symlink hack is less differentiation than giving up a
separate /etc on the host, I think.
>
> I'm interested in your cost/benefit points of having /etc a separate filesystem.
>
> On the face of it, I want to say that "/etc" is (or at least contains) the "core identity" of the machine (and to a lesser extent -- because this is BSD after-all -- /usr/local/etc). In my mind, /etc and /usr/local/etc *are* the machine (metaphorically speaking), so the merits of having it as a separate filesystem are weighed against your desired topology.
I agree. Myself, I like having such a lightweight "identity" and keeping
/, /usr and /usr/local (which are all just sitting on / in my case)
mounted read-only. The "prototype" for a host is handled by a completely
different department than the people/customers who sysadmin their
deployments and instances. Early in the building/installing, before any
ports/packages, /usr/local/etc is made a symlink to /etc/local, so the
symlink is in the read-only / and every time you write or cd to
/usr/local/etc you end up in /etc/local. An /etc dataset ends up under a
MB with zfs compression, and /var on a fresh instance is basically also
nothing. All in all, a new jail costs you under a MB of zpool. We jail
stop/start and zfs send/receive instances in the blink of an eye, and
it's "almost" as good as having live migration.
We could get the same storage efficiency by simply cloning / and having
no sub-datasets. Some customers feel like they want to be able to write
anywhere and we give them those options, but then they are on their own:
we don't manage the software updates for those guys, and some like it
that way. We then bill each for all the storage they reference, because
a year down the road they may be the only one still holding a reference
to the outdated prototype they're on, even though they overwrote every
file twice with make world or freebsd-update. Their memory usage is also
way higher than most. When executables are launched in the jails with
the read-only nullfs-mounted /, those all access the same memory pages,
but zfs isn't smart enough yet to let the virtual memory system maintain
that sharing through the indirection of zfs clones and snapshots.
So separate zfs datasets for /etc and /var give us great storage
efficiency, while nullfs gives us great memory performance and
efficiency.
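
For illustration, the /usr/local/etc redirection mentioned above can be
tried out in a scratch directory. This is only a sketch: $root is a
hypothetical stand-in for /, and the link target is relative here so the
demo stays inside $root, whereas on a real host it would simply be
/etc/local:

```sh
#!/bin/sh
# Sketch only: $root is a hypothetical stand-in for /.
set -eu
root=$(mktemp -d)
mkdir -p "$root/usr/local" "$root/etc/local"

# Relative target so the demo stays inside $root; on a real host the
# target would be the absolute path /etc/local.
ln -s ../../etc/local "$root/usr/local/etc"

# Anything written via /usr/local/etc ends up under /etc/local, i.e. in
# the writable /etc dataset rather than the read-only /.
printf 'example=yes\n' > "$root/usr/local/etc/example.conf"
ls -l "$root/etc/local"
```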
>
> If you want to bunch of machines to look and/or act differently, then a shared /etc is precisely what you want. However, without allowing minor changes (ala ZFS clone/snapshot or by way of UnionFS), you'll quickly find that the only way to cope is with role-based scripting in /etc/rc.conf (it is after-all a shell script) or complicated abstraction layers (for example, using netgraph eiface devices with the jail-name inside them so that rc.conf have have jail-specific ifconfig_* lines). But I digress.
>
> I think the better solution to your loading of files "beneath" the eventual /etc filesystem is to throw away the ZFS snapshot/clone method and instead move to a UnionFS approach for /etc.
>
> If you use UnionFS for your /etc, then what you do is for each of the machines that you want *that* /etc to appear, you do something like:
>
> (as root) mount_unionfs -o below /etc /other/etc
>
> Now /other/etc (assuming it was empty before) looks exactly like /etc.
In theory, I love the concept of unionfs, and it gives far more
flexibility than zfs, especially if the two can be combined effectively.
For us, its semantics were just never well established enough, and there
are too many corner cases and combinations of possibilities that, while
exciting, were never conceived of and can't be nailed down in a simple
VFS or POSIX filesystem mindset for obvious reasons. When I have the
time to really dig in again I'd love to see where unionfs is at today
and if I can be using it to do some very cool things again (but now with
fewer headaches, less legwork and fewer sleepless nights). For the
reasons stated, though, I have to admit I'm simply just _afraid_ of
unionfs. Your suggestion is simple enough, though; I'm sure I wouldn't
need a month of research and testing. :) It's probably overkill for our
needs in this case.
>
> Pros: With "rm -f <file>; rm -W <file>" (in /other/etc) you can reclaim a file from the underlying /etc. ZFS does not allow you to revert a single file (you can revert the entire volume or filesystem, but not a single file).
I really liked the idea of removing a whiteout and having a lower file
appear, but that's just me. :) You're right that ZFS doesn't let you do
anything nearly as selective, but it does allow you to cherry-pick files
out of .zfs/snapshot. Like you said, that's not rolling a file back;
you're just copying an old version to a new version atop the top "layer".
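
A rough sketch of that cherry-pick, assuming the dataset's snapshots are
reachable under its .zfs/snapshot directory. The helper name
restore_from_snapshot and its arguments are made up for illustration;
it is just cp with a sanity check, not a ZFS feature:

```sh
#!/bin/sh
# Sketch only. restore_from_snapshot is a hypothetical helper, not a ZFS tool.
# usage: restore_from_snapshot <dataset mountpoint> <snapshot name> <relative path>
restore_from_snapshot() {
    mnt=$1
    snap=$2
    rel=$3
    src="$mnt/.zfs/snapshot/$snap/$rel"
    if [ ! -f "$src" ]; then
        echo "no such file in snapshot: $src" >&2
        return 1
    fi
    # This writes a new version of the file on top of the live dataset;
    # nothing is rolled back.
    cp -p "$src" "$mnt/$rel"
}
```

With a hypothetical snapshot called "nightly", that would be used as
restore_from_snapshot /etc nightly login.conf, and in the login.conf
case it would still need a cap_mkdb run afterwards.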
>
>
>
> if anyone with more intimate knowledge of how and exactly when login.conf.db gets accessed has any thoughts... It could be a disaster for an admin to think their /etc is in a certain state and have that one file be out of sync. If better minds could chip in, I'm wondering if we're better off editing /sbin/init to run init_script _before_ loading the daemon class from login.conf.db (or explain why thats a bad idea) or if i should just add some sort of hook to run cap_mkdb right when needed using a DTrace script or auditd?
>
> That's an interesting aspect of the boot process I hadn't noticed before (having not used init_script before). I would think that this should be filed as a PR. Seems to me that the init_script should fire first -- but (and this is a guess) it may need to bootstrap the user that the init_script runs as (presumably needing to load the daemon class for said user). While there may be good reason, it certainly violates a principle (that one might be astonished to learn that init_script is not run in a fashion that only the dependencies thereof are required).
>
>
I thought so too initially; init_script is documented as being for
[init]ialization BEFORE /etc/rc itself. It's obviously run as root and
early enough that the machine ought to obey init_script as if it were
commandments handed down by God. Why init needs to know anything about
the daemon class beforehand is beyond me. Quite literally "beyond me". I
don't have a strong enough opinion either way, though, to be filing a PR
yet. I thought it was worth bringing up so brighter minds might take a
look if they find it peculiar. I have it back-burnered on one of a full
screen-border of post-it notes, and I'll learn more about what's going
on in /sbin/init soon if no one else steps forward.
>> Does anyone think this issue is moot? (Can't we just document this particular specific "gotcha" instance? I don't think so, I abhor any "gotcha" that deviates from behavior people expect from "upstream" fbsd.) Does anyone agree it's important we come as close to perfect a solution as we can?
> Thanks for bringing up the issue with init_script. We should look to fix it to make its use capable of handling the use-case you identified (using it to bootstrap a separate /etc).
Good, see, this is why FreeBSD is awesome. People care about parameters
and configurations and having a stable system even in the face of
overwhelming combinatorics. Not to speak ill of Linux or sling mud with
vague accusations and no specific instances (but I'm going to, haha),
but you have no idea how many times I've been using Linux in a project,
usually to do something a little cutting edge or off-the-reservation,
and I say "Hey, I think I should be able to combine X with Y, can
someone help me?" and all too often I get the attitude like "man, we're
all doing Z now, haven't ya heard? Z is here to end all our sorrows" and
I'll be like "but Z doesn't do X+Y" and for that I'm shamed and
ridiculed like "dude, if Z doesn't do everything you want and you don't
worship Z with us, you're stupid" hahaha. Does anyone else feel
similarly about any experience they've had on the LKML? I can name
almost 10 values for X, Y and flavor-of-the-week Z.
>
>
>> Is a separate /etc even worth it to people?
> Depends. Everybody? certainly not. Some? Sure. See above example-cases.
>
>
>> Should i scrap that feature because of this issue?
> It sounds like you contorted yourself working around a deficiency in it (a POLA violation in that it has unforeseen dependencies). At the very least, I would think that init could have a fall-back if the file can't be loaded.
>
> Are you putting anything beside the default daemon-class definition in your login.conf "beneath" your true /etc?
Init does have a compiled-in default class, identical to the stock
"default" class of a fresh system. login.conf remains the source of
truth on the true "upper" /etc, but things read login.conf.db to get
their answers. At the very outset of a system build, I move the plain
old default login.conf.db to /boot/etc, and it contains all the classes.
99.9% of our users keep the default login.conf, and maybe actually 100%
are using it just that way on any given day. I'm just anal-retentive
enough to think that if I ignore this, someone will suffer for their
astonishment (or unknowing lack thereof) when their db ends up out of
sync because they didn't know we introduced another event where
cap_mkdb should get run (post rollback/restore of /etc).
I would simply run cap_mkdb every time we mount /etc, but I don't think
that's good enough because I don't know when and what else accesses it.
I'm assuming more than just /sbin/init at boot, right? Am I overthinking
this because nothing else reads login.conf.db ever? /usr/bin/login
accesses it on every user login, no? Do I misunderstand totally?
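
One possible shape for such a "rebuild when needed" hook, purely as a
sketch: keep a checksum of login.conf next to the database and rebuild
whenever it no longer matches. A checksum is used instead of a
timestamp because a rollback can leave login.conf looking older than
the db. The function name sync_login_db and the stamp file are
hypothetical; nothing in the base system maintains such a stamp, and
this only narrows the window, it does not close it for anything that
reads the db before the hook runs:

```sh
#!/bin/sh
# Sketch only. sync_login_db and the stamp file are hypothetical.
# usage: sync_login_db <login.conf> <stamp file> <rebuild command...>
sync_login_db() {
    conf=$1
    stamp=$2
    shift 2
    now=$(cksum < "$conf")
    old=$(cat "$stamp" 2>/dev/null || true)
    if [ "$now" != "$old" ]; then
        # Source differs from what the db was last built from
        # (edit, rollback or restore): rebuild, then record the checksum.
        "$@" "$conf" && printf '%s\n' "$now" > "$stamp"
    fi
}
```

On a host laid out as described, it might be called right after /etc is
mounted as sync_login_db /etc/login.conf /boot/etc/login.conf.cksum
cap_mkdb, where the stamp path is again an assumption.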
>
>
>> I think we can tighten this up so theres no twisted ankles and no one falling in this rare case but certainly potential manhole. (the manhole i'm talking about is login.conf and login.conf.db being out of sync because the later is a symlink to /boot/etc and someone might rollback to a more restrictive login.conf and think they're covered without running cap_mkdb again but their login.conf.db is actually out of sync and less restrictive in a way that burns them)
>>
> Sorry you had to work around that -- you should have filed a PR.
>
I will file a PR once I look at the problem in more depth, unless
someone chimes in and saves me with already-expert knowledge that I
don't have to dig for. (One can hope, right?)
>
>> Devin, thank you IMMENSELY for bsdinstall and especially bsdconfig. I use them both at work and they make life so much better. And thank you for the simplification using kenv. I was unaware of it
> On a side-note, I didn't write bsdinstall -- I'm going to maintain it, but I wrote bsdconfig ^_^ (smiles)
>
> Thank you very much for your appreciation. Certainly a labor of love and I'm happy that others have kicked the wheels at least.
Yeah, I've more than kicked the tires. It's excellent work, keep it up.
More information about the freebsd-hackers
mailing list