Re: What exactly is hostid used for?

From: Olivier Certner <olivier.freebsd_at_free.fr>
Date: Sun, 27 Nov 2022 19:00:25 UTC
For whatever reason, I never received grarpamp's mail, so I am copy-pasting his
answer from the online archives and replying to Doug's message instead.

> > But frankly, why would you ever bother with disabling it?

Indeed, just after sending the mail, I felt I had exaggerated a bit with this
question, especially since I had written this just before:

> > Moreover, applications can access the hostid ('kern.hostid' or the longer
> > 'kern.hostuuid') through sysctl(3) or gethosid(3), and do whatever they
> > want with it.

(There is a typo above: it is gethostid(3), of course.)

So yes, this information is public by default and any application can obtain
it. Worse, it is not possible to prevent anybody from obtaining it on the full
host: changing the permissions of '/etc/hostid' alone is useless, since
'sysctl kern.hostuuid' is not restricted, and currently cannot be without
additional code.

To avoid this particular leak, there are 3 easy ways:
- Disable the host ID ('hostid_enable="NO"' in '/etc/rc.conf').
- Have a startup script that overwrites '/etc/hostid' at each boot.
- Launch your application in a jail, which can have independently set
  'kern.hostid' and 'kern.hostuuid', with a start script that generates a
  random value for them and sets it (either through '/etc/hostid', or
  directly, with the '/etc/hostid' mechanism disabled).

The third is a bit more involved but preserves the ability to correctly set the
ID on the full host.
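For the second and third options, something has to produce a fresh random
value at each boot or jail start. A minimal sketch, assuming /dev/urandom as the
entropy source (arc4random_buf(3) would do just as well on FreeBSD): it formats
16 random bytes as a version-4 UUID string, the same textual form that
'/etc/hostid' and 'kern.hostuuid' hold. The helper names are hypothetical. The
resulting string could then be written to '/etc/hostid' before the hostid rc
script runs, or handed to a jail, for example through the jail(8)
'host.hostuuid' parameter.

```c
/*
 * Sketch: generate a random (version 4) UUID string, e.g. to be written to
 * /etc/hostid at boot or used as a jail's host UUID.
 * Helper names are hypothetical.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define UUID_STR_LEN 37		/* 36 characters plus the terminating NUL */

/* Turn 16 bytes into a canonical v4 UUID string (RFC 4122 version/variant bits). */
void
format_uuid_v4(const unsigned char in[16], char out[UUID_STR_LEN])
{
	unsigned char b[16];

	memcpy(b, in, sizeof(b));
	b[6] = (unsigned char)((b[6] & 0x0f) | 0x40);	/* version 4 */
	b[8] = (unsigned char)((b[8] & 0x3f) | 0x80);	/* RFC 4122 variant */
	snprintf(out, UUID_STR_LEN,
	    "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
	    "%02x%02x%02x%02x%02x%02x",
	    b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
	    b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}

/* Fill out with a fresh random UUID read from /dev/urandom; 0 on success. */
int
random_uuid(char out[UUID_STR_LEN])
{
	unsigned char bytes[16];
	FILE *f;
	size_t got;

	assert(out != NULL);
	f = fopen("/dev/urandom", "rb");
	if (f == NULL)
		return (-1);
	got = fread(bytes, 1, sizeof(bytes), f);
	fclose(f);
	if (got != sizeof(bytes))
		return (-1);
	format_uuid_v4(bytes, out);
	return (0);
}
```

Forcing the version and variant bits keeps the value a well-formed UUID, so
anything that parses '/etc/hostid' as a UUID should still accept it.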

More generally, however, this leads me to FreeBSD's main source of information
leakage: the sysctl MIB. It is not restricted in any way, at least for reading,
and contains a tremendous amount of information about the running system, which
is surely more than enough to uniquely identify a given machine and could also
help an attacker trying to exploit it. This should indeed be fixed.
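To make the fingerprinting point concrete, the sketch below combines a few
string knobs that an unprivileged process can normally read on FreeBSD into a
single 64-bit value. The choice of knobs (kern.hostuuid, kern.hostname,
kern.osrelease, hw.model) and the use of FNV-1a as the combining hash are
arbitrary assumptions for illustration; a real fingerprinter would draw on many
more knobs. The sysctl part only compiles on FreeBSD; the hashing part is
portable.

```c
/*
 * Sketch: derive a machine fingerprint from world-readable sysctl knobs.
 * Knob list and FNV-1a hashing are illustrative choices, not a standard.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV1A64_OFFSET 0xcbf29ce484222325ULL
#define FNV1A64_PRIME  0x100000001b3ULL

/* Plain 64-bit FNV-1a over len bytes, continuing from state h. */
uint64_t
fnv1a64_bytes(uint64_t h, const void *data, size_t len)
{
	const unsigned char *p = data;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= p[i];
		h *= FNV1A64_PRIME;
	}
	return (h);
}

/* Hash each value plus its NUL, so {"ab","c"} and {"a","bc"} differ. */
uint64_t
fingerprint(const char *const *vals, size_t count)
{
	uint64_t h = FNV1A64_OFFSET;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(vals[i] != NULL);
		h = fnv1a64_bytes(h, vals[i], strlen(vals[i]) + 1);
	}
	return (h);
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>

/* Read a few string knobs as an ordinary user and fingerprint them. */
int
fingerprint_this_host(uint64_t *out)
{
	static const char *const knobs[] = {
		"kern.hostuuid", "kern.hostname", "kern.osrelease", "hw.model",
	};
	char bufs[4][256];
	const char *vals[4];
	size_t i, len;

	for (i = 0; i < 4; i++) {
		len = sizeof(bufs[i]);
		if (sysctlbyname(knobs[i], bufs[i], &len, NULL, 0) == -1)
			return (-1);
		bufs[i][sizeof(bufs[i]) - 1] = '\0';
		vals[i] = bufs[i];
	}
	*out = fingerprint(vals, 4);
	return (0);
}
#endif
```

None of these reads needs privileges, which is exactly why restricting
'/etc/hostid' alone does not help.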

In principle this is easy, but the devil is in the details: for example, the
(ab?)use of CTLFLAG_ANYBODY and, more generally, determining which information
unprivileged userland actually needs and finds useful. That effectively means
tagging the knobs that anybody can safely read, and perhaps removing a bunch of
them in favor of other mechanisms. I may give it a shot at some point.
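The tagging idea can be modelled in a few lines. The sketch below is plain
userland C, not kernel code: each knob carries a hypothetical "world-readable"
tag, and an unprivileged read is allowed only for tagged knobs, so anything not
explicitly reviewed stays private by default. The KNOB_WORLD_READABLE flag,
struct knob and the table contents are invented for illustration and do not
correspond to existing sysctl(9) flags; which knobs get tagged here is an
arbitrary example.

```c
/*
 * Sketch (userland model, not kernel code): deny-by-default read access to
 * sysctl knobs for unprivileged callers, based on an explicit tag.
 * KNOB_WORLD_READABLE and the table contents are hypothetical.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KNOB_WORLD_READABLE 0x1u	/* hypothetical tag */

struct knob {
	const char *name;
	unsigned flags;
};

/* Example table: which knobs get tagged is an arbitrary choice here. */
static const struct knob knob_table[] = {
	{ "kern.ostype",    KNOB_WORLD_READABLE },
	{ "kern.osrelease", KNOB_WORLD_READABLE },
	{ "hw.pagesize",    KNOB_WORLD_READABLE },
	{ "kern.hostuuid",  0 },	/* identifying: left untagged */
	{ "kern.hostid",    0 },
};

/* Return true if a caller may read the knob named 'name'. */
bool
knob_read_allowed(const char *name, bool privileged)
{
	size_t i;

	assert(name != NULL);
	if (privileged)
		return (true);
	for (i = 0; i < sizeof(knob_table) / sizeof(knob_table[0]); i++) {
		if (strcmp(knob_table[i].name, name) == 0)
			return ((knob_table[i].flags & KNOB_WORLD_READABLE) != 0);
	}
	return (false);		/* unknown or untagged knobs stay private */
}
```

The design choice worth noting is the default: an untagged or forgotten knob
ends up hidden rather than exposed, which is what makes the audit of existing
knobs the bulk of the work.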

> Because "apps (often aka anti-privacy tools)" and nic's and attacks
> can read your unique and paste it into the internet, and do whatever
> else with it.

Yes, but if you use well-known open-source apps, the risk is extremely low. For
proprietary apps, on the other hand, there certainly is a risk, but then you
should at the very least jail them.

> Would not be surprising if javascript browsers are reading such things.

Doug's answer covers the essentials. I'm pretty sure browsers don't expose this
information directly, although it might be retrievable in indirect ways. But
given all the other, easier fingerprinting vectors, I doubt a priori that it is
worth the hassle.

> Nor would some environments want it embedded in say offsite backups of
> zpools, etc.

You're probably identifiable by what's in your backups, so if you are serious
about this, you have to encrypt them in full, in which case the host ID cannot
be read and this is a non-issue.

> If NFS ZFS HAST and other apps can't take a manually supplied app-specific
> string to use instead of reading whatever happens (or not) to be in hostid,
> or even continuing to work with just null, then that represents bugs in
> privacy, validation testing, and debugging capability that need be fixed.

They all work without a host ID as well, albeit with a minor complication for
ZFS, as mentioned in my previous mail.

I don't know what would happen when creating a zpool with no host ID. I'd
guess that 0 is used, but I'm not sure whether this disables the import check.
For the network interfaces mentioned, a MAC address is generated randomly
instead of being derived from the host ID. For NFSv4, I don't think you have
much choice other than running the client in a jail, where you can set a
specific host ID. Sure, a mount option could be added for that.
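The difference between a MAC address derived from the host ID and a randomly
generated one can be sketched as follows. This is not how FreeBSD itself
derives such addresses: the FNV-1a hash, the hypothetical make_mac() helper and
the fallback to /dev/urandom are assumptions for illustration. The only fixed
rule applied is that a generated address should be unicast with the locally
administered bit set.

```c
/*
 * Sketch: build a locally administered unicast MAC address, derived
 * deterministically from a host UUID and interface name when one is
 * available, or drawn at random otherwise. Hypothetical helper; FreeBSD's
 * own derivation differs.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over a NUL-terminated string, continuing from state h. */
static uint64_t
fnv1a64(uint64_t h, const char *s)
{
	for (; *s != '\0'; s++) {
		h ^= (unsigned char)*s;
		h *= 0x100000001b3ULL;
	}
	return (h);
}

/* Fill mac[6]; returns 0 on success, -1 if no randomness was available. */
int
make_mac(const char *hostuuid, const char *ifname, uint8_t mac[6])
{
	assert(ifname != NULL && mac != NULL);
	if (hostuuid != NULL && hostuuid[0] != '\0') {
		/* Stable: same host UUID and interface name give the same MAC. */
		uint64_t h = 0xcbf29ce484222325ULL;
		int i;

		h = fnv1a64(h, hostuuid);
		h = fnv1a64(h, "/");
		h = fnv1a64(h, ifname);
		for (i = 0; i < 6; i++)
			mac[i] = (uint8_t)(h >> (8 * i));
	} else {
		/* No host ID: fall back to random bytes. */
		FILE *f = fopen("/dev/urandom", "rb");
		size_t got;

		if (f == NULL)
			return (-1);
		got = fread(mac, 1, 6, f);
		fclose(f);
		if (got != 6)
			return (-1);
	}
	/* Clear the multicast bit, set the locally administered bit. */
	mac[0] = (uint8_t)((mac[0] & 0xfe) | 0x02);
	return (0);
}
```

The trade-off is the one discussed above: the derived address stays stable
across reboots but ties the interface to the host ID, while the random one
leaks nothing but changes every time it is generated.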

If we come up with enough use cases, it may turn out that a more generic
mechanism is needed to allow selective overriding and restriction of many
sysctl knobs. In the meantime, for the host ID and a few selected others, we
have jails.

--
Olivier Certner