Add jail execution environment support to the FreeBSD test suite

From: <igor.ostapenko_at_pm.me>
Date: Thu, 22 Feb 2024 20:57:54 UTC
Hi FreeBSD developers,

There is a proposal to improve the FreeBSD test suite.


1 The Problem

The FreeBSD test suite is based on the Kyua framework. The latter supports
running tests in parallel. However, some tests cannot be run in parallel and
are marked with is_exclusive="true" metadata, which makes Kyua run such tests
in sequence.

Many tests are not exclusive conceptually; they are marked so for very simple
technical reasons. For instance, some network-related tests are based on jail
and vnet usage. That is convenient for such tests, and it already provides a
lot of isolation from other tests. But they are still marked as exclusive due
to the shared space of jail names, routing, etc.

The project seeks more tests, and there is a trend for new tests, such as
jail/vnet based ones, to be created with is_exclusive="true" from the very
beginning. This only piles up exclusive tests in the suite. For example, a
reviewer fairly asked whether my new tests could be redesigned to run in
parallel. [1]

If such tests were fully isolated, they could run in parallel and reduce the
test time both for CI runs and for runs during development.

The problem is that adding more isolation within a test itself looks doable at
first glance, but it would add a lot of complexity to the test code, or could
turn out to be impossible in specific cases.
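
As a rough illustration of that extra complexity, here is a minimal sketch of
what a test would have to do on its own just to avoid jail name clashes. The
helper name unique_jail_name and the pf_demo prefix are hypothetical, and
names are only one of the shared resources; routing and the rest would need
similar care.

```sh
#!/bin/sh
# Hypothetical helper, not part of ATF or Kyua: derive a jail name that is
# unlikely to clash with jails created by other tests running in parallel.
unique_jail_name()
{
	# $1 is a test-specific prefix; $$ is the PID of the shell running
	# the test, so two concurrent test processes get different names.
	echo "${1}_$$"
}

# Every place that names the jail must go through the helper.
jname=$(unique_jail_name pf_demo)
echo "would create jail: ${jname}"
```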


2 The Idea

The idea is not new. A test could run in a jail -- this provides the required
isolation with minimal or zero effort from the test.


3 The Implementation

A lot of work has been done already, and a working patch has passed the
initial review (thanks to markj@ and ngie@). [2]

It adds a new concept to the Kyua framework -- an execution environment. Two
new metadata properties were added for that: execenv and execenv_jail.

execenv is a switch to select an environment. If a test's metadata defines
execenv="jail", Kyua will create a temporary jail, run the test within it, and
remove the jail afterwards. If execenv="host" is provided or execenv is not
defined, Kyua will run the test as it does today.

execenv_jail metadata takes effect only in case of execenv="jail". It allows a
test to request specific parameters for its jail. These parameters are simply
arguments to jail(8), e.g. execenv_jail="vnet allow.raw_sockets".
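
Conceptually, the execenv="jail" flow described above amounts to three steps:
create, run, remove. The sketch below only prints the jail(8) and jexec(8)
commands that such a flow could use. It is not Kyua's actual code; the
function name jail_lifecycle, the jail name kyua_demo and the ./foo_test
program are made up for illustration.

```sh
#!/bin/sh
# Illustration only: print (do not execute) the commands a runner could
# issue for one test.
# $1 = jail name, $2 = execenv_jail value (may be empty), rest = test command.
jail_lifecycle()
{
	name=$1
	params=$2
	shift 2
	# 1. Create a temporary jail with the requested jail(8) parameters.
	echo "jail -qc name=${name} persist${params:+ ${params}}"
	# 2. Run the test program inside the jail.
	echo "jexec ${name} $*"
	# 3. Remove the jail once the test has finished.
	echo "jail -r ${name}"
}

jail_lifecycle kyua_demo "vnet allow.raw_sockets" ./foo_test
```

With an empty second argument the create step carries no extra parameters,
which corresponds to a test that sets execenv="jail" without execenv_jail.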


4 The Adoption

ATF based tests can easily define this new metadata via Kyuafile or directly,
e.g. for atf-sh based tests:

	test_head()
	{
		atf_set descr "Test foo in case of bar"
		atf_set require.user root
		atf_set execenv jail
		atf_set execenv.jail vnet allow.raw_sockets
	}

Non-ATF based ones will do it via Kyuafile. Our test suite does it through a
Makefile:

	TEST_METADATA+= execenv="jail"
	TEST_METADATA+= execenv_jail="vnet allow.raw_sockets"

The patch has evolved a little. I started with a single execenv_jail metadata
property, and during the patch discussion and review I ended up with two
knobs: execenv and execenv_jail. This turned out to be a cleaner and less
tricky interface. The reasoning behind this evolution can be found in the
history of the respective Differential. [2]


5 MFC Concerns

For now, I see at least one issue from the usual project workflow perspective.
Imagine that the execenv feature is committed to Kyua in 15-CURRENT and we
start converting existing tests and creating new ones that use execenv="jail".
If some feature or bug fix then needs to be backported to 14-STABLE or
13-STABLE, the "old" Kyua without the execenv feature will fail to run such
tests:

	kyua: E: Load of 'Kyuafile' failed: Failed to load Lua file 'Kyuafile': Kyuafile:9: Unknown metadata property execenv.

The first three options that come to mind to deal with that are:
a) Patch Kyua the same way in the supported STABLE branches so that it can
   run backported tests based on execenv="jail" (it is not a system ABI
   change after all)
b) Patch Kyua only in the supported STABLE branches so that it simply skips
   such tests (does not seem to provide much benefit)
c) Do not backport tests, only the fix/feature itself (kind of a bad idea)


6 The Demo

My test environment showed promising run time numbers for almost the whole
test suite (ZFS excluded). One test run took 36 min with the test parallelism
improvement versus 1 h 25 min without. In my case, with 8 cores, the suite
runs about 2 times faster with the improvement. [3]
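
For reference, the speedup implied by the two run times quoted above can be
checked with a small helper; the function name speedup is only illustrative.

```sh
#!/bin/sh
# Ratio of the run time without the improvement to the run time with it.
# $1 = minutes without, $2 = minutes with.
speedup()
{
	awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

# 1 h 25 min = 85 min without the improvement, 36 min with it.
speedup $((1 * 60 + 25)) 36	# prints 2.36
```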


7 Action Points

My current vision of the plan looks as follows:
- [ ] community: Review, testing, comments -- we may want to change the
                 design
- [ ] committers: Help with the main commit -- it should land in the
                  freebsd/kyua GitHub fork first [4], then in the vendor
                  branch, and then be merged to main
- [ ] igoro: Provide the subsequent PRs to separate FreeBSD specifics and fix
             existing Kyua tests
- [ ] igoro: Provide the PRs to add brand new tests of Kyua itself to cover
             the new feature
- [ ] igoro: Provide the respective documentation updates
- [ ] igoro: Migrate some of the existing tests to start with, e.g. netpfil/pf
- [ ] committers: Help with review and respective commits/merges

The plan is not strict; it depends on the discussion and the interest of
volunteers.

I hope the project finds this proposal valuable. If so, any help is
appreciated.


[1] New tests exclusivity concern: https://reviews.freebsd.org/D42314
[2] The Kyua patch: https://reviews.freebsd.org/D42350
[3] The whole test suite demo: https://reviews.freebsd.org/D42410
[4] The respective PR to the fork: https://github.com/freebsd/kyua/pull/224


Best regards, Igor.