Re: curtain: WIP sandboxing mechanism with pledge()/unveil() support
Date: Wed, 30 Mar 2022 17:52:09 UTC
On 3/30/22 03:22, Baptiste Daroussin wrote: > Hello Mathieu, > First of all, thank you for this amazing work, leveraging the mac framework to > build curtain is imho an excellent idea, I personnally see a curtain like > approach as complementary to a capsicum approach rather than an antagonist > feature, I can see many possible usage of curtains in freebsd in particular in > the port framework! Alright! One nice thing with the MAC approach is that many of the checks were already carefully placed to not deviate too much from expected behavior for applications. At first I tried to do all of the checks in namei() and this often made syscalls fail too early with the wrong errnos and this confuses some programs. When I figured out the right place to add the checks it was almost always right next to a MAC check. Also, if I hadn't used a MAC module and added a new layer instead, you could say that it would have been the fifth (!!!) general access control mechanism in the kernel (after traditional UNIX, Capsicum, jails and MAC). This was starting to feel like a bit much. So the MAC framework, it's not a perfect fit for some of what this module wants to do but it could be worse. Also, if you combine curtains with chroots you kind of get jails. It's chroot escape-proof because unveils deny access to the outer directories like any other. Privileges are restricted. It doesn't have all the features of jails but it's very jail-like. Just to say, the MAC framework was pretty close to being able to implement jails too. > To allow to integrate and permit reviews from developers, I think we can/should > split the review. The first thing will probably me imho to start a review > process of the sysfilt feature, this is probably the part that will need most of > the back and forth discussion given the rest is pretty isolated (mac module, > userland). Yeah I think it's a good idea. I could do some minor cleanups and add some comments while at it before submitting it. I'm still working on this, but the majority of the changes should be in the userland and the module itself now. And getting some feedback on the kernel parts (and eventually knowing that they're in an acceptable state) would help. The module works well enough already that I don't think that there's anything critical that would require extra changes to the rest of the kernel (unless some problem is found). I think the kernel changes could be broken up something like this: 1) Sysfils. Adds annotations to every syscall in syscalls.master (Linuxulator could be done later too). Generalizes the Capsicum flag to a sysfils bitmap both for sysents and ucreds. Changes to the syscall entry checks. Some sysfil checks spread out in the kernel. 2) There are some small modifications that I believe are bug fixes or minor improvements or basically have no effect in the current state of things (but are important for mac_curtain). 3) The new MAC handlers. That's a significant part too and it can be separate from sysfils. This would include the new call sites in vfs_lookup() (and there's some extra logic too). New call sites in SysV/POSIX IPC modules (and some other changes to path handling). There's a function that deals with getdirentries() filtering directly in the MAC framework (the MAC hooks only deal with the dirents one by one) that maybe doesn't belong there. vn_open() got a new flag. 4) Various modifications that are specifically to support mac_curtain. I tried to minimize those but there are some. struct thread/proc need new fields (opaque pointers for unveil tracking/caching). struct file needs a few extra integer fields (I really tried to get rid of those but couldn't figure out a good way). I also added "sysctl_shadow" objects to kern_sysctl.c. These are handles that can outlast sysctl_oids. It's not elegant, but mac_curtain needs it. At first I thought that I could add long-lived references to sysctl_oids and I designed the curtain data structure for that. Then I realized that sysctl_oids get totally nuked when a module unloads. They get unmapped from memory with the rest of the module's data and there's no holding on to them (IIUC). So I added an intermediate data structure between sysctl_oids and curtains. The mallocs with non-sleepable locks in the SysV IPC module with late label initializations are annoying but this could be fixed separately (it might need a bit of code restructuring...). Making the module fplookup-friendly would probably need some changes to the MAC framework itself, this can be done separately too. There are others, but not critical and can be done separately. > Can you isolate the sysfils code and start a review in phabricator? If you need > help for this don't hesitate to ask me ;) Yup, I'm on it. > Again thanks for the huge work. No problem! This is gonna be a lot of work to review so thanks to anyone involved too. > > Best regards, > Bapt