Re: HEADS UP: NFS changes coming into CURRENT early February
Date: Tue, 28 Jan 2025 02:10:42 UTC
On Tue, Jan 21, 2025 at 10:27 PM Gleb Smirnoff <glebius@freebsd.org> wrote:
>
> Hi,
>
> TLDR version:
> users of NFS with Kerberos (e.g. running gssd(8)) as well as users of NFS with
> TLS (e.g. running rpc.tlsclntd(8) or rpc.tlsservd(8)) as well as users of the
> network lock manager (e.g. having 'options NFSLOCKD' and running rpcbind(8))
> are affected. You would need to recompile & reinstall both the world and the
> kernel together. Of course this is what you'd normally do when you track
> FreeBSD CURRENT, but better be warned. I will post hashes of the specific
> revisions that break API/ABI when they are pushed.
>
> Longer version:
> last year I tried to check in a new implementation of unix(4) SOCK_STREAM and
> SOCK_SEQPACKET in d80a97def9a1, but was forced to back it out due to several
> kernel side abusers of a unix(4) socket. The most difficult ones are the
> NFS-related RPC services, which act as RPC clients talking to RPC servers in
> userland. Since it is impossible to fully emulate a userland process
> connection to a unix(4) socket, they need to work with the socket internal
> structures, bypassing all the normal KPIs and conventions. Of course they
> didn't tolerate the new implementation that totally eliminated the
> intermediate buffer on the sending side.
>
> While the original motivation for the upcoming changes is the fact that I want
> to go forward with the new unix/stream and unix/seqpacket, I also tried to make
> kernel-to-userland RPC better. You judge if I succeeded or not :) Here are
> some highlights:
>
> - Code footprint both in kernel clients and in userland daemons is reduced.
>   Example: gssd: 1 file changed, 5 insertions(+), 64 deletions(-)
>            kgssapi: 1 file changed, 26 insertions(+), 78 deletions(-)
>            4 files changed, 1 insertion(+), 11 deletions(-)
> - You can easily see all RPC calls from kernel to userland with genl(1):
>   # genl monitor rpcnl
> - The new transport is multithreaded in kernel by default, so kernel clients
>   can send a bunch of RPCs without any serialization, and if the userland
>   figures out how to parallelize their execution, such parallelization would
>   happen. Note: the new rpc.tlsservd(8) will use threads.
> - One ad-hoc single program syscall is removed - gssd_syscall. Note: the
>   rpctls syscall remains, but I have some ideas on how to improve that, too.
>   Not at this step though.
> - All sleeps of kernel RPC calls are now in a single place, and they all have
>   timeouts. I believe NFS services are now much more resilient to hangs.
>   A deadlock when an NFS kernel thread is blocked on a unix socket buffer, and
>   the socket can't go away because its application is blocked in some other
>   syscall, is no longer possible.
>
> The code is posted on Phabricator, reviews D48547 through D48552.
> Reviewers are very welcome!
>
> I share my branch on GitHub. It is usually rebased on today's CURRENT:
>
> https://github.com/glebius/FreeBSD/commits/gss-netlink/
>
> Early testers are very welcome!

I think I've found a memory leak, but it shouldn't be a show stopper.

What I did on the NFS client side is:
# vmstat -m | fgrep -i rpc
# mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
# ls -lR /mnt
--> Then I network partitioned it from the server a few times, until the TCP
    connection closed.
(My client is in bhyve and the server is on the system the bhyve instance is
running in. I just did "ifconfig bridge0 down", waited for the TCP connection
to close ("netstat -a"), then "ifconfig bridge0 up".)

Once done, I did:
# umount /mnt
# vmstat -m | fgrep -i rpc
and saw a somewhat larger allocation count.

The allocation count only goes up if I do the network partitioning, and only
on the NFS client side. Since the leak is slow and only happens when the TCP
connection breaks, I do not think it is a show stopper and one of us can track
it down someday.

Other than that, I have not found any problems that you had not already fixed, rick

>
> --
> Gleb Smirnoff
>
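P.S. In case anyone else wants to try to reproduce the leak, here is the cycle
above condensed into one sequence. This is only a sketch of my setup: the
server name (nfsv4-server) and the bridge interface (bridge0) are specific to
my bhyve arrangement, so substitute your own.

# vmstat -m | fgrep -i rpc                       <- note the baseline rpc allocation count
# mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
# ls -lR /mnt
# ifconfig bridge0 down                          <- partition the client from the server
# netstat -a                                     <- repeat until the NFS TCP connection is gone
# ifconfig bridge0 up
  (repeat the partition/heal cycle a few times)
# umount /mnt
# vmstat -m | fgrep -i rpc                       <- allocation count ends up somewhat larger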
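P.P.S. For anyone who needs the reminder about Gleb's note that world and
kernel have to be rebuilt and reinstalled together: the usual CURRENT update
sequence (the standard procedure from the handbook and /usr/src/UPDATING,
trimmed down; adjust -j and your etcupdate habits to taste) is roughly:

# cd /usr/src
# make -j`sysctl -n hw.ncpu` buildworld buildkernel
# make installkernel
# shutdown -r now
  (after the reboot)
# cd /usr/src
# etcupdate -p
# make installworld
# etcupdate
# shutdown -r now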