From: Rick Macklem
Date: Mon, 27 Jan 2025 18:10:42 -0800
Subject: Re: HEADS UP: NFS changes coming into CURRENT early February
To: Gleb Smirnoff
Cc: current@freebsd.org, rmacklem@freebsd.org
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On Tue, Jan 21, 2025 at 10:27 PM Gleb Smirnoff wrote:
>
> Hi,
>
> TLDR version:
> users of NFS with Kerberos (e.g. running gssd(8)) as well as users of NFS with
> TLS (e.g. running rpc.tlsclntd(8) or rpc.tlsservd(8)) as well as users of
> network lock manager (e.g. having 'options NFSLOCKD' and running rpcbind(8))
> are affected.  You would need to recompile & reinstall both the world and the
> kernel together.  Of course this is what you'd normally do when you track
> FreeBSD CURRENT, but better be warned.  I will post hashes of the specific
> revisions that break API/ABI when they are pushed.
>
> Longer version:
> last year I tried to check in a new implementation of unix(4) SOCK_STREAM and
> SOCK_SEQPACKET in d80a97def9a1, but was forced to back it out due to several
> kernel side abusers of a unix(4) socket.  The most difficult ones are the NFS
> related RPC services, that act as RPC clients talking to RPC servers in
> userland.  Since it is impossible to fully emulate a userland process
> connection to a unix(4) socket they need to work with the socket internal
> structures bypassing all the normal KPIs and conventions.  Of course they
> didn't tolerate the new implementation that totally eliminated the intermediate
> buffer on the sending side.
>
> While the original motivation for the upcoming changes is the fact that I want
> to go forward with the new unix/stream and unix/seqpacket, I also tried to make
> kernel to userland RPC better.  You judge if I succeeded or not :)  Here are
> some highlights:
>
> - Code footprint both in kernel clients and in userland daemons is reduced.
>   Example: gssd:    1 file changed, 5 insertions(+), 64 deletions(-)
>            kgssapi: 1 file changed, 26 insertions(+), 78 deletions(-)
>                     4 files changed, 1 insertion(+), 11 deletions(-)
> - You can easily see all RPC calls from kernel to userland with genl(1):
>   # genl monitor rpcnl
> - The new transport is multithreaded in kernel by default, so kernel clients
>   can send a bunch of RPCs without any serialization and if the userland
>   figures out how to parallelize their execution, such parallelization would
>   happen.  Note: new rpc.tlsservd(8) will use threads.
> - One ad-hoc single program syscall is removed - gssd_syscall.  Note:
>   rpctls syscall remains, but I have some ideas on how to improve that, too.
>   Not at this step though.
> - All sleeps of kernel RPC calls are now in a single place, and they all have
>   timeouts.  I believe NFS services are now much more resilient to hangs.
>   A deadlock when an NFS kernel thread is blocked on a unix socket buffer, and
>   the socket can't go away because its application is blocked in some other
>   syscall is no longer possible.
>
> The code is posted on phabricator, reviews D48547 through D48552.
> Reviewers are very welcome!
>
> I share my branch on Github.  It is usually rebased on today's CURRENT:
>
> https://github.com/glebius/FreeBSD/commits/gss-netlink/
>
> Early testers are very welcome!

I think I've found a memory leak, but it shouldn't be a show stopper.

What I did on the NFS client side is:
# vmstat -m | fgrep -i rpc
# mount -t nfs -o nfsv4,tls nfsv4-server:/ /mnt
# ls -lR /mnt
--> Then I network partitioned it from the server a few times, until
    the TCP connection closed.
    (My client is in bhyve and the server is on the system the bhyve
     instance is running in. I just did "ifconfig bridge0 down", waited
     for the TCP connection to close in "netstat -a", then did
     "ifconfig bridge0 up".)
Once done, I ran:
# umount /mnt
# vmstat -m | fgrep -i rpc
and saw a somewhat larger allocation count.

The allocation count only goes up if I do the network partitioning,
and only on the NFS client side.

Since the leak is slow and only happens when the TCP connection breaks,
I do not think it is a show stopper and one of us can track it down someday.

Other than that, I have not found any problems that you had not already fixed.

rick

> --
> Gleb Smirnoff
>
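
The before/after "vmstat -m | fgrep -i rpc" comparison described above could be
automated roughly as follows. This is only a minimal sketch, not something that
was run as part of the test: the helper names (parse_inuse, grown_types) are
hypothetical, the row layout (Type, InUse, MemUse, Requests, Size(s)) is an
assumption about the vmstat -m output, and the sample rows are made up for
illustration.

```python
import re

# Assumed row layout of `vmstat -m`: Type, InUse, MemUse (with a K suffix),
# Requests, Size(s). Type names may contain spaces, so match lazily up to
# the first run of three numeric columns.
ROW = re.compile(r"^\s*(?P<type>.+?)\s+(?P<inuse>\d+)\s+\d+K?\s+\d+(\s|$)")


def parse_inuse(text):
    """Map each malloc type name to its InUse count (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            counts[m.group("type")] = int(m.group("inuse"))
    return counts


def grown_types(before, after):
    """Return {type: (before, after)} for types whose InUse count rose."""
    old, new = parse_inuse(before), parse_inuse(after)
    return {t: (old.get(t, 0), n) for t, n in new.items() if n > old.get(t, 0)}


# Made-up sample rows for illustration only; not taken from a real run,
# and the type names are placeholders.
BEFORE = """\
          rpc    12     3K      140  32,64,256
    rpc_other     4     1K       20  64
"""
AFTER = """\
          rpc    19     5K      310  32,64,256
    rpc_other     4     1K       20  64
"""

if __name__ == "__main__":
    for name, (was, now) in grown_types(BEFORE, AFTER).items():
        print(f"{name}: InUse {was} -> {now}")
```

To use it on a real client, the two strings would be replaced by the saved
output of the first and second vmstat runs. Only types whose InUse count grew
are reported, which is the symptom to look for after repeated partitioning.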