From: J David
Date: Wed, 23 Aug 2023 23:56:34 -0400
Subject: NFS client hang on 13.2-RELEASE-p2 on file locking / wrong interface selected
To: FreeBSD FS
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Hello,

We are seeing NFS hangs on FreeBSD 13.2-RELEASE-p2 clients talking to a Debian bookworm NFS server using NFSv3.

Whenever a process attempts to lock a file on an NFS mount, for example:

    lockf x sleep 3

That process hangs in state "nlmrcv" and goes to 100% CPU.

I found huge numbers of exchanges like this via tcpdump:

03:05:41.432581 IP 172.17.200.2.998 > 172.17.250.10.50516: UDP, length 172
    0x0000: 4500 00c8 9afb 0000 4011 c4f9 ac11 c802 E.......@.......
    0x0010: ac11 fa0a 03e6 c554 00b4 1af6 04ed ac8b .......T........
    0x0020: 0000 0000 0000 0002 0001 86b5 0000 0004 ................
    0x0030: 0000 0002 0000 0001 0000 001c 64e6 c8e0 ............d...
    0x0040: 0000 0002 6332 0000 0001 bb87 0001 bb87 ....c2..........
    0x0050: 0000 0001 0000 61a8 0000 0000 0000 0000 ......a.........
    0x0060: 0000 0004 3873 0200 0000 0000 0000 0001 ....8s..........
    0x0070: 0000 0002 6332 0000 0000 0020 0100 0601 ....c2..........
    0x0080: a0da 65c4 00d0 6316 0000 0000 0000 0000 ..e...c.........
    0x0090: 0a00 0a00 0000 0000 c733 0800 0000 0009 .........3......
    0x00a0: 3130 3030 3031 4063 3200 0000 0001 86a1 100001@c2.......
    0x00b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0x00c0: 0000 0000 0000 0025 .......%
03:05:41.432632 IP 172.17.250.10.50516 > 172.17.200.2.998: UDP, length 20
    0x0000: 4500 0030 4675 4000 4011 da17 ac11 fa0a E..0Fu@.@.......
    0x0010: ac11 c802 c554 03e6 001c 1a5e 04ed ac8b .....T.....^....
    0x0020: 0000 0001 0000 0001 0000 0001 0000 0001 ................
03:05:41.432647 IP 172.17.200.2.998 > 172.17.250.10.50516: UDP, length 172
    0x0000: 4500 00c8 9afc 0000 4011 c4f8 ac11 c802 E.......@.......
    0x0010: ac11 fa0a 03e6 c554 00b4 1af6 04ed ac8c .......T........
    0x0020: 0000 0000 0000 0002 0001 86b5 0000 0004 ................
    0x0030: 0000 0002 0000 0001 0000 001c 64e6 c8e0 ............d...
    0x0040: 0000 0002 6332 0000 0001 bb87 0001 bb87 ....c2..........
    0x0050: 0000 0001 0000 61a8 0000 0000 0000 0000 ......a.........
    0x0060: 0000 0004 3973 0200 0000 0000 0000 0001 ....9s..........
    0x0070: 0000 0002 6332 0000 0000 0020 0100 0601 ....c2..........
    0x0080: a0da 65c4 00d0 6316 0000 0000 0000 0000 ..e...c.........
    0x0090: 0a00 0a00 0000 0000 c733 0800 0000 0009 .........3......
    0x00a0: 3130 3030 3031 4063 3200 0000 0001 86a1 100001@c2.......
    0x00b0: 0000 0000 0000 0000 0000 0000 0000 0000 ................
    0x00c0: 0000 0000 0000 0025 .......%
03:05:41.432697 IP 172.17.250.10.50516 > 172.17.200.2.998: UDP, length 20
    0x0000: 4500 0030 4676 4000 4011 da16 ac11 fa0a E..0Fv@.@.......
    0x0010: ac11 c802 c554 03e6 001c 1a5e 04ed ac8c .....T.....^....
    0x0020: 0000 0001 0000 0001 0000 0001 0000 0001 ................

A huge number of these are exchanged. Like, several million over the span of a couple of minutes.

Then the FreeBSD client system becomes unresponsive with these console messages:

[nl_neigh] rtnl_lle_event: error allocating group writer
[nl_neigh] rtnl_lle_event: error allocating group writer
[nl_neigh] rtnl_lle_event: error allocating group writer
[zone: mbuf] kern.ipc.nmbufs limit reached
[nl_neigh] rtnl_lle_event: error allocating group writer
[nl_neigh] rtnl_lle_event: error allocating group writer
[nl_neigh] rtnl_lle_event: error allocating group writer

Now, while 172.17.200.2 is an IP address on the client and 172.17.250.10 is an IP address on the server, that's the wrong subnet. The filesystem is mounted over a dedicated VLAN for NFS which has the IPs 172.20.200.2 and 172.20.250.10. So whatever this traffic is, it's using the wrong interface.

I was able to work around this by swapping the 172.17.0.0 network from the NFS server. But that's not exactly an optimal solution.

The main problem may be on the Debian side.
I was able to work around the issue by swapping the order of the network interfaces on the Debian side so the NFS VLAN was the "first" interface. That suggests to me that something on the Debian side is choosing the first interface instead of the right interface. So I'll pursue that.

But it'd sure be nice if the FreeBSD client didn't hang in this situation.

Does anyone know what might be happening here or have any other insight that might help me track this down?

Thanks for any advice!
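
For anyone who wants to read the capture without a protocol dissector, here is a minimal Python sketch that decodes the leading ONC RPC words of the first call and its reply. It assumes the standard RPC version 2 message layout (RFC 5531) and that the RPC message starts right after the 20-byte IP header and 8-byte UDP header, at offset 0x1c in each dump. The function name decode_rpc and the lookup tables are illustrative, not taken from any tool, and the hex strings are copied by hand from the dump.

```python
import struct

# First words of the UDP payloads, copied from the capture (offset 0x1c onward).
CALL_HEX = "04edac8b 00000000 00000002 000186b5 00000004 00000002"
REPLY_HEX = "04edac8b 00000001 00000001 00000001 00000001"

# Value names as defined in RFC 5531 (ONC RPC version 2).
REJECT_STAT = {0: "RPC_MISMATCH", 1: "AUTH_ERROR"}
AUTH_STAT = {
    0: "AUTH_OK",
    1: "AUTH_BADCRED",
    2: "AUTH_REJECTEDCRED",
    3: "AUTH_BADVERF",
    4: "AUTH_REJECTEDVERF",
    5: "AUTH_TOOWEAK",
}


def decode_rpc(payload: bytes) -> dict:
    """Decode the fixed leading fields of an ONC RPC call or reply."""
    xid, msg_type = struct.unpack_from(">II", payload, 0)
    if msg_type == 0:  # CALL
        rpcvers, prog, vers, proc = struct.unpack_from(">IIII", payload, 8)
        return {
            "xid": xid,
            "type": "CALL",
            "rpcvers": rpcvers,
            "program": prog,
            "version": vers,
            "procedure": proc,
        }
    # REPLY: the next word says whether the call was accepted or denied.
    (reply_stat,) = struct.unpack_from(">I", payload, 8)
    if reply_stat == 0:
        return {"xid": xid, "type": "REPLY", "reply_stat": "MSG_ACCEPTED"}
    (reject_stat,) = struct.unpack_from(">I", payload, 12)
    result = {
        "xid": xid,
        "type": "REPLY",
        "reply_stat": "MSG_DENIED",
        "reject_stat": REJECT_STAT.get(reject_stat, reject_stat),
    }
    if reject_stat == 1:  # AUTH_ERROR carries an auth_stat word
        (auth_stat,) = struct.unpack_from(">I", payload, 16)
        result["auth_stat"] = AUTH_STAT.get(auth_stat, auth_stat)
    return result


call = decode_rpc(bytes.fromhex(CALL_HEX))
reply = decode_rpc(bytes.fromhex(REPLY_HEX))
print(call)
print(reply)
```

Read this way, the call is RPC program 100021 (the NFS lock manager), version 4, procedure 2, and the 20-byte reply is MSG_DENIED with AUTH_ERROR / AUTH_BADCRED. If that reading is right, each lock call is being rejected at the RPC layer and immediately retried with the next xid, which would fit the volume of traffic, but it does not by itself explain why the 172.17 addresses are being used.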