From: Rick Macklem
To: Andreas Kempe
CC: freebsd-fs@freebsd.org
Subject: Re: FreeBSD 12.3/13.1 NFS client hang
Date: Sat, 28 May 2022 16:00:07 +0000
List-Id: Filesystems
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Andreas Kempe wrote:
> On Fri, May 27, 2022 at 08:59:57PM +0000, Rick Macklem wrote:
> > Andreas Kempe wrote:
> > > Hello everyone!
> > >
> > > I'm having issues with the NFS clients on FreeBSD 12.3 and 13.1
> > > systems hanging when using a CentOS 7 server.
Here are a few other things to consider:
Delegations - They are complex and seldom improve performance.
  I think I finally have them implemented reliably, but???
  They are disabled by default in the FreeBSD server and can be
  avoided by not running the nfscbd(8) daemon when mounting
  non-FreeBSD NFS servers.
  # nfsstat -E -c
  - If it shows non-zero "Delegs", consider disabling them.

TSO - Some net chips/drivers don't get these quite right. NFS is very
  good at finding the flaws, since it generates all kinds of small and
  weird sized TSO/TCP segments.
  - Consider trying disabling TSO if intermittent hangs persist.

Jumbo mbuf clusters - Some net interfaces use jumbo mbuf clusters
  when jumbo frames are in use. These can fragment the memory
  pool that mbuf clusters are being allocated from.
  # vmstat -z | fgrep mbuf_jumbo
  - and look to see if the third numbers are non-zero.
  Reducing the mtu may be a performance hit, but if the memory
  pool that clusters are allocated from becomes too fragmented,
  NFS will come to a grinding halt.

An NFSv4 server that does not reply to an RPC. This is a badly broken
server. NFSv4 servers are supposed to reply NFSERR_DELAY if they cannot
do an RPC at the time requested. They are not supposed to throw away
the request without replying.
Hopefully, such servers do not exist.
If they do, the mount will hang.
About the only way to detect this would be a packet capture when it
happens.
About the only fix is a different NFS server or using NFSv3 mounts, which
are stateless and might work better in this case.

rick


> First, make sure you are using hard mounts. "soft" or "intr" mounts won't
> work and will mess up the session sooner or later. (A messed up session could
> result in no free slots on the session and that will wedge threads in
> nfsv4_sequencelookup() as you describe.)
> (This is briefly described in the BUGS section of "man mount_nfs".)
>

I had totally missed that soft and interruptible mounts have these
issues. I switched the FreeBSD machines to soft and intr on purpose
to be able to fix hung mounts without having to restart the machine on
NFS hangs. Since they are shared machines, it is an inconvenience for
other users if one user causes a hang.

Switching our test machine back to hard mounts did prevent recursive
grep from immediately causing the slot type hang again.

> Do a:
> # nfsstat -m
> on the clients and look for "hard".
>
> Next, is there anything logged on the console for the 13.1 client(s)?
> (13.1 has some diagnostics for things like a server replying with the
> wrong session slot#.)
>

The one thing we have seen logged is messages along the lines of:
kernel: newnfs: server 'mail' error: fileid changed. fsid 4240eca6003a052a:0: expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)

> Also, maybe I'm old fashioned, but I find "ps axHl" useful, since it shows
> where all the processes are sleeping.
> And "procstat -kk" covers all of the locks.
>

I don't know if it is a matter of being old fashioned as much as one
of taste.
:) In future dumps, I can provide both ps axHl and procstat -kk.

> > Below are procstat kstack $PID invocations showing where the processes
> > have hung. In the nfsv4_sequencelookup it seems hung waiting for
> > nfsess_slots to have an available slot. In the second nfs_lock case,
> > it seems the processes are stuck waiting on vnode locks.
> >
> > These issues seem to appear at random, but also if
> > operations that open a lot of files or create a lot of file locks are
> > used. An example that can often provoke a hang is performing a
> > recursive grep through a large file hierarchy like the FreeBSD
> > codebase.
> >
> > The NFS code is large and complicated so any advice is appreciated!
> Yea. I'm the author and I don't know exactly what it all does;-)
>
> > Cordially,
> > Andreas Kempe
> >
>
> [...]
>
> Not very useful unless you have all the processes and their locks to try and figure out what is holding
> the vnode locks.
>

Yes, I sent this mostly in the hope that it might be something that
someone has seen before. I understand that more verbose information is
needed to track down the lock contention.

I'll switch our machines back to using hard mounts and try to get as
much diagnostic information as possible when the next lockup happens.

Do you have any good suggestions for tracking down the issue? I've
been contemplating enabling WITNESS or building with debug information
to be able to hook in the kernel debugger.

Thank you very much for your reply!
Cordially,
Andreas Kempe

> rick
>
>
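
The commands mentioned in this thread (nfsstat -m, nfsstat -E -c, vmstat -z, ps axHl, procstat -kk) could be gathered in one pass the next time a client wedges. What follows is only a minimal sketch and was not posted by anyone in the thread. The log path, the run helper, and the use of "procstat -kk -a" to cover every process are assumptions made for illustration. It is meant to be run as root on the affected client, with the log on a local file system rather than on the hung NFS mount.

```shell
#!/bin/sh
# Sketch: collect the NFS client diagnostics discussed in the thread.
# LOG is a hypothetical default; pass another path as the first argument.
LOG="${1:-/tmp/nfs-hang-diag.log}"

# run: print the command line as a header, then its output.
# A missing command is noted instead of aborting the whole collection.
run() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(command not found: $1)"
    fi
    echo
}

{
    run date
    run nfsstat -m        # mount options in effect; look for "hard"
    run nfsstat -E -c     # client counters; look for non-zero "Delegs"
    run vmstat -z         # zone stats; inspect the mbuf_jumbo lines
    run ps axHl           # where each thread is sleeping
    run procstat -kk -a   # kernel stacks for all processes (assumed -a)
} > "$LOG"

echo "wrote $LOG"
```

TSO is not covered here, since disabling it depends on the interface in use.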