From nobody Sat May 28 00:12:59 2022 X-Original-To: freebsd-fs@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 1BAEC1B41A23 for ; Sat, 28 May 2022 00:13:08 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from CAN01-YT3-obe.outbound.protection.outlook.com (mail-yt3can01on2051.outbound.protection.outlook.com [40.107.115.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mail.protection.outlook.com", Issuer "DigiCert Cloud Services CA-1" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4L92FG28jlz4Vvt for ; Sat, 28 May 2022 00:13:06 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=M7tOpOeMvWc+CxbXGhJZ1UoGyQpmhxsSnsalwWcwm7CXxWokWpFkzJIVefNiBjCxiQpPLBpTkl+GAtPdHy087kTqvSgyVFVojTMr1uU/9hB9sK6DgTY0l4gPlh4sP+7/34pjttW3+ddTHHOrBF9Chy2tkBlOMN1z0LOd4PBCdQjSeGXvdvkMRCV9iUI3Bl49IbwqCb6SJArzWYp/4rc06Cwj9YLlR8bILSDxMJRTw9AAPsHyYmD0ETD5160syiA6ArnYndf7q41MrYwp+5lTq7f26aXNv9k4e6ycYyCIl0nfiru9xVTXllUHu9f649iDZQLWizJepOhoYalK1FvmHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=kcnkoWQaSLiLjRbvazKlAS7KzSDvKyuK+QOFMsfVRBc=; b=n0NCuZvmR7LRHJmLcPlskljgTj5Ia4k2zIG6RQcbDuau2I+b5nqQKDRqC3PnRk7pD18NciMVRGizADSaMsLOcajawSabxgojZBlvc2QIIcs6cH6kXfJGjYKG1FAQSBYH541kI8yfjL/MfNJNMxDoPC3t6+xmmCI4v0hCRu6a5iivEIPTNaHQEATtUlE6VzrQ2zi2Axy9jp7a379HsEd9/jrBIvObYHijaCY4pth0kuMcVgAue2v2AxMcCM9Hx9/Fv7Ayf4ZKMX/tTc51clrMPLnQi6gZ4tmjCuuNHRsee4lmYr1HqxdbNUplTfWw6d5Uw3lz85SIqKglh/JKpwizsg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=uoguelph.ca; dmarc=pass action=none header.from=uoguelph.ca; dkim=pass header.d=uoguelph.ca; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=uoguelph.ca; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=kcnkoWQaSLiLjRbvazKlAS7KzSDvKyuK+QOFMsfVRBc=; b=Vx0yEYLlU3bKXS7D/gfWmM2lBW0tigIo2xMJtqIJB+F0M2l/kukfOHHuCEINw4AlWBVzIU+5cObWFtvEk0RpUzt7gtbLXzJUdcKqCS+/xyxMW+WhEpmxGm+Y5HOOyMyv3iq1762v3GmLj3KzJEAbplyO9GQIZ8I7maEJaeGhT0rgsOOcvNg6XLo2c9Z3eHh9OCtb/Jb4otsyt4ewqtoz1zNUBbDnfd5OIfAfwHuGgkGN0RvxvHPNJ/O1AcnhzusXc5tnORJSHl3CkZW7fMWfc0wLJ+4FEeGfQz+Le4Dva53FHQnGlSxH+rjP+3Iaf63kidJtyxMffDauMsfsAE/5rA== Received: from YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:c01:81::14) by YT2PR01MB4670.CANPRD01.PROD.OUTLOOK.COM (2603:10b6:b01:3d::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5293.13; Sat, 28 May 2022 00:12:59 +0000 Received: from YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM ([fe80::b921:251e:4a0b:54fc]) by YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM ([fe80::b921:251e:4a0b:54fc%6]) with mapi id 15.20.5293.016; Sat, 28 May 2022 00:12:59 +0000 From: Rick Macklem To: Andreas Kempe CC: "freebsd-fs@freebsd.org" Subject: Re: FreeBSD 12.3/13.1 NFS client hang Thread-Topic: FreeBSD 12.3/13.1 NFS client hang Thread-Index: AQHYcgY+3LPS/CtmUk+ZVTnbRrY6qK0zLx2fgAAamoCAACAyTA== Date: Sat, 28 May 2022 00:12:59 +0000 Message-ID: References: In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: suggested_attachment_session_id: 3872ff3f-d1c0-ca11-ac8e-ef94177e2ebd x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: eb3f58b6-34d2-49cc-1c6a-08da403ed2cb x-ms-traffictypediagnostic: YT2PR01MB4670:EE_ x-microsoft-antispam-prvs: x-ms-exchange-senderadcheck: 1 x-ms-exchange-antispam-relay: 0 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: j+2gMVKZnnnJO22+8/d1n1HSr/FhP3SzW/J+pAPAdboXb2sPyJSSjZaafOr1wmBqLSE+469f6RIIZNbgDbpQnw3z0tFB/iUyAnsEKGKtcQSy9tfUOTyFe3lkp47YjdqFXyMJ9Sar0ZD2Zg2ju5X9oVmV1/HATb52ty3TDk1a65/H9UZ108buhBRpmHVWvo5Camib1zsplxg+y64f3GEQGim7frL8CoQqsBlSb3MPwpfywwfXlg/shkDVXELSHDhfrU1TJ+/nMeAACo3kh8hiZpBgLUIFmr/udfcdhaLoFDxtbPu1apKPDJNUpF6kBSr4GXF6Uu9hZevGf680WmSrUpVzcyniEOcNoaUKqRrhKg0pvMFhv5ECsbiY3+YigRR9OU0P5noT6IufuxiAbEOmNCT0KTycOc4QZ6sBQIBSTRDzWu2pv/XwKMmrZJGWA+Dii0Fa2tuyCe2ZVWJA/CyjcF5qI73ga0s95XiPnPbVdFe7zgXOwH2w3UGNZLkRezBbVj5+Wj+GqeqZVfCk4kPhcChMUWg2k7BQaKq+BKQKzmXKpwsYXV1qyWrXzKROXZ8gGgfb60duQg46NdxBSQmGllnhLAFkRU5DaCPtl0no5RziYjqU0Ivsz3FurDHE/rcDYjP6J78D8r1OO/qbdkt0UJa9eOftDTrm1dw93mj3qnQY7WxuORm3tKrh9IuivUGgFbshimPdTcv8neEiP+mhfw== x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM;PTR:;CAT:NONE;SFS:(13230001)(4636009)(366004)(122000001)(38070700005)(7696005)(33656002)(38100700002)(76116006)(66946007)(66446008)(66556008)(71200400001)(66476007)(4326008)(8676002)(91956017)(64756008)(316002)(786003)(296002)(6916009)(55016003)(2906002)(9686003)(6506007)(86362001)(5660300002)(186003)(52536014)(83380400001)(8936002)(508600001);DIR:OUT;SFP:1101; x-ms-exchange-antispam-messagedata-chunkcount: 2 x-ms-exchange-antispam-messagedata-0: =?iso-8859-1?Q?GnynIvF31zYCrPfXtb1/xNJVdW3yU+cnBFR10blQ64kGcZviSCibFIz8Np?= =?iso-8859-1?Q?SbfrVvhE5kBWRSu16JPw3U7ekRE/wr1AWQVtc/dfgT6sOFG8xjbmHkJ31C?= =?iso-8859-1?Q?gKeoEWhnUAXT9Y8RUUDRcjkNX3xxWwID5GBH7czs1Zv1twUyOhBvSezHYI?= =?iso-8859-1?Q?Bj+d06Dyd3QFTk3Tz0uFS0IV1NZ0+9tfTq+2wj3JFuEtWJ0w9DvGqDhb6A?= =?iso-8859-1?Q?pJ1VaBJT7iC5p8qZYrsTpCOeqCd3qfnkPgoYXkm2GdktpDWupG4nMm9/oU?= =?iso-8859-1?Q?1aJRlSwS4EM2STc4bPPezjE1D20hHh2n/wv2IUt68tZROfFzTKlzVsP3GH?= =?iso-8859-1?Q?Xf6/TbmuHukZzxTi1s57liXq9Zaq3+DhX0hZ5yNSKbdZKP8HF8Uho1m/gf?= =?iso-8859-1?Q?oWcJajbdgYMy4xMWw8qyn/XmCaemUCKDgOLOge3IqX+cueNuonq6Zex5U3?= =?iso-8859-1?Q?KQPeU1p/QcYieNqcazyNcjOd0Rojp4OPdszJo/YiSR8u304yzeKB19vRSf?= =?iso-8859-1?Q?PrnmENpKD53h9+uqcLkPg5xK4x9nA/JsIEiJQdD8tB2ehJBe2UvGG/YdJe?= =?iso-8859-1?Q?BAorzRZ6t5r+3OrENJr0v2qPTIIRF/K68nifL+m7xbIP2yAT0AETUMxwBy?= =?iso-8859-1?Q?nElSnWgCj0R9cElS2hoC6298RhqVrqmjG+YCwj7IR6PF4+sodVVG7VwfVB?= =?iso-8859-1?Q?Fq35nccBAYRC6+17w6tEKISk00RAQ+g5P0UH0BS4L5FFee0uhlrxjlmtez?= =?iso-8859-1?Q?K/4B6569M18W6NDTcQ3ia4iUMqHnBCs+nE0xFrSF7h3t/GlJrdtKEU6Qv5?= =?iso-8859-1?Q?wf9fKrSPvrOBBYYXQ5Hc4jFxAMvF6PLn2j7d/KNimv23izJQ//MWazr7/E?= =?iso-8859-1?Q?5kDdcuFw/kT7bnLp64cyfw+tm0wxA8gp+4Hgmk4zXTNOTj5pCfo92vEIyk?= =?iso-8859-1?Q?oFrKdykGLij/8qCK3UNX3l816hQ8CMkq21KxIuHmtzEanSymL0Mxpl1DMP?= =?iso-8859-1?Q?CSS+q6XAJPu13KLPTKV1op4feIdhSKnuvajbH6OJjGnxFblu58dets8Z4l?= =?iso-8859-1?Q?JM2Ax6UaXFoYAV9xuii6fXZ2WdyGlj86E0bTzGRrKiKcOOy3gBAWv+POQ7?= =?iso-8859-1?Q?AfkcZ5WyklYK3ZdfuUOJgeDAFTzkU98aohdSU+Vu6b6GMmdK96/YaJNlzz?= =?iso-8859-1?Q?XJizwwXk5eBngGjKtjyxKrZJ0limQ8JJA4vuHzWC5Q3PGQCVv48PJZSdH8?= =?iso-8859-1?Q?tWvGPZLy75vG5Olmqx1UC71+WdWDnZoGByRUmZZ7VyxZ23T6RAIwvjTEQb?= =?iso-8859-1?Q?uL8UgwQQkbc68cKkL680VQJQ2SO5TWeMW/kTLulwXgCsOwoUhOgNlV2SSe?= =?iso-8859-1?Q?jT1JEuyS4IeiY52WhLX/I17TKRB+oEitMb684YtVIByTZYfBwR47QIcqzZ?= =?iso-8859-1?Q?oZSkuX6bQRWAfozW91cCYuvDR5pUa9nPyRB/LYcNdFzQvD3kb2/k/RJ43h?= =?iso-8859-1?Q?lliaosOP+bB6qCndhqMWAOyiVmKII5FID85vk32WBBDrthHiYDTTX2pfro?= =?iso-8859-1?Q?YxNIti1oHxJA1Ki0+MKM34r6O8i/jLQjcVkCNstXxmRoa0a3bd6Jm4qNG4?= =?iso-8859-1?Q?JH8ke6L2+qJMXV4XyWZyUv3+HoqoQRKEHubyUcWalr6VAs6HQmP8pPsJlb?= =?iso-8859-1?Q?zm1CXjG7G31MWR6AvYbYTLP26B4jopDY/n50VuZbQE7gRjoL7PWMEYoTjm?= =?iso-8859-1?Q?hXJDt1gQ1naEQP67U6ZHrZH0jvjTLlrOMLkKV/XsgkCqsq27n1LobQLxta?= =?iso-8859-1?Q?sL+a+J2vhJdk/h4K89O/RhnP3erAx7dmu5TA725DW0GqVMUHnvb5fkxWjF?= =?iso-8859-1?Q?DC?= x-ms-exchange-antispam-messagedata-1: iFNvbSWbRuadVA== Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable List-Id: Filesystems List-Archive: https://lists.freebsd.org/archives/freebsd-fs List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-fs@freebsd.org MIME-Version: 1.0 X-OriginatorOrg: uoguelph.ca X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: YQBPR0101MB9742.CANPRD01.PROD.OUTLOOK.COM X-MS-Exchange-CrossTenant-Network-Message-Id: eb3f58b6-34d2-49cc-1c6a-08da403ed2cb X-MS-Exchange-CrossTenant-originalarrivaltime: 28 May 2022 00:12:59.4867 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: be62a12b-2cad-49a1-a5fa-85f4f3156a7d X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: 998NYcfzx8c4U5V2HWrF0Q3BInyuqJ2t2TA7Mna9o76Ds8MIJVblaYphve4jHFJZy+SnC09HDrpmkX3j7Y3Tfw== X-MS-Exchange-Transport-CrossTenantHeadersStamped: YT2PR01MB4670 X-Rspamd-Queue-Id: 4L92FG28jlz4Vvt X-Spamd-Bar: ----- Authentication-Results: mx1.freebsd.org; dkim=pass header.d=uoguelph.ca header.s=selector2 header.b=Vx0yEYLl; arc=pass ("microsoft.com:s=arcselector9901:i=1"); dmarc=pass (policy=none) header.from=uoguelph.ca; spf=pass (mx1.freebsd.org: domain of rmacklem@uoguelph.ca designates 40.107.115.51 as permitted sender) smtp.mailfrom=rmacklem@uoguelph.ca X-Spamd-Result: default: False [-6.00 / 15.00]; TO_DN_EQ_ADDR_SOME(0.00)[]; NEURAL_HAM_MEDIUM(-1.00)[-1.000]; R_DKIM_ALLOW(-0.20)[uoguelph.ca:s=selector2]; FREEFALL_USER(0.00)[rmacklem]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:40.107.0.0/16]; NEURAL_HAM_LONG(-1.00)[-1.000]; MIME_GOOD(-0.10)[text/plain]; DWL_DNSWL_LOW(-1.00)[uoguelph.ca:dkim]; RCVD_COUNT_THREE(0.00)[3]; TO_MATCH_ENVRCPT_SOME(0.00)[]; DKIM_TRACE(0.00)[uoguelph.ca:+]; RCPT_COUNT_TWO(0.00)[2]; DMARC_POLICY_ALLOW(-0.50)[uoguelph.ca,none]; RCVD_IN_DNSWL_NONE(0.00)[40.107.115.51:from]; MLMMJ_DEST(0.00)[freebsd-fs]; NEURAL_HAM_SHORT(-1.00)[-1.000]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:8075, ipnet:40.104.0.0/14, country:US]; ARC_ALLOW(-1.00)[microsoft.com:s=arcselector9901:i=1]; RWL_MAILSPIKE_POSSIBLE(0.00)[40.107.115.51:from] X-ThisMailContainsUnwantedMimeParts: N Andreas Kempe wrote:=0A= [stuff snipped]=0A= >=0A= > The one thing we have seen logged are messages along the lines of:=0A= > kernel: newnfs: server 'mail' error: fileid changed. fsid 4240eca6003a052= a:0: > expected fileid 0x22, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)=0A= I think this can also happen if a Getattr operation fails with an error at= =0A= the server. It then has "default attributes" and a default of 0x2 (root ino= de#)=0A= can be expected. So, I suspect this is what is happening. Generally, failed= =0A= Getattrs will be problematic, but I'm not sure if they can cause hangs?=0A= =0A= If you can capture packets when these get logged, we can confirm if a=0A= Getattr operation has failed with an error.=0A= =0A= rick=0A= =0A= > Also, maybe I'm old fashioned, but I find "ps axHl" useful, since it show= s=0A= > where all the processes are sleeping.=0A= > And "procstat -kk" covers all of the locks.=0A= >=0A= =0A= I don't know if it is a matter of being old fashioned as much as one=0A= of taste. :) In future dumps, I can provide both ps axHl and procstat -kk.= =0A= =0A= > > Below are procstat kstack $PID invocations showing where the processes= =0A= > > have hung. In the nfsv4_sequencelookup it seems hung waiting for=0A= > > nfsess_slots to have an available slot. In the second nfs_lock case,=0A= > > it seems the processes are stuck waiting on vnode locks.=0A= > >=0A= > > These issues seem to appear seemingly at random, but also if=0A= > > operations that open a lot of files or create a lot of file locks are= =0A= > > used. An example that can often provoke a hang is performing a=0A= > > recursive grep through a large file hierarchy like the FreeBSD=0A= > > codebase.=0A= > >=0A= > > The NFS code is large and complicated so any advice is appriciated!=0A= > Yea. I'm the author and I don't know exactly what it all does;-)\=0A= >=0A= > > Cordially,=0A= > > Andreas Kempe=0A= > >=0A= >=0A= > [...]=0A= >=0A= > Not very useful unless you have all the processes and their locks to try = and figure out what is holding=0A= > the vnode locks.=0A= >=0A= =0A= Yes, I sent this mostly in the hope that it might be something that=0A= someone has seen before. I understand that more verbose information is=0A= needed to track down the lock contention.=0A= =0A= I'll switch our machines back to using hard mounts and try to get as=0A= much diagnostic information as possible when the next lockup happens.=0A= =0A= Do you have any good suggestions for tracking down the issue? I've=0A= been contemplating enabling WITNESS or building with debug information=0A= to be able to hook in the kernel debugger.=0A= =0A= Thank you very much for your reply!=0A= Cordially,=0A= Andreas Kempe=0A= =0A= > rick=0A= >=0A= >=0A= =0A=