From: Rick Macklem
To: Adam Stylinski, John
CC: freebsd-fs@freebsd.org
Subject: Re: zfs/nfsd performance limiter
Date: Wed, 25 May 2022 15:41:48 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-fs

Adam Stylinski wrote:
[stuff snipped]

> > ifconfig -vm
> mlxen0: flags=8843 metric 0 mtu 9000
Just in case you (or someone else reading this) are not aware of it,
use of 9K jumbo clusters causes fragmentation of the memory pool that
clusters are allocated from and, therefore, their use is not recommended.

Now, it may be that the Mellanox driver doesn't use 9K clusters (it could
put the received frame in multiple smaller clusters), but if it does, you
should consider reducing the MTU.
If you run:
# vmstat -z | fgrep mbuf_jumbo_9k
it will show you if they are being used.

rick


> netstat -i
Name   Mtu    Network         Address            Ipkts     Ierrs  Idrop  Opkts     Oerrs  Coll
igb0   9000                   ac:1f:6b:b0:60:bc  18230625  0      0      24178283  0      0
igb1   9000                   ac:1f:6b:b0:60:bc  14341213  0      0      8447249   0      0
lo0    16384                  lo0                367691    0      0      367691    0      0
lo0    -      localhost       localhost          68        -      -      68        -      -
lo0    -      fe80::%lo0/64   fe80::1%lo0        0         -      -      0         -      -
lo0    -      your-net        localhost          348944    -      -      348944    -      -
mlxen  9000                   00:02:c9:35:df:20  13138046  0      12     26308206  0      0
mlxen  -      10.5.5.0/24     10.5.5.1           11592389  -      -      24345184  -      -
vm-pu  9000                   56:3e:55:8a:2a:f8  7270      0      0      962249    102    0
lagg0  9000                   ac:1f:6b:b0:60:bc  31543941  0      0      31623674  0      0
lagg0  -      192.168.0.0/2   nasbox             27967582  -      -      41779731  -      -

> What threads/irq are allocated to your NIC?
> 'vmstat -i'

Doesn't seem perfectly balanced but not terribly imbalanced, either:

interrupt                  total   rate
irq9: acpi0                    3      0
irq18: ehci0 ehci1+       803162      2
cpu0:timer              67465114    167
cpu1:timer              65068819    161
cpu2:timer              65535300    163
cpu3:timer              63408731    157
cpu4:timer              63026304    156
cpu5:timer              63431412    157
irq56: nvme0:admin            18      0
irq57: nvme0:io0          544999      1
irq58: nvme0:io1          465816      1
irq59: nvme0:io2          487486      1
irq60: nvme0:io3          474616      1
irq61: nvme0:io4          452527      1
irq62: nvme0:io5          467807      1
irq63: mps0             36110415     90
irq64: mps1            112328723    279
irq65: mps2             54845974    136
irq66: mps3             50770215    126
irq68: xhci0             3122136      8
irq70: igb0:rxq0         1974562      5
irq71: igb0:rxq1         3034190      8
irq72: igb0:rxq2        28703842     71
irq73: igb0:rxq3         1126533      3
irq74: igb0:aq                 7      0
irq75: igb1:rxq0         1852321      5
irq76: igb1:rxq1         2946722      7
irq77: igb1:rxq2         9602613     24
irq78: igb1:rxq3         4101258     10
irq79: igb1:aq                 8      0
irq80: ahci1            37386191     93
irq81: mlx4_core0        4748775     12
irq82: mlx4_core0       13754442     34
irq83: mlx4_core0        3551629      9
irq84: mlx4_core0        2595850      6
irq85: mlx4_core0        4947424     12
Total                  769135944   1908

> Are the above threads floating or mapped? 'cpuset -g ...'

I suspect I was supposed to run this with a pid as the argument,
maybe nfsd? Here's the output without an argument:

pid -1 mask: 0, 1, 2, 3, 4, 5
pid -1 domain policy: first-touch mask: 0

> Disable nfs tcp drc

This is the first I've ever seen a duplicate request cache mentioned.
It seems counter-intuitive that disabling it would help, but maybe I'll
try doing that. What exactly is the benefit?

> What is your atime setting?

Disabled at both the file system and the client mounts.

> You also state you are using a Linux client. Are you using the MLX
> affinity scripts, buffer sizing suggestions, etc, etc.
> Have you swapped the Linux system for a fbsd system?
I've not, though I do vaguely recall Mellanox supplying some scripts
in their documentation that pinned interrupt handling to specific cores
at one point. Is this what you're referring to? I could give that a
try. I don't at present have any FreeBSD client systems with enough
PCI Express bandwidth to swap things out for a Linux vs FreeBSD test.

> You mention iperf. Please post the options you used when invoking iperf
> and its output.

Setting up the NFS client as a "server", since it seems that the
terminology is a little bit flipped with iperf, here's the output:

-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 10.5.5.1, port 11534
[  5] local 10.5.5.4 port 5201 connected to 10.5.5.1 port 43931
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  3.81 GBytes  32.7 Gbits/sec
[  5]   1.00-2.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   2.00-3.00   sec  4.18 GBytes  35.9 Gbits/sec
[  5]   3.00-4.00   sec  4.21 GBytes  36.1 Gbits/sec
[  5]   4.00-5.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   5.00-6.00   sec  4.21 GBytes  36.2 Gbits/sec
[  5]   6.00-7.00   sec  4.10 GBytes  35.2 Gbits/sec
[  5]   7.00-8.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   8.00-9.00   sec  4.21 GBytes  36.1 Gbits/sec
[  5]   9.00-10.00  sec  4.20 GBytes  36.1 Gbits/sec
[  5]  10.00-10.00  sec  7.76 MBytes  35.3 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  41.5 GBytes  35.7 Gbits/sec                  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------

On Sun, May 22, 2022 at 3:45 AM John wrote:
>
> ----- Adam Stylinski's Original Message -----
> > Hello,
> >
> > I have two systems connected via ConnectX-3 Mellanox cards in ethernet
> > mode. They have their MTUs maxed at 9000, their ring buffers maxed
> > at 8192, and I can hit around 36 Gbps with iperf.
> >
> > When using an NFS client (client = linux, server = freebsd), I see a
> > maximum rate of around 20 Gbps. The test file is fully in ARC. The
> > test is performed with an NFS mount nconnect=4 and an rsize/wsize of
> > 1MB.
> >
> > Here's the flame graph of the kernel of the system in question, with
> > idle stacks removed:
> >
> > https://gist.github.com/KungFuJesus/918c6dcf40ae07767d5382deafab3a52#file-nfs_fg-svg
> >
> > The longest function seems to be the ERMS-aware memcpy
> > happening from the ARC? Is there maybe a missing fast path that could
> > take fewer copies into the socket buffer?
>
> Hi Adam -
>
> Some items to look at and possibly include for more responses....
>
> - What is your server system? Make/model/ram/etc. What is your
>   overall 'top' cpu utilization 'top -aH' ...
>
> - It looks like you're using a 40gb/s card. Posting the output of
>   'ifconfig -vm' would provide additional information.
>
> - Are the interfaces running cleanly? 'netstat -i' is helpful.
>
> - Inspect 'netstat -s'. Duplicate pkts? Resends? Out-of-order?
>
> - Inspect 'netstat -m'. Denied? Delayed?
>
>
> - You mention iperf. Please post the options you used when
>   invoking iperf and its output.
>
> - You appear to be looking for throughput vs low latency. Have
>   you looked at window size vs the amount of memory allocated to the
>   streams?
>   These values vary based on the bit-rate of the connection.
>   TCP connections require outstanding un-ack'd data to be held.
>   This affects the values below.
>
>
> - What are your values for:
>
>   -- kern.ipc.maxsockbuf
>   -- net.inet.tcp.sendbuf_max
>   -- net.inet.tcp.recvbuf_max
>
>   -- net.inet.tcp.sendspace
>   -- net.inet.tcp.recvspace
>
>   -- net.inet.tcp.delayed_ack
>
> - What threads/irq are allocated to your NIC? 'vmstat -i'
>
> - Are the above threads floating or mapped? 'cpuset -g ...'
>
> - Determine best settings for LRO/TSO for your card.
>
> - Disable nfs tcp drc
>
> - What is your atime setting?
>
>
> If you really think you have a ZFS/Kernel issue, and your
> data fits in cache, dump ZFS, create a memory backed file system
> and repeat your tests. This will purge a large portion of your
> graph. LRO/TSO changes may do so also.
>
> You also state you are using a Linux client. Are you using
> the MLX affinity scripts, buffer sizing suggestions, etc, etc.
> Have you swapped the Linux system for a fbsd system?
>
> And as a final note, I regularly use Chelsio T62100 cards
> in dual home and/or LACP environments in Supermicro boxes with 100's
> of nfs boot (Bhyve, QEMU, and physical system) clients per server
> with no network starvation or cpu bottlenecks. Clients boot, perform
> their work, and then remotely request image rollback.
>
>
> Hopefully the above will help and provide pointers.
>
> Cheers
>
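
To put rough numbers on John's point about window size versus buffer
memory: a TCP connection has to hold about one bandwidth-delay product of
un-ack'd data, so the socket buffer limits (for example the values behind
net.inet.tcp.sendbuf_max and net.inet.tcp.recvbuf_max) need to be at least
that large. The sketch below is only an illustration. The ~36 Gbit/s rate
is the iperf figure from this thread, but the 100 microsecond round-trip
time and the 2 MiB buffer limit are hypothetical values that nobody in the
thread measured or reported, and the function names are made up for the
example.

```python
def bdp_bytes(bits_per_second: int, rtt_microseconds: int) -> int:
    """Return the bandwidth-delay product in bytes.

    Integer arithmetic is used so the result is exact:
    bits/s * microseconds / 1e6 gives bits in flight, / 8 gives bytes.
    """
    return bits_per_second * rtt_microseconds // (8 * 1_000_000)


def fits_in_buffer(bits_per_second: int, rtt_microseconds: int,
                   buffer_bytes: int) -> bool:
    """True if one connection's in-flight data fits in the given buffer."""
    return buffer_bytes >= bdp_bytes(bits_per_second, rtt_microseconds)


if __name__ == "__main__":
    link_rate = 36_000_000_000        # ~36 Gbit/s, from the iperf run above
    assumed_rtt_us = 100              # ASSUMPTION: hypothetical RTT
    assumed_limit = 2 * 1024 * 1024   # ASSUMPTION: hypothetical 2 MiB limit

    needed = bdp_bytes(link_rate, assumed_rtt_us)
    print(f"in-flight data to cover: {needed} bytes")
    print("fits in assumed buffer limit:",
          fits_in_buffer(link_rate, assumed_rtt_us, assumed_limit))
```

With several connections (nconnect=4 in the original test) each connection
carries only part of the total rate, so the per-connection figure would be
correspondingly smaller. Measure the real RTT before drawing conclusions.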