From: Rick Macklem <rmacklem@uoguelph.ca>
To: Adam Stylinski
CC: John, freebsd-fs@freebsd.org
Subject: Re: zfs/nfsd performance limiter
Date: Wed, 25 May 2022 20:04:35 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Adam Stylinski wrote:
> Hmm, I don't know that the presence of jumbo 9k mbufs indicates whether
> the mellanox drivers are using them or not, given that I have a
> link aggregation on a different (1gbps) NIC that could also be the
> cause of that:
>
> mbuf:               256, 52231134,  49500,  25931, 1956138424,  0,  0,  0
> mbuf_cluster:      2048,  8161114,   2794,   4352,  700435355,  0,  0,  0
> mbuf_jumbo_page:   4096,  4080557,  12288,   3977,  155289291,  0,  0,  0
> mbuf_jumbo_9k:     9216,  1609044,  32772,   4174,   35785053,  0,  0,  0
> mbuf_jumbo_16k:   16384,   680092,      0,      0,          0,  0,  0,  0
>
> Early on, 9k MTUs did show significant advantages for throughput from
> what I remember.  But of course, this was before trying any of the
> aforementioned changes for multiplexing the connection.
It may give you better performance for your test runs, but if you fragment
the mbuf cluster pool you can get hit pretty hard.

Buyer beware, as they say, rick
ps: It doesn't matter what is using them. Any mixing of 2K, 4K and 9K can
    result in fragmentation of the pool such that an allocation cannot
    happen until mbuf clusters get free'd.
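For reference, the failure mode Rick describes shows up in the FAIL column
of vmstat -z and the "denied" counters of netstat -m. A minimal sketch,
using only stock FreeBSD tools (nothing below is specific to this system):

# vmstat -z | head -n 1                        # column headers; FAIL is the one to watch
# vmstat -z | egrep 'mbuf_cluster|mbuf_jumbo'  # per-zone counters
# netstat -m | grep denied                     # mbuf/cluster requests denied

Non-zero FAIL or denied counts while memory is otherwise plentiful are the
classic signature of a fragmented cluster pool.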

On Wed, May 25, 2022 at 11:41 AM Rick Macklem wrote:
>
> Adam Stylinski wrote:
> [stuff snipped]
>
> > > ifconfig -vm
> > mlxen0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
> Just in case you (or someone else reading this) is not aware of it,
> use of 9K jumbo clusters causes fragmentation of the memory pool
> clusters are allocated from and, therefore, their use is not recommended.
>
> Now, it may be that the mellanox driver doesn't use 9K clusters (it could
> put the received frame in multiple smaller clusters), but if it does, you
> should consider reducing the mtu.
> If you run:
> # vmstat -z | fgrep mbuf_jumbo_9k
> it will show you whether they are being used.
>
> rick
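If the driver does turn out to fill 9K clusters, one low-risk experiment is
to drop the MTU below the page size so receives land in 4K page clusters
instead. A sketch, with two assumptions: mlxen0 is the interface from this
thread, and 4000 is only an illustrative value that leaves room for headers
(whether a driver switches cluster sizes based on MTU is driver-specific):

# ifconfig mlxen0 mtu 4000
# vmstat -z | fgrep mbuf_jumbo_9k   # the USED count should stop climbing

The same MTU change has to be made on the Linux client so both ends agree
on the frame size.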
>
> > netstat -i
> Name   Mtu   Network        Address              Ipkts Ierrs Idrop    Opkts Oerrs Coll
> igb0   9000                 ac:1f:6b:b0:60:bc 18230625     0     0 24178283     0    0
> igb1   9000                 ac:1f:6b:b0:60:bc 14341213     0     0  8447249     0    0
> lo0   16384  lo0                                 367691     0     0   367691     0    0
> lo0       -  localhost      localhost                68     -     -       68     -    -
> lo0       -  fe80::%lo0/64  fe80::1%lo0              0     -     -        0     -    -
> lo0       -  your-net       localhost            348944     -     -   348944     -    -
> mlxen  9000                 00:02:c9:35:df:20 13138046     0    12 26308206     0    0
> mlxen     -  10.5.5.0/24    10.5.5.1          11592389     -     - 24345184     -    -
> vm-pu  9000                 56:3e:55:8a:2a:f8     7270     0     0   962249   102    0
> lagg0  9000                 ac:1f:6b:b0:60:bc 31543941     0     0 31623674     0    0
> lagg0     -  192.168.0.0/2  nasbox            27967582     -     - 41779731     -    -
>
> > What threads/irq are allocated to your NIC? 'vmstat -i'
>
> Doesn't seem perfectly balanced, but not terribly imbalanced either:
>
> interrupt                          total       rate
> irq9: acpi0                            3          0
> irq18: ehci0 ehci1+               803162          2
> cpu0:timer                      67465114        167
> cpu1:timer                      65068819        161
> cpu2:timer                      65535300        163
> cpu3:timer                      63408731        157
> cpu4:timer                      63026304        156
> cpu5:timer                      63431412        157
> irq56: nvme0:admin                    18          0
> irq57: nvme0:io0                  544999          1
> irq58: nvme0:io1                  465816          1
> irq59: nvme0:io2                  487486          1
> irq60: nvme0:io3                  474616          1
> irq61: nvme0:io4                  452527          1
> irq62: nvme0:io5                  467807          1
> irq63: mps0                     36110415         90
> irq64: mps1                    112328723        279
> irq65: mps2                     54845974        136
> irq66: mps3                     50770215        126
> irq68: xhci0                     3122136          8
> irq70: igb0:rxq0                 1974562          5
> irq71: igb0:rxq1                 3034190          8
> irq72: igb0:rxq2                28703842         71
> irq73: igb0:rxq3                 1126533          3
> irq74: igb0:aq                         7          0
> irq75: igb1:rxq0                 1852321          5
> irq76: igb1:rxq1                 2946722          7
> irq77: igb1:rxq2                 9602613         24
> irq78: igb1:rxq3                 4101258         10
> irq79: igb1:aq                         8          0
> irq80: ahci1                    37386191         93
> irq81: mlx4_core0                4748775         12
> irq82: mlx4_core0               13754442         34
> irq83: mlx4_core0                3551629          9
> irq84: mlx4_core0                2595850          6
> irq85: mlx4_core0                4947424         12
> Total                          769135944       1908
>
> > Are the above threads floating or mapped? 'cpuset -g ...'
>
> I suspect I was supposed to run this against a pid, maybe nfsd's?
> Here's the output without an argument:
>
> pid -1 mask: 0, 1, 2, 3, 4, 5
> pid -1 domain policy: first-touch mask: 0
>
> > Disable nfs tcp drc
>
> This is the first time I've seen a duplicate request cache mentioned.
> It seems counter-intuitive that disabling it would help, but maybe
> I'll try it.  What exactly is the benefit?
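On FreeBSD the TCP side of the DRC is controlled by a sysctl. A sketch,
assuming the vfs.nfsd.cachetcp knob present in recent FreeBSD NFS servers:

# sysctl vfs.nfsd.cachetcp=0   # skip DRC lookups/inserts for TCP-mounted clients

The argument for trying it: retransmits of non-idempotent RPCs over a
healthy TCP connection are rare, so on a fast LAN the cache's hashing and
locking overhead can be pure cost.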
>
> > What is your atime setting?
>
> Disabled at both the file system and the client mounts.
>
> > You also state you are using a Linux client. Are you using the MLX
> > affinity scripts, buffer sizing suggestions, etc, etc. Have you
> > swapped the Linux system for a fbsd system?
>
> I've not, though I do vaguely recall mellanox supplying some scripts
> in their documentation that pinned interrupt handling to specific cores
> at one point.  Is this what you're referring to?  I could give that a
> try.  I don't at present have any FreeBSD client systems with enough
> PCI express bandwidth to swap things out for a Linux vs FreeBSD test.
>
> > You mention iperf. Please post the options you used when invoking
> > iperf and its output.
>
> Setting up the NFS client as a "server", since the terminology is a
> little bit flipped with iperf, here's the output:
>
> -----------------------------------------------------------
> Server listening on 5201 (test #1)
> -----------------------------------------------------------
> Accepted connection from 10.5.5.1, port 11534
> [  5] local 10.5.5.4 port 5201 connected to 10.5.5.1 port 43931
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.00   sec  3.81 GBytes  32.7 Gbits/sec
> [  5]   1.00-2.00   sec  4.20 GBytes  36.1 Gbits/sec
> [  5]   2.00-3.00   sec  4.18 GBytes  35.9 Gbits/sec
> [  5]   3.00-4.00   sec  4.21 GBytes  36.1 Gbits/sec
> [  5]   4.00-5.00   sec  4.20 GBytes  36.1 Gbits/sec
> [  5]   5.00-6.00   sec  4.21 GBytes  36.2 Gbits/sec
> [  5]   6.00-7.00   sec  4.10 GBytes  35.2 Gbits/sec
> [  5]   7.00-8.00   sec  4.20 GBytes  36.1 Gbits/sec
> [  5]   8.00-9.00   sec  4.21 GBytes  36.1 Gbits/sec
> [  5]   9.00-10.00  sec  4.20 GBytes  36.1 Gbits/sec
> [  5]  10.00-10.00  sec  7.76 MBytes  35.3 Gbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-10.00  sec  41.5 GBytes  35.7 Gbits/sec  receiver
> -----------------------------------------------------------
> Server listening on 5201 (test #2)
> -----------------------------------------------------------
>
> On Sun, May 22, 2022 at 3:45 AM John wrote:
> >
> > ----- Adam Stylinski's Original Message -----
> > > Hello,
> > >
> > > I have two systems connected via ConnectX-3 mellanox cards in
> > > ethernet mode.  They have their MTUs maxed at 9000, their ring
> > > buffers maxed at 8192, and I can hit around 36 gbps with iperf.
> > >
> > > When using an NFS client (client = linux, server = freebsd), I see
> > > a maximum rate of around 20gbps.  The test file is fully in ARC.
> > > The test is performed with an NFS mount with nconnect=4 and an
> > > rsize/wsize of 1MB.
> > >
> > > Here's the flame graph of the kernel of the system in question,
> > > with idle stacks removed:
> > >
> > > https://gist.github.com/KungFuJesus/918c6dcf40ae07767d5382deafab3a52#file-nfs_fg-svg
> > >
> > > The longest function seems like maybe it's the ERMS-aware memcpy
> > > happening from the ARC?  Is there maybe a missing fast path that
> > > could take fewer copies into the socket buffer?
> >
> > Hi Adam -
> >
> > Some items to look at and possibly include for more responses....
> >
> > - What is your server system? Make/model/ram/etc. What is your
> >   overall 'top' cpu utilization, 'top -aH' ...
> >
> > - It looks like you're using a 40gb/s card. Posting the output of
> >   'ifconfig -vm' would provide additional information.
> >
> > - Are the interfaces running cleanly? 'netstat -i' is helpful.
> >
> > - Inspect 'netstat -s'. Duplicate pkts? Resends? Out-of-order?
> >
> > - Inspect 'netstat -m'. Denied? Delayed?
> >
> > - You mention iperf. Please post the options you used when
> >   invoking iperf and its output.
> >
> > - You appear to be looking for throughput vs low-latency. Have
> >   you looked at window-size vs the amount of memory allocated to the
> >   streams?  These values vary based on the bit-rate of the connection.
> >   Tcp connections require outstanding un-ack'd data to be held, which
> >   affects the values below.
> >
> > - What are your values for the following (see the sketch after this
> >   list for reading them all at once):
> >
> >   -- kern.ipc.maxsockbuf
> >   -- net.inet.tcp.sendbuf_max
> >   -- net.inet.tcp.recvbuf_max
> >
> >   -- net.inet.tcp.sendspace
> >   -- net.inet.tcp.recvspace
> >
> >   -- net.inet.tcp.delayed_ack
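A sketch of reading all six knobs in one shot, plus the sizing rule of
thumb behind them: the socket buffer must be able to hold at least one
bandwidth-delay product of un-ack'd data, e.g. 40 Gbit/s at a 0.5 ms RTT
is (40e9 / 8) * 0.0005 = 2.5 MB in flight.

# sysctl kern.ipc.maxsockbuf \
    net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max \
    net.inet.tcp.sendspace net.inet.tcp.recvspace \
    net.inet.tcp.delayed_ack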
> >
> > - What threads/irq are allocated to your NIC? 'vmstat -i'
> >
> > - Are the above threads floating or mapped? 'cpuset -g ...'
> >
> > - Determine the best settings for LRO/TSO for your card.
> >
> > - Disable nfs tcp drc.
> >
> > - What is your atime setting?
> >
> > If you really think you have a ZFS/kernel issue, and your data
> > fits in cache, dump ZFS, create a memory-backed file system, and
> > repeat your tests (a sketch follows at the end of this message).
> > This will purge a large portion of your graph.  LRO/TSO changes
> > may do so also.
> >
> > You also state you are using a Linux client. Are you using
> > the MLX affinity scripts, buffer sizing suggestions, etc, etc.
> > Have you swapped the Linux system for a fbsd system?
> >
> > And as a final note, I regularly use Chelsio T62100 cards
> > in dual-homed and/or LACP environments in Supermicro boxes with
> > hundreds of nfs boot (Bhyve, QEMU, and physical system) clients
> > per server with no network starvation or cpu bottlenecks.  Clients
> > boot, perform their work, and then remotely request image rollback.
> >
> > Hopefully the above will help and provide pointers.
> >
> > Cheers
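A minimal sketch of the memory-backed test suggested above. The size,
mount point, and export path are placeholders, and the server is assumed
to have the RAM to spare:

# mdmfs -s 32g md /mnt/ramtest      # memory-backed UFS; ZFS out of the path
# dd if=/dev/random of=/mnt/ramtest/big bs=1m count=10240
# echo '/mnt/ramtest -ro' >> /etc/exports
# service mountd reload

Re-running the NFS read test against this export removes ZFS (and the ARC
copy path visible in the flame graph) from the picture entirely.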