From: Rick Macklem <rmacklem@uoguelph.ca>
To: Adam Stylinski, John
Cc: freebsd-fs@freebsd.org
Subject: Re: zfs/nfsd performance limiter
Date: Sun, 22 May 2022 14:12:46 +0000
List-Archive: https://lists.freebsd.org/archives/freebsd-fs
Adam Stylinski wrote:
> jwd wrote:
> > What is your server system? Make/model/ram/etc.
> Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz (6 cores, a little starved
> on the clock, but the load at least is basically zero during this test)
> 128GB of memory
>
> > top -aH
> During the copy load (for brevity, only the real top contenders
> for CPU here):
>
> last pid: 15560;  load averages: 0.25, 0.39, 0.27   up 4+15:48:54  09:17:38
> 98 threads:  2 running, 96 sleeping
> CPU:  0.0% user,  0.0% nice, 19.1% system,  5.6% interrupt, 75.3% idle
> Mem: 12M Active, 4405M Inact, 8284K Laundry, 115G Wired, 1148M Buf, 4819M Free
> ARC: 98G Total, 80G MFU, 15G MRU, 772K Anon, 1235M Header, 1042M Other
>      91G Compressed, 189G Uncompressed, 2.09:1 Ratio
> Swap: 5120M Total, 5120M Free
>
>   PID USERNAME  PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
>  3830 root       20    0    12M  2700K rpcsvc   2   1:16  53.26% nfsd: server (nfsd){nfsd: service}
>  3830 root       20    0    12M  2700K CPU5     5   5:42  52.96% nfsd: server (nfsd){nfsd: master}
> 15560 adam       20    0    17M  5176K CPU2     2   0:00   0.12% top -aH
>  1493 root       20    0    13M  2260K select   3   0:36   0.01% /usr/sbin/powerd
>  1444 root       20    0    75M  2964K select   5   0:19   0.01% /usr/sbin/mountd -r /etc/exports /etc/zfs/exports
>  1215 uucp       20    0    13M  2820K select   5   0:27   0.01% /usr/local/libexec/nut/usbhid-ups -a cyberpower
> 93424 adam       20    0    21M  9900K select   0   0:00   0.01% sshd: adam@pts/0 (sshd)
>
> > ifconfig -vm
> mlxen0: flags=8843 metric 0 mtu 9000
>         options=ed07bb
>         capabilities=ed07bb
>         ether 00:02:c9:35:df:20
>         inet 10.5.5.1 netmask 0xffffff00 broadcast 10.5.5.255
>         media: Ethernet autoselect (40Gbase-CR4)
>         status: active
>         supported media:
>                 media autoselect
>                 media 40Gbase-CR4 mediaopt full-duplex
>                 media 10Gbase-CX4 mediaopt full-duplex
>                 media 10Gbase-SR mediaopt full-duplex
>                 media 1000baseT mediaopt full-duplex
>         nd6 options=29
>         plugged: QSFP+ 40GBASE-CR4 (No separable connector)
>         vendor: Mellanox PN: MC2207130-002 SN: MT1419VS07971 DATE: 2014-06-06
>         module temperature: 0.00 C voltage: 0.00 Volts
>         lane 1: RX power: 0.00 mW (-inf dBm) TX bias: 0.00 mA
>         lane 2: RX power: 0.00 mW (-inf dBm) TX bias: 0.00 mA
>         lane 3: RX power: 0.00 mW (-inf dBm) TX bias: 0.00 mA
>         lane 4: RX power: 0.00 mW (-inf dBm) TX bias: 0.00 mA
>
> > - What are your values for:
> >
> > -- kern.ipc.maxsockbuf
> > -- net.inet.tcp.sendbuf_max
> > -- net.inet.tcp.recvbuf_max
> >
> > -- net.inet.tcp.sendspace
> > -- net.inet.tcp.recvspace
> >
> > -- net.inet.tcp.delayed_ack
> kern.ipc.maxsockbuf: 16777216
> net.inet.tcp.sendbuf_max: 16777216
> net.inet.tcp.recvbuf_max: 16777216
> net.inet.tcp.sendspace: 32768  # This is interesting?
> I'm not sure why the discrepancy here
> net.inet.tcp.recvspace: 4194304
> net.inet.tcp.delayed_ack: 0
>
> > netstat -i
> Name   Mtu  Network         Address               Ipkts Ierrs Idrop     Opkts Oerrs  Coll
> igb0   9000                 ac:1f:6b:b0:60:bc  18230625     0     0  24178283     0     0
> igb1   9000                 ac:1f:6b:b0:60:bc  14341213     0     0   8447249     0     0
> lo0   16384 lo0                                  367691     0     0    367691     0     0
> lo0       - localhost       localhost                68     -     -        68     -     -
> lo0       - fe80::%lo0/64   fe80::1%lo0               0     -     -         0     -     -
> lo0       - your-net        localhost            348944     -     -    348944     -     -
> mlxen  9000                 00:02:c9:35:df:20  13138046     0    12  26308206     0     0
> mlxen     - 10.5.5.0/24     10.5.5.1           11592389     -     -  24345184     -     -
> vm-pu  9000                 56:3e:55:8a:2a:f8      7270     0     0    962249   102     0
> lagg0  9000                 ac:1f:6b:b0:60:bc  31543941     0     0  31623674     0     0
> lagg0     - 192.168.0.0/2   nasbox             27967582     -     -  41779731     -     -
>
> > What threads/irq are allocated to your NIC? 'vmstat -i'
>
> Doesn't seem perfectly balanced, but not terribly imbalanced, either:
>
> interrupt                          total       rate
> irq9: acpi0                            3          0
> irq18: ehci0 ehci1+               803162          2
> cpu0:timer                      67465114        167
> cpu1:timer                      65068819        161
> cpu2:timer                      65535300        163
> cpu3:timer                      63408731        157
> cpu4:timer                      63026304        156
> cpu5:timer                      63431412        157
> irq56: nvme0:admin                    18          0
> irq57: nvme0:io0                  544999          1
> irq58: nvme0:io1                  465816          1
> irq59: nvme0:io2                  487486          1
> irq60: nvme0:io3                  474616          1
> irq61: nvme0:io4                  452527          1
> irq62: nvme0:io5                  467807          1
> irq63: mps0                     36110415         90
> irq64: mps1                    112328723        279
> irq65: mps2                     54845974        136
> irq66: mps3                     50770215        126
> irq68: xhci0                     3122136          8
> irq70: igb0:rxq0                 1974562          5
> irq71: igb0:rxq1                 3034190          8
> irq72: igb0:rxq2                28703842         71
> irq73: igb0:rxq3                 1126533          3
> irq74: igb0:aq                         7          0
> irq75: igb1:rxq0                 1852321          5
> irq76: igb1:rxq1                 2946722          7
> irq77: igb1:rxq2                 9602613         24
> irq78: igb1:rxq3                 4101258         10
> irq79: igb1:aq                         8          0
> irq80: ahci1                    37386191         93
> irq81: mlx4_core0                4748775         12
> irq82: mlx4_core0               13754442         34
> irq83: mlx4_core0                3551629          9
> irq84: mlx4_core0                2595850          6
> irq85: mlx4_core0                4947424         12
> Total                          769135944       1908
>
> > Are the above threads floating or mapped? 'cpuset -g ...'
>
> I suspect I was supposed to run this against the argument of a pid,
> maybe nfsd? Here's the output without an argument:
>
> pid -1 mask: 0, 1, 2, 3, 4, 5
> pid -1 domain policy: first-touch mask: 0
>
> > Disable nfs tcp drc
>
> This is the first I've ever seen a duplicate request cache mentioned.
> It seems counter-intuitive for why that'd help, but maybe I'll try
> doing that. What exactly is the benefit?
The DRC improves correctness for NFSv3 and NFSv4.0 mounts. It is a
performance hit. However, for a read-mostly load it won't add too
much overhead. Turning it off increases the likelihood of data corruption
due to retried non-idempotent RPCs, but the failure will be rare over TCP.

If your mount is NFSv4.1 or 4.2, the DRC is not used, so don't worry about it.

> > What is your atime setting?
>
> Disabled at both the file system and the client mounts.
>
> > You also state you are using a Linux client. Are you using the MLX
> > affinity scripts, buffer sizing suggestions, etc, etc. Have you swapped
> > the Linux system for a fbsd system?
> I've not, though I do vaguely recall mellanox supplying some scripts
> in their documentation that fixed interrupt handling on specific cores
> at one point. Is this what you're referring to? I could give that a
> try.
> I don't at present have any FreeBSD client systems with enough
> PCI express bandwidth to swap things out for a Linux vs. FreeBSD test.
If you have not already done so, do a "nfsstat -m" on the client to find
out what options it is actually using (works on both Linux and FreeBSD).

If the Linux client has a way of manually adjusting readahead, then try
increasing it. (FreeBSD has a "readahead" mount option, but I can't recall
if Linux has one?)

You can try mounting the server on the server, but that will use lo0 and not
the mellanox, so it might be irrelevant.

Also, I don't know how many queues the mellanox driver uses. You'd want
an "nconnect" at least as high as the number of queues, since each TCP
connection will be serviced by one queue, and that limits its bandwidth.

However, in general, RPC RTT will define how well NFS performs, not
the I/O rate for a bulk file read/write.
Btw, writing is a very different story than reading, largely due to the need
to commit data/metadata to stable storage while writing.

I can't help w.r.t. ZFS nor high-performance nets (my fastest is 1Gbps), rick
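[Editor's sketch of the RTT-bound model described above. The rsize, nconnect, and RTT values below are illustrative assumptions, not measurements from this thread: with one rsize-byte READ RPC outstanding per TCP connection, read throughput cannot exceed rsize * nconnect / RTT.]

```shell
# Rough ceiling on NFS read throughput when each connection keeps a single
# rsize-byte READ RPC in flight: throughput <= rsize * nconnect / RTT.
nfs_read_ceiling_gbps() {
  # $1 = rsize in bytes, $2 = nconnect, $3 = RPC round-trip time in microseconds
  awk -v rs="$1" -v nc="$2" -v rtt="$3" \
    'BEGIN { printf "%.1f\n", rs * nc * 8 / (rtt / 1000000) / 1e9 }'
}

# 1 MB rsize, nconnect=4, and a hypothetical 1 ms RPC RTT:
nfs_read_ceiling_gbps 1048576 4 1000   # -> 33.6 (Gbit/s)
```

With readahead keeping several RPCs in flight per connection, real throughput can exceed this single-RPC figure; the point is only that RPC latency, not link rate, sets the ceiling.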
> You mention iperf. Please post the options you used when invoking iperf
> and its output.

Setting up the NFS client as a "server", since it seems that the
terminology is a little bit flipped with iperf, here's the output:

-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 10.5.5.1, port 11534
[  5] local 10.5.5.4 port 5201 connected to 10.5.5.1 port 43931
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  3.81 GBytes  32.7 Gbits/sec
[  5]   1.00-2.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   2.00-3.00   sec  4.18 GBytes  35.9 Gbits/sec
[  5]   3.00-4.00   sec  4.21 GBytes  36.1 Gbits/sec
[  5]   4.00-5.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   5.00-6.00   sec  4.21 GBytes  36.2 Gbits/sec
[  5]   6.00-7.00   sec  4.10 GBytes  35.2 Gbits/sec
[  5]   7.00-8.00   sec  4.20 GBytes  36.1 Gbits/sec
[  5]   8.00-9.00   sec  4.21 GBytes  36.1 Gbits/sec
[  5]   9.00-10.00  sec  4.20 GBytes  36.1 Gbits/sec
[  5]  10.00-10.00  sec  7.76 MBytes  35.3 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  41.5 GBytes  35.7 Gbits/sec  receiver
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------

On Sun, May 22, 2022 at 3:45 AM John wrote:
>
> ----- Adam Stylinski's Original Message -----
> > Hello,
> >
> > I have two systems connected via ConnectX-3 mellanox cards in ethernet
> > mode. They have their MTUs maxed at 9000, their ring buffers maxed
> > at 8192, and I can hit around 36 gbps with iperf.
> >
> > When using an NFS client (client = linux, server = freebsd), I see a
> > maximum rate of around 20gbps. The test file is fully in ARC.
> > The test is performed with an NFS mount nconnect=4 and an rsize/wsize of
> > 1MB.
> >
> > Here's the flame graph of the kernel of the system in question, with
> > idle stacks removed:
> >
> > https://gist.github.com/KungFuJesus/918c6dcf40ae07767d5382deafab3a52#file-nfs_fg-svg
> >
> > The longest functions seem like maybe it's the ERMS-aware memcpy
> > happening from the ARC? Is there maybe a missing fast path that could
> > take fewer copies into the socket buffer?
>
> Hi Adam -
>
> Some items to look at and possibly include for more responses....
>
> - What is your server system? Make/model/ram/etc. What is your
>   overall 'top' cpu utilization 'top -aH' ...
>
> - It looks like you're using a 40gb/s card. Posting the output of
>   'ifconfig -vm' would provide additional information.
>
> - Are the interfaces running cleanly? 'netstat -i' is helpful.
>
> - Inspect 'netstat -s'. Duplicate pkts? Resends? Out-of-order?
>
> - Inspect 'netstat -m'. Denied? Delayed?
>
> - You mention iperf. Please post the options you used when
>   invoking iperf and its output.
>
> - You appear to be looking for throughput vs. low latency. Have
>   you looked at window size vs. the amount of memory allocated to the
>   streams? These values vary based on the bit rate of the connection.
>   TCP connections require outstanding un-ack'd data to be held.
>   This affects the values below.
>
> - What are your values for:
>
>   -- kern.ipc.maxsockbuf
>   -- net.inet.tcp.sendbuf_max
>   -- net.inet.tcp.recvbuf_max
>
>   -- net.inet.tcp.sendspace
>   -- net.inet.tcp.recvspace
>
>   -- net.inet.tcp.delayed_ack
>
> - What threads/irq are allocated to your NIC? 'vmstat -i'
>
> - Are the above threads floating or mapped?
>   'cpuset -g ...'
>
> - Determine best settings for LRO/TSO for your card.
>
> - Disable nfs tcp drc
>
> - What is your atime setting?
>
>
> If you really think you have a ZFS/kernel issue, and your
> data fits in cache, dump ZFS, create a memory-backed file system,
> and repeat your tests. This will purge a large portion of your
> graph. LRO/TSO changes may do so also.
>
> You also state you are using a Linux client. Are you using
> the MLX affinity scripts, buffer sizing suggestions, etc, etc.
> Have you swapped the Linux system for a fbsd system?
>
> And as a final note, I regularly use Chelsio T62100 cards
> in dual-homed and/or LACP environments in Supermicro boxes with hundreds
> of nfs boot (Bhyve, QEMU, and physical system) clients per server
> with no network starvation or cpu bottlenecks. Clients boot, perform
> their work, and then remotely request image rollback.
>
> Hopefully the above will help and provide pointers.
>
> Cheers
>
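[Editor's note on John's window-size point: whether the socket-buffer sysctls reported in this thread are large enough can be checked against the bandwidth-delay product, rate * RTT. The 36 Gbit/s and 1 ms figures below are illustrative assumptions, not measurements from this thread.]

```shell
# Bandwidth-delay product: the number of bytes that must be in flight
# (and therefore buffered, un-ack'd) to keep a path full: bdp = rate * RTT.
bdp_bytes() {
  # $1 = rate in Gbit/s, $2 = RTT in milliseconds
  awk -v gbps="$1" -v rtt_ms="$2" \
    'BEGIN { printf "%d\n", gbps * 1e9 / 8 * rtt_ms / 1000 }'
}

# A 36 Gbit/s path at a hypothetical 1 ms RTT needs ~4.5 MB of window:
bdp_bytes 36 1   # -> 4500000
```

That would sit comfortably under the 16 MB kern.ipc.maxsockbuf shown earlier, but far above the 32768-byte net.inet.tcp.sendspace that Adam flagged as a discrepancy.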