List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
From: Alan Somers
Date: Mon, 28 Jun 2021 18:44:03 -0600
Subject: Re: RFC: NFS trunking (multiple TCP connections for a mount
To: Rick Macklem
Cc: freebsd-net

On Mon, Jun 28, 2021 at 6:24 PM Rick Macklem wrote:

> The Linux NFS client now has a mount option "nconnect",
> which specifies that multiple TCP connections be created
> for an NFS mount, where RPCs are done on the connections,
> in a round robin fashion. (Alternating between the two TCP
> connections for the case of nconnect=2.)
>
> The Linux man page says:
> nconnect=n
>     When using a connection oriented protocol such as TCP, it
>     may sometimes be advantageous to set up multiple
>     connections between the client and server. For instance,
>     if your clients and/or servers are equipped with multiple
>     network interface cards (NICs), using multiple connections
>     to spread the load may improve overall performance. In
>     such cases, the nconnect option allows the user to specify
>     the number of connections that should be established
>     between the client and server up to a limit of 16.
>
> I don't understand how multiple TCP connections to the same
> server IP address will distribute the load across multiple network
> interfaces?
> I thought that lagg would have handled this?
>

Splitting a single TCP connection across multiple physical interfaces
would cause tons of packets to be received out of order. The TCP
reassembly overhead then outweighs the benefit of lagg, especially if
there are multiple connections anyway.
So unless your physical interfaces are very slow, it's not worth
splitting one TCP connection. LACP doesn't even allow it. Something
like nconnect might be useful, but only if there are few NFS clients.
In my environment there are usually many, so I wouldn't use nconnect.

>
> I could easily implement this, but I only have low end hardware
> to test on, so I doubt that I will see any performance improvement.
>
> However, I do think that having two TCP connections, where the
> RPCs involving large RPC messages (Read/Readdir/Write) are sent
> on one TCP connection and the RPCs that use small RPC messages
> (Lookup/Access/Getattr,...) are sent on the other one.
> --> This would avoid the frequent small RPCs from getting "logjammed"
>     behind a bunch of large 1Mbyte Read replies, for example.
>
> So, what do you think?
> - Implement "nconnect" with round robin RPC assignment.
> or
> - Implement two TCP connections where large RPCs are done
>   on one and small RPCs on the other.
> or
> ???
>
> I will note I see downsides to doing multiple TCP connections/mount.
> 1 - Uses up more IP port#s.
> 2 - When an NFS server gets overloaded, it will stop receiving RPC
>     requests.
>     This will eventually apply backpressure through TCP to the client
>     to slow down RPC requests. Having multiple TCP connections would
>     reduce this backpressure effect.
> --> To be honest, I suspect the slowdown in RPC replies caused by an
>     overloaded server, is more effective feedback to the NFS client
>     than TCP backpressure, but I am not sure.
>
> Comments? rick
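
To make the two options in the quoted proposal concrete, here is a minimal sketch in Python of how RPCs might be assigned to connections under each policy. It is only an illustration, not code from the Linux or FreeBSD NFS clients: the class names, the `pick` method, and the sample workload are all hypothetical, and connections are represented by plain integer indices rather than sockets. The grouping of Read/Readdir/Write as "large" follows the description above.

```python
from itertools import cycle

# RPCs the proposal treats as carrying large messages (hypothetical constant).
LARGE_RPCS = {"Read", "Readdir", "Write"}


class RoundRobinDispatch:
    """nconnect-style policy: rotate RPCs across n connections."""

    def __init__(self, nconnect):
        if nconnect < 1:
            raise ValueError("nconnect must be at least 1")
        self._next = cycle(range(nconnect))

    def pick(self, rpc_name):
        # The RPC type is ignored; only the order of arrival matters.
        return next(self._next)


class SizeSplitDispatch:
    """Two-connection policy: index 0 for large RPCs, index 1 for small."""

    def pick(self, rpc_name):
        return 0 if rpc_name in LARGE_RPCS else 1


def assign(policy, rpcs):
    """Return (rpc, connection index) pairs for a sequence of RPC names."""
    return [(rpc, policy.pick(rpc)) for rpc in rpcs]


# Made-up workload, purely for illustration.
workload = ["Read", "Read", "Getattr", "Write", "Lookup", "Access"]
round_robin = assign(RoundRobinDispatch(2), workload)
size_split = assign(SizeSplitDispatch(), workload)

for rpc, conn in round_robin:
    print(f"round robin: {rpc:8s} -> connection {conn}")
for rpc, conn in size_split:
    print(f"size split:  {rpc:8s} -> connection {conn}")
```

With round robin, a Getattr can land on the same connection as a Read and so may queue behind a large reply. With the size split, small RPCs never share a connection with Read/Readdir/Write, at the cost of a fixed limit of two connections.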