From: Mateusz Guzik
Date: Mon, 31 Jul 2023 04:28:37 +0200
Subject: Re: git: 5b353925ff61 - main - vnode read(2)/write(2): acquire rangelock regardless of do_vn_io_fault()
To: Konstantin Belousov
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org
References: <202307242203.36OM3IwQ009522@gitrepo.freebsd.org>
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all

On 7/28/23, Konstantin Belousov wrote:
> On Fri, Jul 28, 2023 at 02:17:51AM +0200, Mateusz Guzik wrote:
>> On 7/25/23, Konstantin Belousov wrote:
>> > The branch main has been updated by kib:
>> >
>> > URL:
>> > https://cgit.FreeBSD.org/src/commit/?id=5b353925ff61b9ddb97bb453ba75278b578ed7d9
>> >
>> > commit 5b353925ff61b9ddb97bb453ba75278b578ed7d9
>> > Author: Konstantin Belousov
>> > AuthorDate: 2023-07-23 15:55:50 +0000
>> > Commit: Konstantin Belousov
>> > CommitDate: 2023-07-24 22:02:59 +0000
>> >
>> >     vnode read(2)/write(2): acquire rangelock regardless of
>> >     do_vn_io_fault()
>> >
>> >     To ensure atomicity of reads against parallel writes and truncates,
>> >     vnode lock was not enough at least since introduction of
>> >     vn_io_fault().
>> >     That code only take rangelock when it was possible that vn_read()
>> >     and
>> >     vn_write() could drop the vnode lock.
>> >
>> >     At least since the introduction of VOP_READ_PGCACHE() which
>> >     generally
>> >     does not lock the vnode at all, rangelocks become required even
>> >     for filesystems that do not need vn_io_fault() workaround.  For
>> >     instance, tmpfs.
>> >
>>
>> Is there a bug with pgcache reads disabled (as in when the vnode lock
>> is held for reads?)
>>
>> Note this patch adds 2 lock trips which were previously not present,
>> which has to slow things down single-threaded, but I did not bother
>> measuring that part.
>>
>> As this adds to vnode-wide *lock* acquires this has to very negatively
>> affect scalability.
>>
>> This time around I ran: ./readseek3_processes -t 10 (10 workers
>> reading from *disjoint* offsets from the same vnode. this in principle
>> can scale perfectly)
>>
>> I observed a 90% drop in performance:
>> before: total:25723459 ops/s
>> after: total:2455794 ops/s
>>
>> Going to an unpatched kernel and disabling pgcache reads instead:
>> disabled: total:6522480 ops/s
>>
>> or about 2.6x of performance of the current kernel
>>
>> In other words I think the thing to do here is to revert the patch and
>> instead flip pgcache reads to off by default until a better fix can be
>> implemented.
>
> The rangelock purpose is to ensure atomicity of reads in presence of
> writes.  In other words, taking the rangelock there is architecturally
> right.  Also, it fixes issues with truncation that are not fixable with
> the vnode lock on tmpfs vnodes anyway.
>

How come? I see vn_truncate() xlocking the vnode across the entire
operation. If all tmpfs reads slock the vnode and all writes xlock it,
everyone is already protected against truncation.

> That said, disabling pgcache vop on tmpfs means that the regular read vop
> is always used, which takes the vnode lock around reads.  So I doubt that
> the changed disposition would gain much in your test.
>

I pasted a benchmark number without pgcache and without the change,
clearly showing that rangelock + pgcache loses big time to mere vnode
locking.

> The proper future fix would be to improve scalability of the rangelocks,
> whose naive stop-gap implementation I did initially in time of current-7
> or -8 was not changed at all.
>

I agree rangelocks should be fixed. But in the meantime, the patch at
hand seems too heavy-handed a fix for the issue (and disabling pgcache
reads would do it just fine instead).

> BTW, it seems that file offset locks are no longer needed, but I need to
> recheck it.  This should shave off four atomics on read and write path.
>

-- 
Mateusz Guzik
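
Editorial note: the relative figures quoted in the message ("90% drop",
"about 2.6x") can be rechecked from the three throughput totals it
reports. The following is only a minimal sketch of that arithmetic. The
constant and function names are made up for illustration and are not
part of the benchmark or of the FreeBSD tree; only the three ops/s
values come from the message.

```python
# Throughput totals quoted in the message (ops/s), readseek3_processes -t 10.
BEFORE = 25_723_459      # unpatched kernel, pgcache reads enabled
AFTER = 2_455_794        # patched kernel (rangelock always taken)
NO_PGCACHE = 6_522_480   # unpatched kernel, pgcache reads disabled


def percent_drop(old: int, new: int) -> float:
    """Relative throughput loss going from `old` to `new`, in percent."""
    return (old - new) / old * 100.0


def ratio(a: int, b: int) -> float:
    """How many times larger `a` is than `b`."""
    return a / b


if __name__ == "__main__":
    # Patched kernel versus the original behaviour.
    print(f"drop with the patch: {percent_drop(BEFORE, AFTER):.1f}%")
    # Unpatched kernel with pgcache reads off versus the patched kernel.
    print(f"pgcache off vs patched: {ratio(NO_PGCACHE, AFTER):.2f}x")
    # Cost of turning pgcache reads off on an unpatched kernel.
    print(f"drop with pgcache off: {percent_drop(BEFORE, NO_PGCACHE):.1f}%")
```

The last line puts the two proposed mitigations on the same scale:
both cost throughput relative to the unpatched kernel with pgcache
reads enabled, and the comparison in the message is about which one
costs less in this particular workload.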