From: Alexander Motin
To: Dag-Erling Smørgrav
Cc: current@freebsd.org, Mateusz Guzik, Martin Matuska
Date: Thu, 17 Aug 2023 15:37:09 -0400
Subject: Re: ZFS deadlock in 14
Message-ID: <8c88acdc-7009-9801-ef44-3e1359c59aff@FreeBSD.org>
In-Reply-To: <197ead1e-210a-6be6-7e24-5c56b14bb777@FreeBSD.org>
References: <86leeltqcb.fsf@ltc.des.no> <86h6p4s64h.fsf@ltc.des.no>
 <86a5utrafp.fsf@ltc.des.no> <86350kqokl.fsf@ltc.des.no>
 <86y1icp95t.fsf@ltc.des.no> <86ttt0p8wv.fsf@ltc.des.no>
 <197ead1e-210a-6be6-7e24-5c56b14bb777@FreeBSD.org>
List-Id: Discussions about the use of FreeBSD-current
List-Archive: https://lists.freebsd.org/archives/freebsd-current

On 17.08.2023 14:57, Alexander Motin wrote:
> On 15.08.2023 12:28, Dag-Erling Smørgrav wrote:
>> Mateusz Guzik writes:
>>> Going through the list may or may not reveal other threads doing
>>> something in the area and it very well may be they are deadlocked,
>>> which then results in other processes hanging on them.
>>>
>>> Just like in your case the process reported as hung is a random victim
>>> and whatever the real culprit is deeper.
>>
>> We already know the real culprit, see upthread.
>
> Dag, I looked through the thread once more, and, while I thank you for
> tracing it, you never went beyond txg_wait_synced() in the `zfs revert`
> thread.  If you are saying that thread is holding the lock, then the
> question is why the transaction commit is stuck.
> I need to see stacks for the ZFS sync threads, or better all kernel
> stacks, just in case.  Without that information I can only speculate.
>
> Trying to run your test (so far without reproduction) I see it producing
> a substantial amount of ZIL writes.  The range of commits you have
> narrowed the scope to so far includes my ZIL locking refactoring, where
> I know for sure there are some deadlocks.  I have been waiting for 3
> weeks now for reviews and tests of a PR that should fix it:
> https://github.com/openzfs/zfs/pull/15122 .  It would be good if you
> could test it, though it seems to depend on a few more earlier patches
> not merged to FreeBSD yet.

Ah, it appears that the pool I tested first has sync=always set from
earlier tests.  That explains the high amount of ZIL traffic I saw, so it
may be irrelevant.  But I still wonder what the sync threads are doing in
your case.

-- 
Alexander Motin