Message-ID: <5e08b56e-ca81-75a8-dcdc-c9a7dcb3bcad@FreeBSD.org>
Date: Fri, 1 Sep 2023 11:22:40 -0500
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
Sender: owner-dev-commits-src-main@freebsd.org
Subject: Re: git: 315ee00fa961 - main - zfs: merge openzfs/zfs@804414aad
To: Alexander Motin, Martin Matuska
Cc: src-committers@freebsd.org, dev-commits-src-all@freebsd.org, dev-commits-src-main@freebsd.org, Cy Schubert
References: <202308270509.37R596B5048298@gitrepo.freebsd.org> <65269e7a-4c3f-95ff-3e81-91b76e023fbd@FreeBSD.org>
 <7b12cc47-0e41-ee8c-2165-9e81874c3490@FreeBSD.org> <80777717-1d67-104a-94f6-2ac8112e41b8@FreeBSD.org>
From: Kyle Evans
In-Reply-To: <80777717-1d67-104a-94f6-2ac8112e41b8@FreeBSD.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 9/1/23 11:04, Alexander Motin wrote:
> On 01.09.2023 11:46, Kyle Evans wrote:
>> On 9/1/23 08:41, Alexander Motin wrote:
>>> On 31.08.2023 22:18, Kyle Evans wrote:
>>>> It seems to have clearly been stomped on by uma trashing.
>>>> Encountered while running a pkgbase build, I think while it was in
>>>> the packaging phase. I note in particular in that frame:
>>>>
>>>> (kgdb) p/x lwb->lwb_issued_timestamp
>>>> $4 = 0xdeadc0dedeadc0de
>>>>
>>>> So I guess it was freed sometime during one of the previous two
>>>> zio_nowait() calls.
>>>
>>> Thank you, Kyle.  If the source lines are resolved correctly and it
>>> really crashes on lwb_child_zio access, then I do see there a
>>> possible race condition, even though I think it would involve at
>>> least 2 or may be even 3 different threads.
>>>
>>
>> Oh, sorry- yes, it was the access to lwb_child_zio there.
>>
>>
>>> I've just created this new PR to address it:
>>> https://github.com/openzfs/zfs/pull/15233
>>>
>>> If you'll be able to test it, include also the two previous:
>>> https://github.com/openzfs/zfs/pull/15227
>>> https://github.com/openzfs/zfs/pull/15228
>>>
>>> Thank you for something actionable, it really feels much better! :)
>>>
>>
>> Perfect, thanks! I haven't been able to reproduce it since the first
>> time, but your explanation sounds plausible to me.
>>
>> I'm not a ZFS developer, but it's not clear to me how I didn't end up
>> tripping over other assertions, though; e.g., in
>> zil_lwb_flush_vdevs_done:
>>
>> 1442         ASSERT3S(lwb->lwb_state, ==, LWB_STATE_WRITE_DONE);
>> 1443         lwb->lwb_state = LWB_STATE_FLUSH_DONE;
>>
>> lwb_state seems to only be set to LWB_STATE_WRITE_DONE in
>> zil_lwb_write_done (lwb_write_zio's completion routine). I would've
>> thought all three of these were executed synchronously in
>> __zio_execute(), which would presumably put us in LWB_STATE_ISSUED at
>> the time of completing the lwb_root_zio?
>
> That is where ZIO dependencies work.  lwb_root_zio can never complete
> before lwb_write_zio completion.   So first zil_lwb_write_done() on
> lwb_write_zio completion should move the lwb to LWB_STATE_WRITE_DONE,
> then zil_lwb_flush_vdevs_done() on lwb_root_zio completion should move
> it to LWB_STATE_FLUSH_DONE, at which state zil_sync() can free it.  If
> only at that point we try to check lwb->lwb_child_zio, we see the
> 0xdeadc0dedeadc0de and try call zio_nowait() on it with the result you
> saw.  Would lwb_child_zio actually be used by the specific lwb,
> lwb_write_zio could not proceed before its completion first and so late
> zio_nowait() call for it would be legal, but not otherwise.
>

A-ha, ok, that makes sense- thanks for taking the time to explain that!

Thanks,

Kyle Evans
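
For anyone following the ordering argument in the quoted reply, here is a minimal toy sketch in C. It is not the OpenZFS code: every name in it (toy_zio, toy_lwb, toy_run, and so on) is hypothetical. It assumes only what the reply states, namely that a parent zio's completion routine cannot run before its child's, which is why the WRITE_DONE assertion holds whichever zio is issued first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these are real OpenZFS types. */
enum toy_lwb_state { TOY_ISSUED, TOY_WRITE_DONE, TOY_FLUSH_DONE };

struct toy_lwb {
	enum toy_lwb_state state;
};

struct toy_zio {
	struct toy_zio *parent;		/* waits for this zio, may be NULL */
	int pending_children;		/* children that have not completed */
	int issued;			/* toy_zio_nowait() was called */
	int done;			/* completion callback has run */
	void (*done_cb)(struct toy_lwb *);
	struct toy_lwb *lwb;
};

static void
toy_zio_init(struct toy_zio *zio, struct toy_zio *parent,
    void (*done_cb)(struct toy_lwb *), struct toy_lwb *lwb)
{
	zio->parent = parent;
	zio->pending_children = 0;
	zio->issued = 0;
	zio->done = 0;
	zio->done_cb = done_cb;
	zio->lwb = lwb;
	if (parent != NULL)
		parent->pending_children++;
}

/* Complete a zio only once it is issued and all children are done. */
static void
toy_zio_try_complete(struct toy_zio *zio)
{
	if (!zio->issued || zio->done || zio->pending_children > 0)
		return;
	zio->done = 1;
	zio->done_cb(zio->lwb);
	if (zio->parent != NULL) {
		zio->parent->pending_children--;
		toy_zio_try_complete(zio->parent);
	}
}

static void
toy_zio_nowait(struct toy_zio *zio)
{
	zio->issued = 1;
	toy_zio_try_complete(zio);
}

/* Stand-in for the write zio's completion routine. */
static void
toy_write_done(struct toy_lwb *lwb)
{
	assert(lwb->state == TOY_ISSUED);
	lwb->state = TOY_WRITE_DONE;
}

/* Stand-in for the root zio's completion routine. */
static void
toy_flush_done(struct toy_lwb *lwb)
{
	assert(lwb->state == TOY_WRITE_DONE);
	lwb->state = TOY_FLUSH_DONE;
}

/* Issue the root and write zios in either order; return the final state. */
enum toy_lwb_state
toy_run(int root_first)
{
	struct toy_lwb lwb = { TOY_ISSUED };
	struct toy_zio root, wr;

	toy_zio_init(&root, NULL, toy_flush_done, &lwb);
	toy_zio_init(&wr, &root, toy_write_done, &lwb);
	if (root_first) {
		toy_zio_nowait(&root);
		toy_zio_nowait(&wr);
	} else {
		toy_zio_nowait(&wr);
		toy_zio_nowait(&root);
	}
	return (lwb.state);
}
```

In this toy, both toy_run(1) and toy_run(0) end in TOY_FLUSH_DONE without tripping either assert, because the root's callback is deferred until the child write has completed. The real race discussed in the thread is about touching the lwb after that final state, which this sketch does not attempt to model.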