From nobody Mon Mar 21 14:07:48 2022
X-Original-To: freebsd-xen@mlmmj.nyi.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
	by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id 8D31F1A26952
	for <freebsd-xen@mlmmj.nyi.freebsd.org>; Mon, 21 Mar 2022 14:08:09 +0000 (UTC)
	(envelope-from zedupsys@gmail.com)
Received: from mail-lf1-x130.google.com (mail-lf1-x130.google.com [IPv6:2a00:1450:4864:20::130])
	(using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "smtp.gmail.com", Issuer "GTS CA 1D4" (verified OK))
	by mx1.freebsd.org (Postfix) with ESMTPS id 4KMc085llzz4g5S
	for <freebsd-xen@freebsd.org>; Mon, 21 Mar 2022 14:08:08 +0000 (UTC)
	(envelope-from zedupsys@gmail.com)
Received: by mail-lf1-x130.google.com with SMTP id 5so6604751lfp.1
        for <freebsd-xen@freebsd.org>; Mon, 21 Mar 2022 07:08:08 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112;
        h=reply-to:subject:to:cc:references:from:message-id:date:user-agent
         :mime-version:in-reply-to:content-language:content-transfer-encoding;
        bh=oqnB22vBeUomZwxeahF3u+nOQ9GSkNIAwVYoL83pIbs=;
        b=Fo2rLJ5QfNSRRlz9hu8QcQ2qtYe0LqDV5kOADK9DM174LAdqLhF6JHPDRPsigYxt6X
         f9GhnLkyDPQH2qTMVzYFvqbbzyu6MIkvQnUaklOUVuBO5EqzTKFFeZg9PAwM4pruA9Xa
         YkX3KEtsEGK7p7nGSsqVYPg6l5QE8zxL2MJ96vDdvj+j0xx0xqRoJV1if2nqO6vvAJPy
         hsKi/tZD5F9Wm4KfolYxbegkbNe+K5g8cyCS00YJR5tI/UX7yGrJzWfO/hZgVaet1zt8
         5OfnIfV+nkPS0Dx8V3GBBcFqUvnq4dbzNmUZM4oAANJXi1S3/42L77oJfmI7yoKDry+q
         D1kw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=x-gm-message-state:reply-to:subject:to:cc:references:from
         :message-id:date:user-agent:mime-version:in-reply-to
         :content-language:content-transfer-encoding;
        bh=oqnB22vBeUomZwxeahF3u+nOQ9GSkNIAwVYoL83pIbs=;
        b=1inpmiS67nWmqJlVEOGqPk26X3XkLByetm+aK4DTk4yX9mFIWR4jS8Bm3crcDpKr0z
         IYMdJz5rgyo3KK/ZODOjC+7EDF/50vPR8blE2ve6lR/X5yawIlf2od1LKX3io0U8jHbj
         cMnjlv8lIXC239uhXxT+fC6y3u4RWZQJJlMrSOCxpszwwMFB1Ne7NNhOCXgHSNvm93Wt
         0o/VW2wLTFLXwEVy6sdKSc91IAMU2MhjAoNdGtNtdoI+yV/pXF3uedlC6ORm+W1wDHqH
         768x1c1ygEmAWAflAtfbcp7Bdzlirb/GOGMRmcVZUTIvLUuOJ+WYuuC1uzwCfpERmnw0
         3gtA==
X-Gm-Message-State: AOAM531MK1U2dboSpRHTYmfOt0R1O1W2KRsEp2+N/xzTfM9VcPpTe19M
	73mmjA9YfTN/xOsINkStz7XEgVaxJ08=
X-Google-Smtp-Source: ABdhPJzn2bXTEYVM1v6N0wgu6zBIrb0Q4/iVZEm/kYtCLanz4kmU+Sn6SOa1qDhAISvS2zJ6cBONlA==
X-Received: by 2002:a05:6512:b11:b0:44a:2ead:daf2 with SMTP id w17-20020a0565120b1100b0044a2eaddaf2mr2767724lfu.642.1647871687049;
        Mon, 21 Mar 2022 07:08:07 -0700 (PDT)
Received: from [10.3.0.1] ([213.110.65.3])
        by smtp.googlemail.com with ESMTPSA id s10-20020a19ad4a000000b0044826a25a2esm1811387lfd.292.2022.03.21.07.08.06
        (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
        Mon, 21 Mar 2022 07:08:06 -0700 (PDT)
Reply-To: zedupsys@gmail.com
Subject: Re: ZFS + FreeBSD XEN dom0 panic
To: =?UTF-8?Q?Roger_Pau_Monn=c3=a9?= <roger.pau@citrix.com>
Cc: freebsd-xen@freebsd.org, buhrow@nfbcal.org
References: <YiCa70+HQScsoaKX@Air-de-Roger>
 <3d4691a7-c4b3-1c91-9eaa-7af071561bb6@gmail.com>
 <YihojHNbzJagm4SI@Air-de-Roger>
 <5dfdecd5-f94d-29b4-791e-0adde5405cf5@gmail.com>
 <Yiocagc4dTc15/Y1@Air-de-Roger>
 <feb35237-555b-29dc-d2fd-1659b400d683@gmail.com>
 <Yi8IvnqWUoWBIsLB@Air-de-Roger>
 <ca88a8c8-3b4e-fbde-18a7-d4e5f61e8b2c@gmail.com>
 <YjSkIZmV+t8Q3AEn@Air-de-Roger>
 <b5fc1a74-6a52-e237-4a69-2b3f8d4974bd@gmail.com>
 <YjheCAFXykUZz/pz@Air-de-Roger>
From: Ze Dupsys <zedupsys@gmail.com>
Message-ID: <d1d8f552-15a3-3d4f-1143-2e1cae7904ee@gmail.com>
Date: Mon, 21 Mar 2022 16:07:48 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.12.0
List-Id: Discussion <freebsd-xen.freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-xen
List-Help: <mailto:xen+help@freebsd.org>
List-Post: <mailto:xen@freebsd.org>
List-Subscribe: <mailto:xen+subscribe@freebsd.org>
List-Unsubscribe: <mailto:xen+unsubscribe@freebsd.org>
Sender: owner-freebsd-xen@freebsd.org
X-BeenThere: freebsd-xen@freebsd.org
MIME-Version: 1.0
In-Reply-To: <YjheCAFXykUZz/pz@Air-de-Roger>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: lv
Content-Transfer-Encoding: 8bit
X-Rspamd-Queue-Id: 4KMc085llzz4g5S
X-Spamd-Bar: --
Authentication-Results: mx1.freebsd.org;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=Fo2rLJ5Q;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (mx1.freebsd.org: domain of zedupsys@gmail.com designates 2a00:1450:4864:20::130 as permitted sender) smtp.mailfrom=zedupsys@gmail.com
X-Spamd-Result: default: False [-2.00 / 15.00];
	 HAS_REPLYTO(0.00)[zedupsys@gmail.com];
	 RCVD_VIA_SMTP_AUTH(0.00)[];
	 TO_DN_SOME(0.00)[];
	 R_SPF_ALLOW(-0.20)[+ip6:2a00:1450:4000::/36:c];
	 FREEMAIL_FROM(0.00)[gmail.com];
	 REPLYTO_ADDR_EQ_FROM(0.00)[];
	 RCVD_COUNT_THREE(0.00)[3];
	 DKIM_TRACE(0.00)[gmail.com:+];
	 DMARC_POLICY_ALLOW(-0.50)[gmail.com,none];
	 FROM_EQ_ENVFROM(0.00)[];
	 MIME_TRACE(0.00)[0:+];
	 FREEMAIL_ENVFROM(0.00)[gmail.com];
	 ASN(0.00)[asn:15169, ipnet:2a00:1450::/32, country:US];
	 MID_RHS_MATCH_FROM(0.00)[];
	 DWL_DNSWL_NONE(0.00)[gmail.com:dkim];
	 ARC_NA(0.00)[];
	 NEURAL_HAM_MEDIUM(-1.00)[-1.000];
	 R_DKIM_ALLOW(-0.20)[gmail.com:s=20210112];
	 FROM_HAS_DN(0.00)[];
	 RCPT_COUNT_THREE(0.00)[3];
	 NEURAL_SPAM_SHORT(1.00)[1.000];
	 NEURAL_HAM_LONG(-1.00)[-1.000];
	 MIME_GOOD(-0.10)[text/plain];
	 FREEMAIL_REPLYTO(0.00)[gmail.com];
	 PREVIOUSLY_DELIVERED(0.00)[freebsd-xen@freebsd.org];
	 TO_MATCH_ENVRCPT_SOME(0.00)[];
	 RCVD_IN_DNSWL_NONE(0.00)[2a00:1450:4864:20::130:from];
	 MLMMJ_DEST(0.00)[freebsd-xen];
	 RCVD_TLS_ALL(0.00)[]
X-ThisMailContainsUnwantedMimeParts: N

On 2022.03.21. 13:14, Roger Pau Monné wrote:
> On Fri, Mar 18, 2022 at 08:48:08PM +0200, Ze Dupsys wrote:
>> On 2022.03.18. 17:24, Roger Pau Monné wrote:
>>> On Tue, Mar 15, 2022 at 08:51:57AM +0200, Ze Dupsys wrote:
>>>> On 2022.03.14. 11:19, Roger Pau Monné wrote:
>>>>> On Mon, Mar 14, 2022 at 10:06:58AM +0200, Ze Dupsys wrote:
>>>>>> ..
>>>>>>
>>>>>> Why those lines starting "xnb(xnb_detach:1330):" do not have any message?
>>>>>> Could it be that there is a bad pointer to message buffer that can not be
>>>>>> printed? And then sometimes panic happens because access goes out of allowed
>>>>>> memory region?
>>>>> Some messages in netback are just "\n", likely leftovers from debug.
>>>> Okay, found the lines, it is as you say. So this will not be an easy one.
>>>>
>>>>
>>>>> Can you try to stress the system again but this time with guests not
>>>>> having any network interfaces? (so that netback doesn't get used in
>>>>> dom0).
>>>> I'll try to come up with something. At the moment all commands to VMs are
>>>> given through ssh.
>>>>
>>>>
>>>>> Then if you could rebuild the FreeBSD dom0 kernel with the above patch
>>>>> we might be able to get a bit more of info about blkback shutdown.
>>>> I rebuilt 13.1 STABLE, with commenting out #undef and adding #define, thus
>>>> line number will differ by single line. For this test i did not remove
>>>> network interfaces, and did add DPRINTF messages to xnb_detach function as
>>>> well, since i hoped to maybe catch something there, by printing pointers. I
>>>> somewhat did not like that xnb_detach does not check for NULL return from
>>>> device_get_softc, nor for device_t argument, so i though, maybe those
>>>> crashes are something related to that. But i guess this will not be so easy,
>>>> and maybe it is safe to assume that "device_t dev" is always valid in that
>>>> context.
>>>>
>>>> So i ran stress test, system did not crash as it happens often when more
>>>> debugging info is printed, characteristics change. But it did leak sysctl
>>>> xbbd variables. I'll attach all collected log files. sysctl and xl list
>>>> commands differ in timing a little bit. xl list _02 is when all VMs are
>>>> turned off. Sysctl only has keys without values, not to trigger xnb tests
>>>> while reading all values.
>>>
>>> So I've been staring at this for a while, and I'm not yet sure I
>>> figured out exactly what's going on, but can you give a try to the
>>> patch below?
>>>
>>> Thanks, Roger.
>>> ---8<---
>>> diff --git a/sys/xen/xenbus/xenbusb.c b/sys/xen/xenbus/xenbusb.c
>>> index e026f8203ea1..a8b75f46b9cc 100644
>>> --- a/sys/xen/xenbus/xenbusb.c
>>> +++ b/sys/xen/xenbus/xenbusb.c
>>> @@ -254,7 +254,7 @@ xenbusb_delete_child(device_t dev, device_t child)
>>>    static void
>>>    xenbusb_verify_device(device_t dev, device_t child)
>>>    {
>>> -	if (xs_exists(XST_NIL, xenbus_get_node(child), "") == 0) {
>>> +	if (xs_exists(XST_NIL, xenbus_get_node(child), "state") == 0) {
>>>    		/*
>>>    		 * Device tree has been removed from Xenbus.
>>>    		 * Tear down the device.
>>>
>>
>> Hi, Roger,
>>
>> Yes, i will run it, but results will likely be on Monday.
>>
>> I have done some more digging, but since i do not know how things work
>> internally, i have a lot of unknowns and assumptions. I did a lot of DPRINTF
>> stuff in various places, i will attach logs, maybe they give better picture.
>>
>> For easier log reading i filter single log pair with commands:
>> sed -i '' -e 's/0xfffff80/0x/g' *
>> cat *xen_serial_* | grep -E 'xbb|talkv|xs_read|xs_gather|xenbusb_add_device'
>>> xbb.log
>> cat *sysctl* | grep -E '^dev.xbb' | grep -Ev
>> "\.%location:|\.%pnpinfo:|\.%parent" > xbb_sysctl.log
>>
>> Then i search for leaked sysctl *.xenstore_path and search that in xbb.log.
>> As far as i can tell, cases when it is leaked xbb's have no
>> xbb_close_backend call, for whatever reason. Often pointers for leaked xbb
>> are a reused ones from previously closed xbb instnaces.
>>
>>
>> My suspicion goes for blkback.c, xbb_attach_disk lines:
>> ..
>>    error = xs_gather(XST_NIL, xenbus_get_node(dev), "physical-device-path",
>>        NULL, &xbb->dev_name, NULL);
>>    if (error != 0)
>>      return;
>> ..
>>
>> The thing is that for those lost xbb's, this is where error = 2, which is
>> ENOENT (if i found correct code).
>>
>> Which is weird, because i assume that xbb_attach_disk function is called
>> after "hot-plug script" has finished execution. This is defined in
>> xbb_attach(device_t dev):
>> ..
>>    KASSERT(!xbb->hotplug_done, ("Hotplug scripts already executed"));
>>    watch_path = xs_join(xenbus_get_node(xbb->dev), "physical-device-path");
>>    xbb->hotplug_watch.callback_data = (uintptr_t)dev;
>>    xbb->hotplug_watch.callback = xbb_attach_disk;
>> ..
>>
>> So i added debugging messages at file: /usr/local/etc/xen/scripts/block
>> after line "xenstore-write $path/physical-device-path $params", but did not
>> see anything suspicious.
>>
>> One of my ideas is that, maybe for some reason callback is called too ealry,
>> because i would expect that when the callback is called, the path
>> $path/physical-device-path should exist (is created by script). But then how
>> can xs_gather fail with ENOENT there? Something manages to remove
>> $path/physical-device-path in the short window between xenstore-write and
>> while callback activates?
>>
>> Another thing is that function xbb_attach_disk on error for "error =
>> xs_gather" just returns if != 0. I would expect it to be able to handle
>> error gracefully it should flag xbb with an error and then shut down or
>> something, but since i do not know internals, it is hard to judge correct
>> behaviour in that context.
>>
>> Another interesting behaviour is that xbb_shutdown is called twice for each
>> instance. Once from some function and then from xbb_detach.
> 
> This is expected. On proper domain shutdown the backend will be closed
> when the frontend switches to Closing state. Hence the cal from
> xbb_detach is for cases where the backend hasn't been closed properly
> (ie: domain destroy or stuck frontend).
> 
>>
>> My problem of geting more useful info is that i do not understand the flow
>> of VM creation, including disk allocation.
>>
>> For my understanding they just "magically" appear detected with xbb_probe.
>> :)
>>
>> I tried to understand function xs_gather called by xbb_attach, implemented
>> by dev/xen/xenstore/xenstore.c, but it just writes packet somewhere; i
>> assumed that dev/xen/xenstore/xenstore_dev.c is the one that handles other
>> end for that packet, so i looked at "case XS_READ:" in that file, but it
>> calls xs_dev_request_and_reply which is implemented by
>> dev/xen/xenstore/xenstore.c, so for me it feels like going in circles. So at
>> the moment i did not find which component could be the one that sends
>> ENOENT.
>>
>> Could you tell the big picture how VM block dev is created? I mean is it xen
>> tools writing request in xenstore_dev, but how it then activates xbb probe?
>> Which component is responsible for allocating memory xbb instance?
> 
> PV devices are all handled in xenstore (a.k.a. xenbus) which is a kind
> of key-value database that contains the information about devices.
> 
> The FreeBSD kernel sets a watch on certain xenstore paths, so that
> when new entries are added a specific function is called on order to
> detect whether devices are being added or removed.
> 
>>
>> I added printf in dev/xen/xenstore/xenstore.c
>> ..
>> xs_talkv(struct xs_transaction t, enum xsd_sockmsg_type request_type,
>>      const struct iovec *iovec, u_int num_vecs, u_int *len, void **result)
>> {
>> ..
>>    if (msg.type == XS_ERROR) {
>>      error = xs_get_error(ret);
>>      // .. printf here ..
>>      free(ret, M_XENSTORE);
>> ..
>> And it prints error 22 a lot (EINVAL, if like in errno.h) and sometimes
>> error 2 (this is the one i suppose i saw at mentioned xs_gather call). The
>> trouble is that i can not locate the source for xs_talkv error generation +
>> when all error messages are printed on serial, it starts to influence
>> overall system behaviour, since serial can not keep up with message
>> generation speed and as a result sysctl leak does not trigger as fast.
>>
>> Could i create static variable in xenstore.c and set it to 1 from blkback.c
>> when i want logging and to 0 when i don't? Or is it a global component and
>> when blkback.c would set's it's logging to 1, all other concurrent messages
>> will be logged?
>>
>> I would not expect xs_talkv to encounter as many error 22 cases, as it does.
>> Could this be relevant? Or could it be that it's just by design that some
>> functions test if key exists before creating new one?
>>
>> I stared at xs_talkv function for a while. It calls xs_write_store twice, so
>> i thought maybe there could be a case when message is not writen
>> "atomically"? Thus error 22. Yes, it does sx_xlock(&xs.request_mutex), but
>> function xs_write_store does mtx_lock(&xs.ring_lock); and
>> mtx_unlock(&xs.ring_lock); in an interesting manner. OTOH if this really was
>> even a bit wrong, then system would be far more unstable maybe even
>> unusable, right?
>>
>> As for now, it seems that actually the third process that was creating zfs
>> volumes is not important. What is important is load on system, not load on
>> ZVOLs, because i was running VM1 and VM2 on/off stress without zfs create/dd
>> script, but was building kernel on parallel, sysctl variables were leaked.
>> Kernel was building on UFS ada0, VM disks on ZVOLs ada1.
>>
>> For today i guess this is it.
>>
>> Thanks for looking into this and sorry i can not produce more useful error
>> logs. Earlier dated has no error code prints. Sysctl variables are taken
>> when all VMs are down.
> 
> I think the problem is not likely with the xenstore implementation
> (ie: xs_talkv) but rather a race with how the FreeBSD kernel detects
> and manages addition and removal of devices that hang off xenbus.
> 
> I'm afraid there's too much data below for me to parse it.

Understood. Sounds more tricky than i thought. What could i do to make 
data more useful?


> Let me know whether the proposed patch made any difference.

It did not make any difference, system still panics and sysctl leaks.


Thanks.