From: Ze Dupsys
Reply-To: zedupsys@gmail.com
To: Roger Pau Monné
Cc: freebsd-xen@freebsd.org, buhrow@nfbcal.org
Subject: Re: ZFS + FreeBSD XEN dom0 panic
Date: Thu, 10 Mar 2022 11:34:26 +0200
Message-ID: <061700f9-86aa-b2ae-07a4-27d4c3d96966@gmail.com>
References: <202203011540.221FeR4f028103@nfbcal.org> <3d4691a7-c4b3-1c91-9eaa-7af071561bb6@gmail.com>
List-Archive: https://lists.freebsd.org/archives/freebsd-xen

On 2022.03.09. 10:42, Roger Pau Monné wrote:
> On Sun, Mar 06, 2022 at 02:41:17PM +0200, Ze Dupsys wrote:
>> If I execute steps 1 and 4, it seems that the machine does not crash;
>> tested 8 hours with an 8GB RAM Dom0 and 4 hours with a 2GB RAM Dom0. So
>> creating a lot of ZVOLs and monitoring them does not seem to be the cause.
>>
>> Yesterday was a happy and a sad day at the same time. The happy part is
>> that I think I have understood one panic reason and located one problem
>> somewhat better; at the same time, I think I have located 2 bugs that
>> cause one specific panic, while 2 other panic reasons arise from
>> different circumstances.
>>
>> What follows is pure speculation about two somewhat related things; I do
>> not know XEN or kernel internals, so take this with a grain of salt.
>>
>> Knowing that this looked like a RAM problem, and thinking about how the
>> ZFS sysctl tuning values differ, I started to look at "sysctl -a" diffs
>> for a Dom0 with 2GB RAM and one with 8GB RAM. By pure luck, comparing a
>> Dom0 that had been running for a long time against a freshly restarted
>> one, I noticed diff lines like these:
>> ..
>> +dev.xbbd.202.%driver: xbbd
>> +dev.xbbd.202.%location:
>> +dev.xbbd.202.%parent: xenbusb_back0
>> +dev.xbbd.202.%pnpinfo:
>> +dev.xbbd.202.xenbus_connection_state: Closed
>> +dev.xbbd.202.xenbus_dev_type: vbd
>> +dev.xbbd.202.xenbus_peer_domid: 309
>> +dev.xbbd.202.xenstore_path: backend/vbd/309/51728
>> +dev.xbbd.202.xenstore_peer_path: /local/domain/309/device/vbd/51728
>> +dev.xbbd.203.%desc: Backend Virtual Block Device
>> +dev.xbbd.203.%driver: xbbd
>> +dev.xbbd.203.%location:
>> +dev.xbbd.203.%parent: xenbusb_back0
>> +dev.xbbd.203.%pnpinfo:
>> +dev.xbbd.203.xenbus_connection_state: Closed
>> ..
>> even though at most 2 DomUs were actually running on the system at the
>> time. This seems to be why it felt like a memory leak: after a long time
>> of starting/stopping VMs, sysctl most probably hits some sort of RAM
>> limit (which is most probably auto-calculated from total RAM somehow)
>> and the system panics.
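
(For reference, the way I take and compare those sysctl snapshots, and
watch the related counters grow, is roughly the following; a minimal
sketch, the file names are only examples:)

# snapshot right after boot and again after some hours of VM start/stop load
sysctl -a > /root/sysctl_after_boot.txt
sysctl -a > /root/sysctl_after_load.txt
diff -u /root/sysctl_after_boot.txt /root/sysctl_after_load.txt | grep '^+dev\.xbbd\.'

# count the xbbd/xnb instances that are currently registered
sysctl -a | grep -c 'dev\.xbbd\.[0-9]*\.%driver'
sysctl -a | grep -c 'dev\.xnb\.[0-9]*\.%driver'

# size of the kernel message buffer contents, in bytes
sysctl -n kern.msgbuf | wc -c
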
> Those dev.xbbd.XXX sysctls reference a specific blkback instance, and
> should be gone once the guest is shutdown. Do you mean that you have a
> range of dev.xbbd.[0-203] entries in your sysctl?

Yes, I have that range of entries, even when all DomUs have been destroyed.
A snippet from the logs - started yesterday, crashed today:

cat 2022_03_10__1646887233_sysctl__001.log | grep xbbd | sed -Ee 's/(xbbd\.[0-9]*)\..*/\1/g' | sort | uniq
dev.xbbd.%parent:
dev.xbbd.0
dev.xbbd.1
dev.xbbd.10
dev.xbbd.100
dev.xbbd.101
dev.xbbd.102
dev.xbbd.103
dev.xbbd.104
dev.xbbd.105
dev.xbbd.106
dev.xbbd.107
dev.xbbd.108
dev.xbbd.109
dev.xbbd.11
..
dev.xbbd.52
dev.xbbd.520
dev.xbbd.521
dev.xbbd.522
dev.xbbd.523
dev.xbbd.524
..
dev.xbbd.99
irq804: xbbd520:133 @cpu0(domain0): 436527
irq805: xbbd521:135 @cpu0(domain0): 4777
irq806: xbbd522:137 @cpu0(domain0): 16589254
irq807: xbbd523:139 @cpu0(domain0): 6412
irq808: xbbd524:141 @cpu0(domain0): 103083

And those entries keep growing.

Yesterday I tried to follow Brian's email about a working setup with
FreeBSD 12.1. With the given scripts an 8GB RAM Dom0 still crashed; it took
longer, but in the morning I saw that the machine had rebooted, and the
serial logs contained a panic message. It seems that
"kern.panic_reboot_wait_time = -1" did not disable rebooting either. In the
monitoring logs I can see that approximately 3 minutes after the crash the
testing machine was up and running again (rebooted).

> Note that in dev.xbbd.XXX XXX matches the domain ID of the guest
> that's using the backend.

On the setups I have tested, this does not seem to be the case. If each VM
has more than one HDD, I get as many xbbd sysctl instances as the total
count of HDDs attached to VMs. And at some point, after rebooting machines
or something, this number starts to grow on each create. To me it looks as
if the domain ID lives inside the sysctl node instead, e.g.
"dev.xbbd.5.xenbus_peer_domid", and if I grep those values, they show
domids of nonexistent VMs.

I do not know how sysctls are created, but is there some test suite that
creates an awfully large number of sysctl variables and tries to crash the
system that way? I suppose there is no way to limit the sysctl variable
count, nor any protection mechanism for such cases. With such a test I
could see whether the panic message is the same. Or a test that allocates a
sysctl variable with an insanely large value, or keeps appending to one.

>> Then I caught another sysctl variable that is growing due to XEN,
>> "kern.msgbuf: Contents of kernel message buffer". I do not know how this
>> variable grows or by which component it is managed, but in the VM
>> start/stop case it grows and contains lines with a pattern like so:
>> ..
>> xnb(xnb_rxpkt2rsp:2059): Got error -1 for hypervisor gnttab_copy status
>> xnb(xnb_ring2pkt:1526): Unknown extra info type 255.  Discarding packet
>> xnb(xnb_dump_txreq:299): netif_tx_request index =0
>> xnb(xnb_dump_txreq:300): netif_tx_request.gref  =0
>> xnb(xnb_dump_txreq:301): netif_tx_request.offset=0
>> xnb(xnb_dump_txreq:302): netif_tx_request.flags =8
>> xnb(xnb_dump_txreq:303): netif_tx_request.id    =69
>> xnb(xnb_dump_txreq:304): netif_tx_request.size  =1000
>> xnb(xnb_dump_txreq:299): netif_tx_request index =1
>> xnb(xnb_dump_txreq:300): netif_tx_request.gref  =255
>> xnb(xnb_dump_txreq:301): netif_tx_request.offset=0
>> xnb(xnb_dump_txreq:302): netif_tx_request.flags =0
>> xnb(xnb_dump_txreq:303): netif_tx_request.id    =0
>> xnb(xnb_dump_txreq:304): netif_tx_request.size  =0
>> ..
>>
>> Those lines in that variable just keep growing and growing; they are
>> never flushed or trimmed.
>> Each time I get the same message on serial output, it has one more
>> section of errors appended to the "same-previous" serial output message,
>> and to the sysctl variable as well. Thus at some point the serial output
>> and the sysctl contain a large block of those errors while a VM is
>> starting. So at some point the value of this sysctl could reach the
>> maximum allowed/available and make the system panic. I do not know the
>> reason for those errors, but if there were a patch to suppress them,
>> this could be "solved". Another diff chunk might be related to this:
>> +dev.xnb.1.xenstore_peer_path: /local/domain/7/device/vif/0
>> +dev.xnb.1.xenbus_peer_domid: 7
>> +dev.xnb.1.xenbus_connection_state: InitWait
>> +dev.xnb.1.xenbus_dev_type: vif
>> +dev.xnb.1.xenstore_path: backend/vif/7/0
>> +dev.xnb.1.dump_rings:
>> +dev.xnb.1.unit_test_results: xnb_rxpkt2rsp_empty:1765 Assertion Error: nr_reqs == 0
>> +xnb_rxpkt2rsp_empty:1767 Assertion Error: memcmp(&rxb_backup, &xnb_unit_pvt.rxb, sizeof(rxb_backup)) == 0
>> +xnb_rxpkt2rsp_empty:1769 Assertion Error: memcmp(&rxs_backup, xnb_unit_pvt.rxs, sizeof(rxs_backup)) == 0
>> +52 Tests Passed
>> +1 Tests FAILED

> So you have failed tests for netback. Maybe the issue is with
> netback rather than blkback.

I am not sure where the problem is. Maybe there are two or more problems.
Should the network-related and disk-related issues be discussed separately?

>> What was suspicious about this is that I am using 13.0-RELEASE-p7; it is
>> not a DEV version of XEN or anything, yet there are sysctl variables
>> with a "dev" prefix. Could it be that the ports accidentally compiled a
>> XEN version with some development flags turned on?

> 'dev' here means device, not developer.

Okay, understood.

>> So those were my conclusions.
>>
>> What do you think? How should we proceed with this? Should I try to
>> somehow build XEN from git sources?

> I think that's unlikely to make any difference. I would think the
> problem is with FreeBSD rather than Xen. Can you paste the config file
> you are using to create the domain(s)?

Yet again, as I said, I do not know where the problem is. My wording might
be off when pointing at Xen, since Dom0 is FreeBSD and it is FreeBSD that
panics, not Xen. It is just that I do not know the boundaries and
responsibilities of the components clearly enough, and by mentioning Xen I
often mean the whole setup, or the FreeBSD module that interacts with Xen.

> I've tried myself to trigger this crash by creating and destroying a
> guest in a loop but didn't manage to trigger it. I was able to create
> (and shutdown) 1328 guests successfully on a ZFS dom0 using 4GB of
> RAM.

No, just turning a single VM on and off does not crash the system; I had
tried that as well before reporting the bug. I attached all the /bin/sh
scripts and configs that crash the system in the tar archive of bug report
#261059. Those scripts are really simple. I have been trying to
minimize/trim everything since then, but the simplest requirement for a
crash is executing these 3 processes in parallel (a rough sketch of the
loops follows below):

1) a loop that creates ZFS volumes of at least 2GB and writes data into
   them with dd; /dev/zero is good enough. I have never run out of space,
   so the crash is not due to an exhausted ZFS pool.
2) a loop that creates VM1 with a single disk, connects to that VM through
   ssh and writes 3GB of data into /tmp, then reboots; starts the VM again,
   removes the file from /tmp, then reboots.
3) a loop that starts/stops VM2 with 5 disks.
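
(A simplified sketch of the three loops, not the exact scripts from the bug
report; the pool name, config file names and guest addresses are just
placeholders. Each loop runs as its own /bin/sh process:)

# process 1: create 2GB ZVOLs and fill them from /dev/zero
n=0
while true; do
    n=$((n + 1))
    zfs create -V 2G zpool/stress/vol$n
    dd if=/dev/zero of=/dev/zvol/zpool/stress/vol$n bs=1m
done

# process 2: VM1 with one disk; write 3GB into /tmp over ssh, reboot,
# remove the file, reboot again
while true; do
    xl create xen-vm1-zvol.conf
    sleep 120                                 # crude wait for the guest to boot
    ssh root@vm1 'dd if=/dev/zero of=/tmp/fill bs=1m count=3072 ; shutdown -r now'
    sleep 120
    ssh root@vm1 'rm /tmp/fill ; shutdown -r now'
    sleep 120
    xl shutdown -w -F xen-vm1-zvol
done

# process 3: start/stop VM2 (5 disks) in a loop
while true; do
    xl create xen-vm2-zvol.conf
    sleep 60
    xl shutdown -w -F xen-vm2-zvol
done
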
As for panic timing: less RAM for the Dom0 means a faster panic, and the
panic message stays the same. With 8GB RAM on my machine I have to wait
approximately 10 hours for a panic, with 4GB RAM around 6 hours, and on a
resource-starved 2GB Dom0 even less, but the panic messages are more or
less the same. I have also triggered the crash with step 1 omitted, it just
takes longer. Running only the start/stop loops for both VMs does not crash
the system, so it seems that some sort of ZFS load (inside the VM or in
Dom0) is essential for the panic to occur. I have managed to crash with a
VM that has 2 HDDs, so it does not seem to be related to the 4-disk
threshold for emulated/virtio disks; it is just that with 5 disks it takes
less time until the crash. Usually the panic occurs once the xbbd instance
number exceeds roughly 230 (dev.xbbd.$value, $value > 230).

> Do you have some kind of configuration that differs from the default
> for ZFS for example? (ie: setting a different ARC size)

I have no specific settings; everything is basically as in the manual pages.

# cat /etc/sysctl.conf
vm.max_user_wired=-1
vm.max_wired=-1
kern.panic_reboot_wait_time=-1
vfs.zfs.vol.mode=2 # minimize problems with bug #262189 (default value still crashes system)

# /etc/rc.conf
cloned_interfaces="bridge0"
ifconfig_bridge0=".."
zfs_enable="YES"
xencommons_enable="YES"
ifconfig_em0=".."
defaultrouter=".."
sshd_* # configs
sendmail_* # configs that should not influence anything

# cat /boot/loader.conf
autoboot_delay="1"
beastie_disable="YES"
autoboot_delay="5"
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="9600"
console="comconsole,vidconsole"
xen_kernel="/boot/xen"
xen_cmdline="dom0_mem=8192M dom0_max_vcpus=2 dom0=pvh,verbose=1 console=vga,com1 com1=9600,8n1 guest_loglvl=all loglvl=all sync_console=1 noreboot"
hw.usb.no_boot_wait=1

# cat xen-vm1-zvol.conf
memory = 1024
vcpus=1
name = "xen-vm1-zvol"
type = "hvm"
boot = "dc"
hdtype="ahci"
vif = [ "type=vif,bridge=bridge0" ]
disk = [ "backendtype=phy, format=raw, vdev=xvda, target=/dev/zvol/zpool/vmdk/dev/vm1-root" ]
vnc=1
vnclisten="0.0.0.0:7"
serial = ["pty"]

The VM2 config has the same fingerprint, just with 5 HDDs and different
names, paths and values.

> Are you ballooning the amount of memory assigned to dom0?

No; if ballooning does not happen by default, then not.

I saw Brian's message about the panic he experienced when powering off, but
there was nothing I could add to it. The way I power off VMs is with the
command: xl shutdown -w -F $vm_name. I intend to change the testing scripts
from 'xl shutdown' to an ssh 'init 0' plus an 'xl list' wait loop, to rule
out a possible problem caused by the 'xl shutdown' command (see the sketch
at the end of this mail).

With the given scripts/setup I can crash Xen + FreeBSD 12.1, 12.2 or 13.0
RELEASES as Dom0.

There are some things I am not sure whether to mention, since I cannot
reliably repeat them. There are times when, after a while, the OOM killer
starts to kill qemu-system-i386 processes, or the ssh session that
initiated the 'xl create' command, although there seems to be plenty of RAM
available; usually not long after that, if the testing script is started
again or xl create/destroy is run manually, the system crashes. Sometimes
it feels that after a crash the next crash happens sooner. There was a case
when an 8GB RAM system crashed approximately 8 minutes after boot and
running those scripts, when it had already crashed previously. I cannot
think of any reason for that... can ZFS be left in a dirty state if Xen
crashes? To rule that out, I installed FreeBSD on a separate (UFS) disk and
always recreate the testing pool from scratch.
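
(Recreating the test pool between runs is nothing special, roughly the
following; "ada1" is only a placeholder for the dedicated test disk, the
datasets and ZVOLs are then recreated by the scripts:)

zpool destroy zpool
zpool create zpool ada1
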
If there is no time for you to set up such a system, I would be willing to
help with that: I can drop an SSH public key and connect at some arranged
time; the setup takes around 30-40 minutes. Nothing much is necessary, just
some HDDs to test on, a ZFS pool, and FreeBSD images, either ISOs or
prepared VMs. Other than that, I have exposed all the config changes I have
made; in general it is just installed the way the manual describes, and
most things are at their defaults.

I somehow do not feel that my hardware is at fault, since I have tested on
three different machines from different manufacturers. But maybe they all
share some common BIOS setting that causes this problem; I am not sure of
that.
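
P.S. For completeness, the 'xl shutdown' replacement mentioned above would
be roughly the following (a sketch only; the VM name and guest address are
placeholders):

# instead of: xl shutdown -w -F xen-vm2-zvol
ssh root@vm2 'init 0'
# wait until the domain is gone from the domain list
while xl list | grep -q '^xen-vm2-zvol '; do
    sleep 5
done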