From: bugzilla-noreply@freebsd.org
To: virtualization@FreeBSD.org
Subject: [Bug 259651] bhyve process uses all memory/swap
Date: Thu, 04 Nov 2021 20:34:27 +0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=259651

            Bug ID: 259651
           Summary: bhyve process uses all memory/swap
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: reg@FreeBSD.org

I've had a Windows Server running in bhyve on FreeNAS for a few years now. It uses DFS-R to sync a few Windows file systems to my remote backup location. The VM has several zvol-backed AHCI devices and a virtio network adapter.
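(For context, the device model is the usual zvol-backed setup; the bhyve invocation looks roughly like the sketch below. Slot numbers, zvol paths, and the tap device are illustrative, not copied from the TrueNAS-generated command line.)

  bhyve -c 2 -m 6G -H -w \
    -s 0,hostbridge \
    -s 3,ahci-hd,/dev/zvol/tank/vm/winsrv0 \
    -s 4,ahci-hd,/dev/zvol/tank/vm/winsrv1 \
    -s 10,virtio-net,tap0 \
    -s 31,lpc \
    -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    winsrv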
It has been running (mostly) stably for a long time with adequate performance (as in, it can mostly saturate the 1Gb link it's on and can get disk speeds in the VM which are as fast as I expect from the low-power backing store). Recently I made a few changes to the machine and the host, some of which are hard to reverse, and the VM has started to consume all available RAM, then all the swap, and eventually it gets killed by the OOM handler... A few crashes corrupted the DFS-R databases, so now the machine wants to do a huge amount of IO (both network and disk) to resync (but that's my problem).

There are other reports online of RAM exhaustion from bhyve, but I couldn't find an open bug, so I'm filing one. My problem seemed to start on updating to TrueNAS-12.0-U5.1, but I also did some other reconfiguration around this time, and judging from the other reports, this might be a long-standing issue.

The other change I made was to mess with the CPU/RAM allocation to this VM: I accidentally misread the cores setting as the total number of cores, not the per-CPU cores, so I allocated more vCPUs than my host CPU has threads (2xCPUs, 2xcores, 2xthreads, 8GB RAM)... Needless to say, the VM quickly swamped the host. However, this also caused the memory use to grow. I've now scaled the CPUs back to (1xCPU, 1xcore, 2xthreads, 6GB RAM) and the memory use is staying stable - although it's currently rebuilding some DFS-R database, so it's not maxing out the VM CPUs.

The behavior I observe is that the memory use stays stable as long as the host CPU use is reasonable. As soon as the host starts to max out its real cores (it's a 2xcore, 2xthread CPU) and the bhyve VM is doing a lot of IO, the memory use grows rapidly. When the bhyve process is stopped (by shutting down the VM, if you can get in quick enough), it takes a very long time to exit and sits in a 'tx->tx' state. It looks like it's trying to flush buffers, although the zpool seems to show only reads while the process is exiting. My guess as to the bug is that bhyve has a huge amount of outstanding IO, but I'm not sure how to monitor that properly (some commands I've been poking at are listed below). When the host CPU is really busy these IO buffers are not being freed properly, and are eventually leaking.

Around the same time as making these changes, I also turned on dedup on one of the zvols (the backups on that disk are rewritten every day, even though they're the same, so I was getting a lot of snapshot growth). I've since turned that off, but it didn't seem to change the behavior (though see the dedup note below). I also added the ZIL and L2ARC devices to the pool around this time. I've not tried removing them.

The host and the VM have been set up for a long time and working, so I'm going to ignore suggestions to get a bigger box or tune my ARC values... But I'm happy to debug it - I've been able to reproduce this relatively reliably with different CPU settings, although it does rely on Windows cooperating. I can't mess with it too much, since I do need to keep the other backups going directly via TrueNAS to the other pools ;-).
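For anyone who wants to watch the same thing, something like the following should show the growth. This is a sketch, not captured output; PIDs, VM names, and pool names need adjusting:

  # Watch the resident size of the bhyve process over time
  top -b -o res | grep bhyve

  # Per-mapping breakdown of the process address space
  procstat -v $(pgrep -f bhyve)

  # Kernel wait channels while the process is stuck in 'tx->tx' on shutdown
  procstat -kk $(pgrep -f bhyve)

  # Disk IO per provider, ARC size vs. limit, and swap consumption
  gstat -p
  sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max
  swapinfo -h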
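One note on the dedup experiment: if I understand ZFS dedup correctly, turning it off only affects new writes; blocks written while it was on stay in the dedup table (which lives in RAM/ARC) until they are rewritten or freed, so that experiment may still be costing memory even though the property is off. The pool and dataset names here are illustrative:

  # Turn dedup off for the zvol (existing deduped blocks stay in the DDT)
  zfs set dedup=off tank/vm/backup0

  # Check how big the dedup table still is
  zpool status -D tank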
TrueNAS Server:
ThinkServer TS140, Intel(R) Core(TM) i3-4130 CPU @ 3.40GHz, 20GB RAM.
zpool: 2 striped mirrored 3TB TOSHIBA HDWD130 drives, with mirrored 12GB ZIL and 19GB L2ARC on SATA SSDs.
TrueNAS-12.0-U6 (FreeBSD 12.2-RELEASE-p10), 25GB of swap.

Windows Server VM:
2xCPU, 1xcore, 1xthread, 6GB RAM (original, see other comments).
4xAHCI zvol with 64K cluster, one of which had dedup on for a period as an experiment, 512B blocks. (VM BSODs immediately if I try using virtio-blk.)
1xVirtIO NIC (em0), with 0.1.208 virtio-win drivers.
Windows Server 2019, fully patched.

-- 
You are receiving this mail because:
You are the assignee for the bug.