[Bug 276575] Host can cause a crash in bhyve nvme emulation

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 24 Jan 2024 00:23:06 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276575

            Bug ID: 276575
           Summary: Host can cause a crash in bhyve nvme emulation
           Product: Base System
           Version: 14.0-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: bhyve
          Assignee: virtualization@FreeBSD.org
          Reporter: dpy@pobox.com

Created attachment 247908
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=247908&action=edit
output of windows minidump analysis

Hello,

OS: 14.0-RELEASE-p4

I have Windows 10 VMs which BSOD under heavy disk load on the guest or
host.

There are two different cases. Both run 3 disks (all NVMe emulated): 1 boot
disk (C:) and two data disks (3 TB and 1 TB), backed by image files on ZFS.
sync is set to always and a small Optane ZIL is used on the pool. I am using
vm-bhyve to manage the VMs.

Pool setup:
# zpool list -v data_pool
NAME                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ  FRAG    CAP  DEDUP  HEALTH  ALTROOT
data_pool                    10.9T  3.95T  6.96T        -         -    1%    36%  1.00x  ONLINE  -
  mirror-0                   10.9T  3.95T  6.96T        -         -    1%  36.2%      -  ONLINE
    gpt/data_pool_00         10.9T      -      -        -         -     -      -      -  ONLINE
    gpt/data_pool_01         10.9T      -      -        -         -     -      -      -  ONLINE
logs                             -      -      -        -         -     -      -      -  -
  gpt/data_pool_zil            32G  3.17M  31.5G        -         -    0%  0.00%      -  ONLINE
cache                            -      -      -        -         -     -      -      -  -
  gpt/data_pool_cache_0_ssd   932G  55.9G   876G        -         -    0%  6.00%      -  ONLINE
  gpt/data_pool_cache_1_ssd   932G  53.9G   878G        -         -    0%  5.78%      -  ONLINE

The cache drives were replaced recently, so they haven't filled up yet.

In the first case, the VM could not complete a full backup of the 3 TB data
drive (a Windows backup). The failure occurred after the backup had been
running for a while (2-3 minutes or more). The backups ran at 200+ MB/s (via a
10Gb network) to the backup machine. The result was a BSOD, and the minidumps
produced indicate NVMe issues. I can supply details if required.

The second case was a little worse, since the VM was fairly quiet, but I was
testing an NFS connection (sending over a 10Gb network), running at 400+ MB/s.
In this case the 2nd drive on the Windows VM just stopped working. Upon reboot,
the disk image had been corrupted to the point of needing reformatting to
function. (I rolled back to a previous snapshot and reapplied the day's
transactions.)

The dataset has sync=always (Optane ZIL), and the VM dataset is on a raidz1
dataset (3+1).
