[Bug 262189] ZFS volume not showing up in /dev/zvol when 1 CPU

From: <bugzilla-noreply_at_freebsd.org>
Date: Fri, 25 Feb 2022 10:02:48 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=262189

            Bug ID: 262189
           Summary: ZFS volume not showing up in /dev/zvol when 1 CPU
           Product: Base System
           Version: 13.0-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: zedupsys@gmail.com

I have found a 100% repeatable problem on 4+ different setups where a ZFS zvol does
not show up in /dev/zvol until the system is rebooted. In a way this is a continuation
of the investigation for the problem reported at
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261059, where I had a suspicion
that there are some ZFS concurrency issues.

Requirements to reproduce (as I have tested): latest FreeBSD as of now
(13.0-RELEASE-p7 or 13.0-RELEASE); the machine must have 1 CPU (or 1 vCPU,
!IMPORTANT!). RAM does not seem to matter; I have tested with 16G, 2G and 1G setups.
The example below assumes a basic ZFS install with default options from the DVD
installer and automatic partitioning (just a next-next install). Before running, make
sure there are no ZVOLs and that the /dev/zvol directory does not exist (not
mandatory, the bug exists even if there are zvols, it is just easier to detect when
there aren't any).
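
For completeness, these are the kind of pre-checks that can be done before running
the script (just a sketch; "zroot" is the default installer pool name):

sysctl hw.ncpu           # should report 1 for the 100% reproducible case
ls /dev/zvol             # expected before the test: "No such file or directory"
zfs list -t volume       # expected before the test: "no datasets available"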


To trigger the bug, run the shell script below (adjust the script preamble for
zpool/zfs dataset creation/destruction and the name as necessary):

#!/bin/sh
name_pool=zroot/stress

# Create the parent dataset (or use a dedicated pool on a spare disk):
# zpool create -f $name_pool /dev/ada1
# or
# zfs create $name_pool
zfs set mountpoint=none $name_pool

# Cleanup between runs:
# zfs destroy -r $name_pool

seq 1 100 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
    dd if=/dev/zero of=/dev/zvol/$name_pool/data$i bs=1M
done
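
A slightly modified loop (only a sketch, not what I ran above, assuming the same
preamble) can help distinguish a device node that is merely slow to appear from one
that never appears:

seq 1 15 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
    n=0
    # poll up to ~10 seconds for the device node to appear
    while [ ! -e /dev/zvol/$name_pool/data$i ] && [ $n -lt 10 ]; do
        sleep 1
        n=$((n + 1))
    done
    if [ -e /dev/zvol/$name_pool/data$i ]; then
        echo "data$i: device node appeared after ${n}s"
    else
        echo "data$i: device node still missing after ${n}s"
    fi
done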


At some point in the loop you will see error output like this:
dd: /dev/zvol/zroot/stress/data1: No such file or directory


Now, to validate, run:
ls /dev/zvol
ls: /dev/zvol: No such file or directory

zfs list
# .. output containing all created ZVOLS ..
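
The number of missing device nodes can be counted by comparing the volumes ZFS
reports against what is actually present under /dev/zvol (rough sketch):

# volumes ZFS knows about
zfs list -H -t volume -o name | wc -l
# device nodes actually present (0 if /dev/zvol does not exist at all)
find /dev/zvol ! -type d 2>/dev/null | wc -l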


After a reboot, the zvols show up in /dev/zvol as expected (as they should have right
after creation).


More details and observations on different environments:

1.
100% reproducible inside a VirtualBox 6.0 VM (default FreeBSD settings,
13.0-RELEASE, default ZFS install), 1 vCPU, 1GB RAM. Zvols are created on the same
zpool where FreeBSD is installed.

2.
100% reproducible inside a XEN 4.15 DomU, 1 vCPU, 2GB RAM. FreeBSD installed on ada0
(UFS), zpool created on /dev/ada1 as a whole disk, zvols created directly on the pool
(name_pool=zroot) without a dataset hierarchy.

3.
100% reproducible inside a XEN 4.15 Dom0, 1 vCPU, 16GB RAM, 13.0-RELEASE-p7. FreeBSD
installed on a separate /dev/gpt/label1 (ada0) disk, zpool on GPT-partitioned ada1.

4.
100% reproducible on physical hardware with all CPU cores except one disabled in the
BIOS. 16GB RAM, Xeon CPU.


Observations:

This bug seems to be related to CPU count. With 2 CPUs available, around 30% of the
/dev/zvol devices are not created; with 4 CPUs, around 15% or less (the percentages
are rough estimates, not exact measurements, but they show the role of CPU count and
concurrency). I do not have more CPUs in my testing hardware, but it seems that the
more there are, the less likely this bug is to manifest itself.

In the script, "seq 1 100" is far more than needed on a single CPU; "seq 1 15" is
enough to see the problem. With more CPUs a higher count is better, since sometimes
all ZVOLs get created.

After a restart the ZVOLs always show up in /dev/zvol. This holds for a reboot as
well as for a manual pool import.

If the zpool is on a separate disk (not the one FreeBSD is installed on), a zpool
export followed by a zpool import also results in the ZVOLs showing up in /dev/zvol.
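
For reference, the exact sequence I mean (with "stress" as a placeholder for whatever
pool holds the zvols):

zpool export stress
zpool import stress
ls /dev/zvol             # the zvol device nodes are present after the import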

It makes no difference whether the ZVOL is a sparse volume created with the -s flag.
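
(For completeness, the sparse variant only changes the create line in the loop, e.g.:

zfs create -s -o volmode=dev -V 1G $name_pool/data$i
)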


On the 4 CPU setup I have sometimes noticed errors like this on the serial console:
g_dev_taste: g_dev_taste(zvol/test/data22) failed to g_attach, error=6

This seems suspicious to me, since with volmode=dev g_dev_taste should not trigger
g_attach on such devices, am I right? What does this error code mean, is it from
errno.h, "#define ENXIO 6 /* Device not configured */"? Maybe those errors are
related to this bug, maybe not.

If I dd /dev/zero onto a block device and then detach/attach it, this does not
trigger the g_dev_taste error.

I have seen 6 similar reported bugs about ZVOLs not showing up, though those were
related to different commands (clone, send and recv) and seemed outdated. I will
investigate those as well and link them in my next comments. Maybe they have a
similar cause, but they did not look like duplicates.

At the moment this is as far as I have been able to dig into this.

-- 
You are receiving this mail because:
You are the assignee for the bug.