amd64/161968: renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup

Peter Maloney peter.maloney at brockmann-consult.de
Mon Oct 24 15:20:01 UTC 2011


>Number:         161968
>Category:       amd64
>Synopsis:       renaming snapshot with -r including a zvol snapshot causes total ZFS freeze/lockup
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    freebsd-amd64
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Oct 24 15:20:00 UTC 2011
>Closed-Date:
>Last-Modified:
>Originator:     Peter Maloney
>Release:        8.2-STABLE FreeBSD 8.2-STABLE #0: Tue Sep 27 16:27:57 CEST 2011     root@bcnastest2.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
>Organization:
Brockmann Consult
>Environment:
FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011     root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64

>Description:
Renaming a snapshot with -r, when the recursion includes a zvol snapshot, causes a total ZFS freeze/lockup/deadlock.

Once it is locked up, any command that uses "zfs", "zpool" or "sysctl -a" freezes, and so do NFS exports. "shutdown -r" does not restart the system either: it shuts down only as far as the message saying the disks are all synced, and stops there.

Pressing CTRL+T after running zfs or zpool shows the state "spa_namespace_lock". Pressing it after "sysctl -a" shows the state "g_waitfor_event".

Most of the time, a simple "zfs rename" does not cause a lockup. However, with one specific snapshot on one system, renaming it always causes a lockup, and on every other 8-STABLE system I have, my script always causes a lockup after a few loops.

My FreeBSD 8-STABLE was installed as 8.2-RELEASE plus the mps driver, and then updated with cvsup using this cvsupfile (comments removed):

*default host=cvsup.de.FreeBSD.org
*default base=/var/db
*default prefix=/usr
*default release=cvs tag=RELENG_8
*default delete use-rel-suffix
*default date=2011.09.27.00.00.00
*default compress
src-all

(The same freeze occurs with the date changed to today, Oct. 24th.)

# zpool get all big
NAME  PROPERTY       VALUE       SOURCE
big   size           39.8G       -
big   capacity       24%         -
big   altroot        -           default
big   health         ONLINE      -
big   guid           14576708073682355899  default
big   version        28          default
big   bootfs         -           default
big   delegation     on          default
big   autoreplace    on          local
big   cachefile      -           default
big   failmode       continue    local
big   listsnapshots  on          local
big   autoexpand     off         default
big   dedupditto     0           default
big   dedupratio     1.00x       -
big   free           30.1G       -
big   allocated      9.64G       -
big   readonly       off         -

# zfs get all big
NAME  PROPERTY              VALUE                  SOURCE
big   type                  filesystem             -
big   creation              Thu Jul 21 11:48 2011  -
big   used                  4.80G                  -
big   available             14.7G                  -
big   referenced            4.80G                  -
big   compressratio         1.00x                  -
big   mounted               yes                    -
big   quota                 none                   default
big   reservation           none                   default
big   recordsize            128K                   default
big   mountpoint            /big                   default
big   sharenfs              off                    default
big   checksum              on                     default
big   compression           off                    default
big   atime                 on                     default
big   devices               on                     default
big   exec                  on                     default
big   setuid                on                     default
big   readonly              off                    default
big   jailed                off                    default
big   snapdir               visible                local
big   aclmode               discard                default
big   aclinherit            restricted             default
big   canmount              on                     default
big   xattr                 off                    temporary
big   copies                1                      default
big   version               4                      -
big   utf8only              off                    -
big   normalization         none                   -
big   casesensitivity       sensitive              -
big   vscan                 off                    default
big   nbmand                off                    default
big   sharesmb              off                    default
big   refquota              none                   default
big   refreservation        none                   default
big   primarycache          all                    default
big   secondarycache        all                    default
big   usedbysnapshots       0                      -
big   usedbydataset         4.80G                  -
big   usedbychildren        6.70M                  -
big   usedbyrefreservation  0                      -
big   logbias               latency                default
big   dedup                 off                    default
big   mlslabel                                     -
big   sync                  standard               default
big   refcompressratio      1.00x                  -

# zfs list
NAME                        USED  AVAIL  REFER  MOUNTPOINT
big                        4.80G  14.7G  4.80G  /big
big@testcrashsnap4             0      -  4.80G  -
zroot                      5.64G   109G   894M  legacy
zroot/tmp                  2.14M   109G  2.14M  /tmp
zroot/usr                  4.72G   109G  2.45G  /usr
zroot/usr/home             53.5K   109G  53.5K  /usr/home
zroot/usr/obj               922M   109G   922M  /usr/objtmp
zroot/usr/ports            1.07G   109G   941M  /usr/ports
zroot/usr/ports/distfiles   150M   109G   150M  /usr/ports/distfiles
zroot/usr/ports/packages     21K   109G    21K  /usr/ports/packages
zroot/usr/src               314M   109G   314M  /usr/src
zroot/var                  17.6M   109G   904K  /var
zroot/var/crash            22.5K   109G  22.5K  /var/crash
zroot/var/db               16.2M   109G  15.1M  /var/db
zroot/var/db/pkg           1.10M   109G  1.10M  /var/db/pkg
zroot/var/empty              21K   109G    21K  /var/empty
zroot/var/log               272K   109G   272K  /var/log
zroot/var/mail               48K   109G    48K  /var/mail
zroot/var/run                50K   109G    50K  /var/run
zroot/var/tmp                23K   109G    23K  /var/tmp

# cat /boot/loader.conf
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"

/etc/sysctl.conf is nothing but comments

On a virtual machine running 8.2-RELEASE (not -STABLE), I have not found a way to reproduce the problem.

I also tested the latest sources downloaded with cvsup today, and they freeze the same way.

All my zfs systems are amd64.


I was hoping to use a zvol for iSCSI together with snapshots, so simply avoiding snapshots on zvols is not an acceptable workaround.
>How-To-Repeat:
Prerequisite: 

A system running 8.2-STABLE (more specifically using *default date=2011.09.27.00.00.00 in cvsup).


(1) Create a zpool.

[root@bcnastest2 ~]# zpool status big
  pool: big
 state: ONLINE
 scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        big           ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            ad8       ONLINE       0     0     0
            ad10      ONLINE       0     0     0
            ad12      ONLINE       0     0     0
            ad16      ONLINE       0     0     0
        cache
          gpt/cache0  ONLINE       0     0     0

errors: No known data errors

(2) Create a zvol in the above zpool.

[root@bcnastest2 ~]# zfs create -V 100m big/testzvol

(3) Run this script as root (written for bash, also works in plain sh; make sure to set the dataset variable).

#-------begin script-------
# Pool (or dataset) whose snapshots are created, renamed and destroyed recursively.
dataset=big

# Number of completed loops.
count=0

while true; do
    echo Snapshot
    # Remove any leftover snapshot from an earlier run; errors are ignored.
    zfs destroy -r "${dataset}@testcrashsnap" >/dev/null 2>&1
    zfs snapshot -r "${dataset}@testcrashsnap" || break

    # Rename testcrashsnap -> testcrashsnap1 -> ... -> testcrashsnap5.
    current=""
    for next in 1 2 3 4 5; do
        echo Renaming from ${current} to ${next}
        zfs destroy -r "${dataset}@testcrashsnap${next}" >/dev/null 2>&1
        zfs rename -r "${dataset}@testcrashsnap${current}" "${dataset}@testcrashsnap${next}" || break
        current=${next}
    done

    echo Destroy
    zfs destroy -r "${dataset}@testcrashsnap${current}" || break
    # POSIX arithmetic, so the counter also works in plain sh.
    count=$((count + 1))
    echo $count
done
#-------end script-------
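
Because the hung zfs process does not react to CTRL+C, an unattended run of the loop above simply goes silent. The following is a minimal sketch of a watchdog wrapper that reports when a command has not finished within a deadline. The function name run_with_deadline, the one-second polling interval and the return code 124 are assumptions made for this illustration, not part of the original report. It only detects the stall; a process blocked in the kernel cannot be killed from here.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background and report if it
# is still running after DEADLINE seconds.
# usage: run_with_deadline DEADLINE command [args...]
run_with_deadline() {
    deadline=$1
    shift
    "$@" &
    cmd_pid=$!
    waited=0
    # Poll once per second while the child still exists.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$waited" -ge "$deadline" ]; then
            echo "STALLED after ${waited}s: $*" >&2
            return 124   # arbitrary code chosen to signal "deadline exceeded"
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # The child has exited; hand its exit status back to the caller.
    wait "$cmd_pid"
}
```

In the reproduction script, the rename line could then be prefixed with something like "run_with_deadline 60", where 60 seconds is an assumed threshold well above the normal duration of a rename.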




Result: After an arbitrary number of loops, the output stops. Below is the output, including the result of pressing CTRL+C, CTRL+Z and CTRL+T. The script was started on a Friday; the last CTRL+T line was produced on the following Monday.

============================================
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
1
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
2
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
Renaming from 4 to 5
Destroy
3
Snapshot
Renaming from to 1
Renaming from 1 to 2
Renaming from 2 to 3
Renaming from 3 to 4
^C
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 5.56r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.07r 0.00u 0.00s 0% 1696k
load: 1.32  cmd: zfs 2363 [tx->tx_sync_done_cv)] 6.26r 0.00u 0.00s 0% 1696k
load: 1.46  cmd: zfs 2363 [tx->tx_sync_done_cv)] 13.42r 0.00u 0.00s 0% 1696k
^C^C^C
load: 1.89  cmd: zfs 2363 [tx->tx_sync_done_cv)] 36.59r 0.00u 0.00s 0% 1696k



^C^D


load: 0.01  cmd: zfs 2363 [tx->tx_sync_done_cv)] 230096.99r 0.00u 0.00s 0% 1696k
============================================
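
The CTRL+T lines above carry the two facts that matter here: the wait channel in square brackets and the elapsed real time (the field ending in "r"). As a rough sketch, they can be pulled out of a captured line with sed and the elapsed time converted to days and hours. The function names siginfo_wchan, siginfo_elapsed and format_elapsed are made up for this illustration, and the patterns assume the line layout shown in the output above.

```shell
#!/bin/sh
# Print the wait channel (the text between the square brackets).
siginfo_wchan() {
    printf '%s\n' "$1" | sed -n 's/^load:.*\[\(.*\)] [0-9.]*r .*$/\1/p'
}

# Print the elapsed real time in whole seconds (fraction dropped).
siginfo_elapsed() {
    printf '%s\n' "$1" | sed -n 's/^.*] \([0-9]*\)\.[0-9]*r .*$/\1/p'
}

# Convert a number of seconds to days, hours, minutes and seconds.
format_elapsed() {
    s=$1
    echo "$((s / 86400))d $((s % 86400 / 3600))h $((s % 3600 / 60))m $((s % 60))s"
}

line='load: 0.01  cmd: zfs 2363 [tx->tx_sync_done_cv)] 230096.99r 0.00u 0.00s 0% 1696k'
siginfo_wchan "$line"                        # tx->tx_sync_done_cv)
format_elapsed "$(siginfo_elapsed "$line")"  # 2d 15h 54m 56s
```

The last line of the output therefore shows the same zfs process (pid 2363) still waiting on the same channel more than two and a half days after it was started, which matches the Friday-to-Monday interval.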


>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

