Deadlock after canceled zfs recv

Thu Oct 15 19:18:02 UTC 2009

I'm running the latest RELENG_8 and have been doing some pre-production
stress testing.

I can reproducibly trigger a deadlock after a canceled ssh + zfs recv.
Here's how I reproduce the problem:

Do this in a new tank without any data in it. Reboot the system, and
make these the first zfs operations performed. The files are available here:

http://memberwebs.com/stef/misc/recv-snapshots-zfs-hang.tbz

Receive a new file system, and then an incremental snapshot:

# cat step-one | zfs recv tank/received
# cat step-two | zfs recv tank/received

At this point it should look like this:

# zfs list -t snapshot,filesystem | grep received
tank                   2.35G  16.4G    22K  /tank
tank/received           491M  16.4G   489M  /tank/received
tank/received@justnow  1.32M      -   160M  -
tank/received@later        0      -   489M  -

The third one goes through ssh. After starting it, count about three to
five seconds (one one thousand, two one thousand, three one thousand) and
press Ctrl-C:

# cat step-three | ssh localhost zfs recv tank/received

Now run the 'zfs list' command above again. More often than not, parts of
the zfs system are hung, and they remain deadlocked until reboot.
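
Because a hung 'zfs list' ties up the terminal it was started from, a small
helper along these lines may make the check less painful. This is only a
rough sketch: finishes_within is a hypothetical name, the /tmp marker path
and the one-second polling interval are my own assumptions, and it only
reports whether a command returned in time, not whether it succeeded.

```shell
#!/bin/sh
# Sketch only: finishes_within is a hypothetical helper, not part of zfs.
# Usage: finishes_within SECONDS COMMAND [ARGS...]
# Returns 0 if COMMAND exits (with any status) within SECONDS, 1 if it is
# still running after that, 2 if the marker file could not be created.
finishes_within() {
    limit=$1
    shift
    # Assumed scratch location; an explicit template works with both
    # BSD and GNU mktemp.
    marker=$(mktemp /tmp/finishes_within.XXXXXX) || return 2
    # Run the command in the background. The subshell writes to the marker
    # file only once the command has returned.
    ( "$@"; echo done > "$marker" ) >/dev/null 2>&1 &
    elapsed=0
    while [ "$elapsed" -lt "$limit" ]; do
        if [ -s "$marker" ]; then
            rm -f "$marker"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep.
    if [ -s "$marker" ]; then
        rm -f "$marker"
        return 0
    fi
    # Still running: leave the background job and the marker file alone,
    # since a deadlocked process may not respond to signals.
    return 1
}
```

It could be used as: finishes_within 30 zfs list -t snapshot,filesystem ||
echo "zfs list appears hung". The 30 second limit is an arbitrary guess;
on a healthy system the listing returns almost immediately.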

If it doesn't happen the first time, try step three + Ctrl-C again.

When 'zfs recv' is run through ssh and cancelled with Ctrl-C, it seems that
it doesn't get time to do the cleanup that it normally does when run directly.

FreeBSD zfs8.ws.local 8.0-RC1 FreeBSD 8.0-RC1 #0: Wed Oct 14 16:04:50
UTC 2009     root@zfs8.ws.local:/usr/obj/usr/src/sys/GENERIC  i386

I'm available for any further information and want to help nail down
this bug.

Cheers,

Stef