[Bug 215634] zfs receive trips up and live-locks for non-incremental fs

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Wed Dec 28 15:30:03 UTC 2016


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=215634

            Bug ID: 215634
           Summary: zfs receive trips up and live-locks for
                    non-incremental fs
           Product: Base System
           Version: 10.3-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: freebsd-bugs at FreeBSD.org
          Reporter: johannes at jo-t.de
                CC: freebsd-amd64 at FreeBSD.org

Hi,

when I try to zfs-send a file system from one machine to another,
the receiving end gets stuck with zfskern spinning on one CPU core.
Incremental streams are sent without any observable problems.

Here's what top shows (truncated) on the receiver:

> last pid:  2848;  load averages:  1.08,  1.08,  1.04
> 243 processes: 5 running, 220 sleeping, 18 waiting
> CPU:  0.0% user,  0.0% nice, 50.6% system,  0.0% interrupt, 49.4% idle
> Mem: 8904K Active, 77M Inact, 551M Wired, 5310M Free
> ARC: 284M Total, 104M MFU, 46M MRU, 2448K Anon, 1979K Header, 131M Other
> Swap: 8192M Total, 8192M Free
>   PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
>     5 root        -8    -     0K   128K CPU0    0 252:00 100.00% zfskern{solthread 0xffff}
>    11 root       155 ki31     0K    32K RUN     1 225:09  91.89% idle{idle: cpu1}
>    11 root       155 ki31     0K    32K RUN     0  51:02   7.86% idle{idle: cpu0}
>     0 root       -16    -     0K  2496K swapin  1   0:18   0.00% kernel{swapper}
>    16 root        16    -     0K    16K syncer  1   0:16   0.00% syncer
>    12 root       -92    -     0K   288K WAIT    1   0:09   0.00% intr{irq257: virtio_p}
>    12 root       -60    -     0K   288K WAIT    1   0:07   0.00% intr{swi4: clock}
>    15 root       -16    -     0K    16K vlruwt  1   0:02   0.00% vnlru
>     6 root       -16    -     0K    32K psleep  1   0:02   0.00% pagedaemon{pagedaemon}
>    14 root       -16    -     0K    16K RUN     1   0:02   0.00% rand_harvestq
>     5 root        -8    -     0K   128K tx->tx  1   0:02   0.00% zfskern{txg_thread_enter}
>  1806 root        40    0 44420K  3692K rwa.cv  1   0:01   0.00% zfs

And here's procstat for zfskern on the receiver:

> #procstat -kk 5
>   PID    TID COMM             TDNAME           KSTACK                       
>     5 100044 zfskern          arc_reclaim_thre mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_reclaim_thread+0x2be fork_exit+0x9a fork_trampoline+0xe 
>     5 100045 zfskern          arc_user_evicts_ mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e arc_user_evicts_thread+0x17d fork_exit+0x9a fork_trampoline+0xe 
>     5 100046 zfskern          l2arc_feed_threa mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e l2arc_feed_thread+0xc73 fork_exit+0x9a fork_trampoline+0xe 
>     5 100322 zfskern          trim seppel      mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e trim_thread+0x126 fork_exit+0x9a fork_trampoline+0xe 
>     5 100334 zfskern          txg_thread_enter mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x17d txg_quiesce_thread+0x16b fork_exit+0x9a fork_trampoline+0xe 
>     5 100335 zfskern          txg_thread_enter mi_switch+0xe1 sleepq_timedwait+0x3a _cv_timedwait_sbt+0x19e txg_sync_thread+0x160 fork_exit+0x9a fork_trampoline+0xe 
>     5 100425 zfskern          solthread 0xffff <running>

The sender is running (custom trimmed-down GENERIC kernel):
> FreeBSD XXX 10.3-STABLE FreeBSD 10.3-STABLE #1 r308740: Sat Nov 19 21:15:27 GMT 2016     root@XXX:/usr/obj/usr/src/sys/XXX  amd64

And the receiver is running (a differently trimmed GENERIC kernel):
> FreeBSD YYY 10.3-RELEASE-p15 FreeBSD 10.3-RELEASE-p15 #9 r310507: Sat Dec 24 21:22:15 UTC 2016     root@XXX:/path/usr/src/sys/YYY  amd64

The file systems that zfs-send just fine are clones of snapshots. These
are sent as incremental streams. The problematic one is a freshly
zfs-create'd file system with only a few small files in it.

The command used to send/receive, initiated from the sender-side, is:
> /sbin/zfs send -v -R senderpool/myrootfs | /usr/bin/gzip | /usr/bin/ssh root@${HOST} "/usr/bin/gunzip | /sbin/zfs recv -v -ud receiverpool"

And what I thus get in the console (from zfs recv -v) is (trimmed):
> found clone origin receiverpool/base@20.46-r310507
> receiving incremental stream of senderpool/myrootfs@clean-install into receiverpool/myrootfs@clean-install
> received 886KB stream in 3 seconds (295KB/sec)
> receiving full stream of senderpool/myrootfs/freshfs@clean-install into receiverpool/myrootfs/freshfs@clean-install
> [...stuck here...]
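
For anyone trying to reproduce this, it may help to take the stuck
dataset out of the -R stream and look at it on its own. The following is
only a minimal sketch, not something I have run: the snapshot name is the
one from the recv output above, STREAM_FILE is a made-up scratch path,
and I'm assuming zstreamdump(8) is available on the sender. The script
just prints the commands so they can be reviewed before running them.

```shell
#!/bin/sh
# Sketch only: prints (does not execute) commands for isolating the
# full stream of the dataset that gets stuck on the receiver.
# SNAP comes from the recv -v output; STREAM_FILE is a hypothetical path.
SNAP="senderpool/myrootfs/freshfs@clean-install"
STREAM_FILE="/tmp/freshfs.zstream"

isolate_cmds() {
    # 1. On the sender: capture only the full (non-incremental) stream.
    echo "zfs send -v ${SNAP} > ${STREAM_FILE}"
    # 2. Summarize the records in the stream without touching any pool.
    echo "zstreamdump < ${STREAM_FILE}"
    # 3. On the receiver: dry-run receive (-n), same -ud flags as before.
    echo "zfs recv -v -n -ud receiverpool < ${STREAM_FILE}"
}

isolate_cmds
```

If the dry run completes but a real receive of the same file still
spins, that would at least rule out gzip and ssh as contributing factors.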

The receiving end otherwise seems to be running fine while zfskern
spins, but it never finishes receiving the file system in question.


Any ideas what might be going on, or what to do about it?



Thanks,

Johannes

-- 
You are receiving this mail because:
You are on the CC list for the bug.

