ZFS deadlock?
Bengt Ahlgren
bengta at sics.se
Mon Aug 18 08:20:55 UTC 2014
Bengt Ahlgren <bengta at sics.se> writes:
> During a copy (zfs send/recv) of a ~1TB dataset from one zpool to
> another, my system seems to run into problems. A simultaneous "find"
> on the source dataset deadlocks. This is its kernel stack:
>
> $ procstat -kk 1786
> PID TID COMM TDNAME KSTACK
> 1786 101344 find - mi_switch+0x194 sleepq_wait+0x42 _cv_wait+0x112 zio_wait+0x61 dbuf_read+0x619 dmu_buf_hold+0xe0 zap_get_leaf_byblk+0x4a zap_deref_leaf+0x68 fzap_cursor_retrieve+0xe7 zap_cursor_retrieve+0x155 zfs_freebsd_readdir+0x2d8 VOP_READDIR_APV+0x78 kern_getdirentries+0x212 sys_getdirentries+0x23 amd64_syscall+0x5ea Xfast_syscall+0xf7
>
> The zfs send/recv has become very slow, although it still seems to make
> some progress (the copy is, obviously, from p0 to p2):
>
>                capacity     operations    bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    318      0  10.2M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    294      0  9.29M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    307      0  9.12M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0      0      0      0
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    293      0  8.69M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0     58      0  1.61M
> ----------  -----  -----  -----  -----  -----  -----
> p0          15.9T  2.20T    301      0  10.9M      0
> p1          11.1T  7.00T      0      0      0      0
> p2          2.55T  41.0T      0  1.62K      0  49.6M
> ----------  -----  -----  -----  -----  -----  -----
>
> The machine is otherwise quite idle. When the copy started I got
> around 200MB/s; now it is around 10MB/s.
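As a rough check of the "around 10MB/s" figure, the p0 samples quoted above can be averaged. The function below is a hypothetical helper (`iostat_avg` is not a standard tool); it assumes the usual `zpool iostat` column order (pool, alloc, free, read ops, write ops, read bandwidth, write bandwidth) and 1024-based K/M/G suffixes.

```shell
# Hypothetical helper: average one pool's read ops and read bandwidth over
# pasted "zpool iostat" samples read from stdin.
# Assumed columns: pool alloc free read-ops write-ops read-bw write-bw
# Assumed units: K/M/G suffixes are powers of 1024.
iostat_avg() {
    pool=$1
    awk -v pool="$pool" '
        # Convert a value such as 10.2M or 1.62K to a plain number.
        function num(s,   n, u) {
            n = s + 0                      # numeric prefix, e.g. 10.2
            u = substr(s, length(s), 1)    # trailing unit letter, if any
            if (u == "K") n *= 1024
            else if (u == "M") n *= 1024 * 1024
            else if (u == "G") n *= 1024 * 1024 * 1024
            return n
        }
        # Only rows for the requested pool; headers and dashes are skipped.
        $1 == pool { ops += num($4); bw += num($6); samples++ }
        END {
            if (samples == 0) exit 1
            printf "samples=%d ops/s=%.1f MiB/s=%.2f KiB/op=%.1f\n",
                samples, ops / samples, bw / samples / 1048576,
                (ops > 0 ? bw / ops / 1024 : 0)
        }'
}
```

Fed the five p0 samples above (with the leading "> " removed), e.g. `iostat_avg p0 < samples.txt` where `samples.txt` is a hypothetical file, it prints `samples=5 ops/s=302.6 MiB/s=9.64 KiB/op=32.6`. On live output, keep in mind that the first report `zpool iostat` prints is an average since the pool was imported, so it is best left out.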
>
> The ARC has grown large, but that is probably normal:
>
> last pid: 1863; load averages: 0.20, 0.33, 0.63 up 0+02:27:44 16:31:52
> 50 processes: 1 running, 49 sleeping
> CPU: 0.0% user, 0.0% nice, 0.2% system, 0.0% interrupt, 99.8% idle
> Mem: 1688M Active, 61M Inact, 107G Wired, 3288K Cache, 126M Buf, 15G Free
> ARC: 99G Total, 2483M MFU, 89G MRU, 33M Anon, 888M Header, 7427M Other
> Swap: 128G Total, 128G Free
>
>   PID USERNAME   THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>  1229 root         1  20    0 39700K  3292K piperd  7  24:27  1.07% zfs
>  1228 root         2  20    0 39832K  3420K nanslp  5  17:02  0.39% zfs
> ...
>
> The source pool is quite full (87%); could that be an issue?
>
> $ zpool list
> NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
> p0    18.1T  15.9T  2.20T    87%  1.00x  ONLINE  -
> p1    18.1T  11.1T  7.00T    61%  1.00x  ONLINE  -
> p2    43.5T  2.53T  41.0T     5%  1.00x  ONLINE  -
>
> The machine is running FreeBSD 9.3-RELEASE and has two mps(4) controllers.
>
> Any ideas?
Just for the record: there was no deadlock after all. The apparent hang
turned out to be caused by a directory with ~4.5M entries.
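Since the culprit was a single huge directory, a hypothetical helper along these lines can list directories whose number of direct entries exceeds a threshold. It is a sketch, not something from the original thread; it relies only on standard find(1) options (-xdev, -type, -mindepth, -maxdepth), and as written it mishandles path names that contain newlines. It has to read every directory, so on a dataset like the one above it will itself take a while.

```shell
# Hypothetical helper: print "count<TAB>directory" for every directory under
# ROOT (same file system only) holding more than THRESHOLD direct entries.
# Usage: bigdirs ROOT [THRESHOLD]    (default threshold: 100000)
bigdirs() {
    root=$1
    threshold=${2:-100000}
    # -xdev keeps the walk inside one file system, i.e. one ZFS dataset.
    find "$root" -xdev -type d -print | while IFS= read -r dir; do
        # Count direct children only; find never reports "." or "..".
        # tr strips the padding some wc(1) implementations add.
        n=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
        if [ "$n" -gt "$threshold" ]; then
            printf '%s\t%s\n' "$n" "$dir"
        fi
    done
}
```

For example, `bigdirs /path/to/dataset 1000000 | sort -rn` (the path is a placeholder) lists the largest offenders first.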
Bengt
More information about the freebsd-stable mailing list