[ZFS] continuous write to disk by zfskern
Yoshiaki Kasahara
kasahara at nc.kyushu-u.ac.jp
Fri Sep 13 10:01:29 UTC 2013
On Fri, 13 Sep 2013 18:29:08 +0900 (JST),
Yoshiaki Kasahara <kasahara at nc.kyushu-u.ac.jp> said:
> Hello,
>
> Recently I noticed that my (ZFS-only) FreeBSD 9-STABLE system (my
> main desktop) was very sluggish, and realized that zfskern was
> continuously writing something to my main raidz1 pool. According to
> my munin records, it started just after I updated my world on Aug
> 27th. The HDD temperatures stay above 60°C, and I'm afraid the
> system is rapidly grinding them to death.
>
> Only the raidz pool "zroot" shows the symptom. It happens even when
> the system is in single-user mode.
>
> -----
>
> % zpool iostat 1
> capacity operations bandwidth
> pool alloc free read write read write
> ---------- ----- ----- ----- ----- ----- -----
> backup2 1.36T 1.36T 0 0 4.60K 2.52K
> zroot 1.26T 1.44T 16 1.35K 238K 5.17M
> ---------- ----- ----- ----- ----- ----- -----
> backup2 1.36T 1.36T 0 0 0 0
> zroot 1.26T 1.44T 0 1.56K 0 7.96M
> ---------- ----- ----- ----- ----- ----- -----
> backup2 1.36T 1.36T 0 0 0 0
> zroot 1.26T 1.44T 0 1.58K 0 4.47M
> ---------- ----- ----- ----- ----- ----- -----
> backup2 1.36T 1.36T 0 0 0 0
> zroot 1.26T 1.44T 0 1.59K 0 4.54M
> ---------- ----- ----- ----- ----- ----- -----
> backup2 1.36T 1.36T 0 0 0 0
> zroot 1.26T 1.44T 0 1.73K 0 6.13M
> ---------- ----- ----- ----- ----- ----- -----
> ^C
> % top -mio -SH -owrite 10
> last pid: 13351; load averages: 1.04, 0.52, 0.40 up 0+00:30:32 17:55:32
> 671 processes: 9 running, 635 sleeping, 2 zombie, 25 waiting
>
> Mem: 747M Active, 235M Inact, 1383M Wired, 2100K Cache, 13G Free
> ARC: 956M Total, 258M MFU, 661M MRU, 5566K Anon, 8027K Header, 23M Other
> Swap: 24G Total, 24G Free
>
>
> PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
> 4 root 129929 114 229 166157 0 166386 88.64% zfskern{txg_
> 1198 root 751 15 15 1621 0 1636 0.87% syslogd
> 1025 _pflogd 5642 2 1 1390 0 1391 0.74% pflogd
> 2512 kasahara 120297 1606 116 574 182 872 0.46% Xorg
> 2569 kasahara 1131 21 712 196 8 916 0.49% gconfd-2
> 2598 www 136 0 4 146 0 150 0.08% httpd
> 2570 www 192 4 12 143 0 155 0.08% httpd
> 2248 www 147 4 19 141 0 160 0.09% httpd
> 2626 kasahara 4192 91 381 131 389 901 0.48% nautilus{nau
> 4536 kasahara 32937 762 1866 128 725 2719 1.45% emacs-24.3.5
>
> -----
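>
> So far the only extra clue I can think of is from zfskern itself; I
> suppose its kernel stacks can be dumped like this (pid 4 is zfskern
> in the top output above), though I haven't dug through them yet:
>
> % sudo procstat -kk 4
>
> If the stacks keep showing txg_sync_thread/spa_sync, that would at
> least confirm the writes come from transaction group syncing rather
> than some userland process.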
>
> Is there any way to see what is happening to my system? I believe
> the behavior started when I installed the following kernel, but I'm
> not sure whether it is the culprit.
>
> FreeBSD 9.2-PRERELEASE #0 r254947: Tue Aug 27 13:36:54 JST 2013
>
> I updated my world again after that (to fix vulnerabilities), so I
> don't have the previous kernel anymore. I can only remember that the
> previous update was in July.
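>
> To attribute the writes I'm also thinking of watching the raidz
> member disks with gstat and trying the classic DTrace io one-liner
> (the latter assumes the dtraceall module is loaded):
>
> % gstat -f 'ada[0-2]'
> % sudo dtrace -n 'io:::start { @[execname] = count(); }'
>
> I'd expect both to point at the kernel rather than any process, but
> I haven't verified that yet.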
>
> My system has two pools: one is a raidz of 3 disks, the other is a
> single-disk pool. Please ignore backup2's data errors (that is
> another story, and I'm going to replace the disk soon).
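>
> For backup2 the plan is just a zpool replace once the new disk
> arrives, something like the following, where "gpt/backup2new" stands
> for whatever I end up labeling the new partition:
>
> % zpool replace backup2 gpt/backup2 gpt/backup2new
>
> The 9 corrupted files will stay lost either way, of course.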
>
> -----
>
> % zpool status
> pool: backup2
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://illumos.org/msg/ZFS-8000-8A
> scan: scrub repaired 0 in 3h37m with 9 errors on Sat Aug 31 08:34:28 2013
> config:
>
> NAME STATE READ WRITE CKSUM
> backup2 ONLINE 0 0 0
> gpt/backup2 ONLINE 0 0 0
>
> errors: 9 data errors, use '-v' for a list
>
> pool: zroot
> state: ONLINE
> scan: scrub canceled on Fri Sep 13 17:22:45 2013
> config:
>
> NAME STATE READ WRITE CKSUM
> zroot ONLINE 0 0 0
> raidz1-0 ONLINE 0 0 0
> gpt/disk0 ONLINE 0 0 0
> gpt/disk1 ONLINE 0 0 0
> gpt/disk2 ONLINE 0 0 0
>
> errors: No known data errors
>
> % grep ada /var/run/dmesg.boot
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <Hitachi HUA721010KLA330 GKAOAB0A> ATA-7 SATA 2.x device
> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <Hitachi HUA721010KLA330 GKAOAB0A> ATA-7 SATA 2.x device
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada1: Previously was known as ad6
> ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
> ada2: <Hitachi HUA721010KLA330 GKAOAB0A> ATA-7 SATA 2.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada2: Previously was known as ad8
> ada3 at ahcich4 bus 0 scbus4 target 0 lun 0
> ada3: <ST3000DM001-9YN166 CC4H> ATA-8 SATA 3.x device
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
> ada3: quirks=0x1<4K>
> ada3: Previously was known as ad12
>
> % gpart list
> Geom name: ada0
> modified: false
> state: OK
> fwheads: 16
> fwsectors: 63
> last: 1953525134
> first: 34
> entries: 128
> scheme: GPT
> Providers:
> 1. Name: ada0p1
> Mediasize: 65536 (64k)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 17408
> Mode: r0w0e0
> rawuuid: 5c629775-10e5-11df-9abf-001b21525260
> rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
> label: (null)
> length: 65536
> offset: 17408
> type: freebsd-boot
> index: 1
> end: 161
> start: 34
> 2. Name: ada0p2
> Mediasize: 8589934592 (8.0G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e1
> rawuuid: 9f7ec4c5-10e5-11df-9abf-001b21525260
> rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
> label: swap0
> length: 8589934592
> offset: 82944
> type: freebsd-swap
> index: 2
> end: 16777377
> start: 162
> 3. Name: ada0p3
> Mediasize: 991614851584 (923G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e2
> rawuuid: d275a596-10e5-11df-9abf-001b21525260
> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
> label: disk0
> length: 991614851584
> offset: 8590017536
> type: freebsd-zfs
> index: 3
> end: 1953525134
> start: 16777378
> Consumers:
> 1. Name: ada0
> Mediasize: 1000204886016 (931G)
> Sectorsize: 512
> Mode: r2w2e5
>
> Geom name: ada1
> modified: false
> state: OK
> fwheads: 16
> fwsectors: 63
> last: 1953525134
> first: 34
> entries: 128
> scheme: GPT
> Providers:
> 1. Name: ada1p1
> Mediasize: 65536 (64k)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 17408
> Mode: r0w0e0
> rawuuid: 5f353ff1-10e5-11df-9abf-001b21525260
> rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
> label: (null)
> length: 65536
> offset: 17408
> type: freebsd-boot
> index: 1
> end: 161
> start: 34
> 2. Name: ada1p2
> Mediasize: 8589934592 (8.0G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e1
> rawuuid: a28e75bb-10e5-11df-9abf-001b21525260
> rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
> label: swap1
> length: 8589934592
> offset: 82944
> type: freebsd-swap
> index: 2
> end: 16777377
> start: 162
> 3. Name: ada1p3
> Mediasize: 991614851584 (923G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e2
> rawuuid: d60cd5c7-10e5-11df-9abf-001b21525260
> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
> label: disk1
> length: 991614851584
> offset: 8590017536
> type: freebsd-zfs
> index: 3
> end: 1953525134
> start: 16777378
> Consumers:
> 1. Name: ada1
> Mediasize: 1000204886016 (931G)
> Sectorsize: 512
> Mode: r2w2e5
>
> Geom name: ada2
> modified: false
> state: OK
> fwheads: 16
> fwsectors: 63
> last: 1953525134
> first: 34
> entries: 128
> scheme: GPT
> Providers:
> 1. Name: ada2p1
> Mediasize: 65536 (64k)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 17408
> Mode: r0w0e0
> rawuuid: 60568cdf-10e5-11df-9abf-001b21525260
> rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
> label: (null)
> length: 65536
> offset: 17408
> type: freebsd-boot
> index: 1
> end: 161
> start: 34
> 2. Name: ada2p2
> Mediasize: 8589934592 (8.0G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e1
> rawuuid: a4cfd93f-10e5-11df-9abf-001b21525260
> rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
> label: swap2
> length: 8589934592
> offset: 82944
> type: freebsd-swap
> index: 2
> end: 16777377
> start: 162
> 3. Name: ada2p3
> Mediasize: 991614851584 (923G)
> Sectorsize: 512
> Stripesize: 0
> Stripeoffset: 82944
> Mode: r1w1e2
> rawuuid: d88a0a46-10e5-11df-9abf-001b21525260
> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
> label: disk2
> length: 991614851584
> offset: 8590017536
> type: freebsd-zfs
> index: 3
> end: 1953525134
> start: 16777378
> Consumers:
> 1. Name: ada2
> Mediasize: 1000204886016 (931G)
> Sectorsize: 512
> Mode: r2w2e5
>
> Geom name: ada3
> modified: false
> state: OK
> fwheads: 16
> fwsectors: 63
> last: 5860533134
> first: 34
> entries: 128
> scheme: GPT
> Providers:
> 1. Name: ada3p1
> Mediasize: 65536 (64k)
> Sectorsize: 512
> Stripesize: 4096
> Stripeoffset: 0
> Mode: r0w0e0
> rawuuid: b995fc9e-abc3-11e2-b146-001cc0fac46a
> rawtype: 83bd6b9d-7f41-11dc-be0b-001560b84f0f
> label: (null)
> length: 65536
> offset: 20480
> type: freebsd-boot
> index: 1
> end: 167
> start: 40
> 2. Name: ada3p2
> Mediasize: 3000591450112 (2.7T)
> Sectorsize: 512
> Stripesize: 4096
> Stripeoffset: 0
> Mode: r1w1e2
> rawuuid: fcaac211-abc3-11e2-b146-001cc0fac46a
> rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
> label: backup2
> length: 3000591450112
> offset: 1048576
> type: freebsd-zfs
> index: 2
> end: 5860532223
> start: 2048
> Consumers:
> 1. Name: ada3
> Mediasize: 3000592982016 (2.7T)
> Sectorsize: 512
> Stripesize: 4096
> Stripeoffset: 0
> Mode: r1w1e3
>
> % zfs-stats -a
>
> ------------------------------------------------------------------------
> ZFS Subsystem Report Fri Sep 13 18:03:32 2013
> ------------------------------------------------------------------------
>
> System Information:
>
> Kernel Version: 902503 (osreldate)
> Hardware Platform: amd64
> Processor Architecture: amd64
>
> ZFS Storage pool Version: 5000
> ZFS Filesystem Version: 5
>
> FreeBSD 9.2-PRERELEASE #0 r255506: Fri Sep 13 16:09:51 JST 2013 root
> 6:03PM up 39 mins, 5 users, load averages: 0.21, 0.32, 0.33
>
> ------------------------------------------------------------------------
>
> System Memory:
>
> 4.74% 751.62 MiB Active, 1.49% 236.12 MiB Inact
> 8.89% 1.38 GiB Wired, 0.01% 2.05 MiB Cache
> 84.86% 13.14 GiB Free, 0.01% 1.84 MiB Gap
>
> Real Installed: 16.00 GiB
> Real Available: 99.79% 15.97 GiB
> Real Managed: 96.96% 15.48 GiB
>
> Logical Total: 16.00 GiB
> Logical Used: 16.44% 2.63 GiB
> Logical Free: 83.56% 13.37 GiB
>
> Kernel Memory: 1.10 GiB
> Data: 97.02% 1.07 GiB
> Text: 2.98% 33.62 MiB
>
> Kernel Memory Map: 15.47 GiB
> Size: 6.44% 1019.41 MiB
> Free: 93.56% 14.47 GiB
>
> ------------------------------------------------------------------------
>
> ARC Summary: (HEALTHY)
> Memory Throttle Count: 0
>
> ARC Misc:
> Deleted: 25
> Recycle Misses: 0
> Mutex Misses: 0
> Evict Skips: 744
>
> ARC Size: 15.90% 977.06 MiB
> Target Size: (Adaptive) 100.00% 6.00 GiB
> Min Size (Hard Limit): 16.67% 1.00 GiB
> Max Size (High Water): 6:1 6.00 GiB
>
> ARC Size Breakdown:
> Recently Used Cache Size: 50.00% 3.00 GiB
> Frequently Used Cache Size: 50.00% 3.00 GiB
>
> ARC Hash Breakdown:
> Elements Max: 31.89k
> Elements Current: 100.00% 31.89k
> Collisions: 107.00k
> Chain Max: 4
> Chains: 1.83k
>
> ------------------------------------------------------------------------
>
> ARC Efficiency: 1.19m
> Cache Hit Ratio: 97.73% 1.16m
> Cache Miss Ratio: 2.27% 26.98k
> Actual Hit Ratio: 92.39% 1.10m
>
> Data Demand Efficiency: 97.97% 531.13k
> Data Prefetch Efficiency: 17.48% 1.84k
>
> CACHE HITS BY CACHE LIST:
> Anonymously Used: 5.46% 63.35k
> Most Recently Used: 26.05% 302.29k
> Most Frequently Used: 68.50% 795.00k
> Most Recently Used Ghost: 0.00% 0
> Most Frequently Used Ghost: 0.00% 0
>
> CACHE HITS BY DATA TYPE:
> Demand Data: 44.83% 520.37k
> Prefetch Data: 0.03% 322
> Demand Metadata: 49.71% 576.91k
> Prefetch Metadata: 5.43% 63.04k
>
> CACHE MISSES BY DATA TYPE:
> Demand Data: 39.89% 10.76k
> Prefetch Data: 5.63% 1.52k
> Demand Metadata: 27.09% 7.31k
> Prefetch Metadata: 27.39% 7.39k
>
> ------------------------------------------------------------------------
>
> L2ARC is disabled
>
> ------------------------------------------------------------------------
>
> File-Level Prefetch: (HEALTHY)
>
> DMU Efficiency: 2.98m
> Hit Ratio: 74.29% 2.22m
> Miss Ratio: 25.71% 766.97k
>
> Colinear: 766.97k
> Hit Ratio: 0.01% 98
> Miss Ratio: 99.99% 766.87k
>
> Stride: 2.18m
> Hit Ratio: 100.00% 2.18m
> Miss Ratio: 0.00% 6
>
> DMU Misc:
> Reclaim: 766.87k
> Successes: 0.52% 4.01k
> Failures: 99.48% 762.86k
>
> Streams: 32.42k
> +Resets: 0.10% 34
> -Resets: 99.90% 32.39k
> Bogus: 0
>
> ------------------------------------------------------------------------
>
> VDEV cache is disabled
>
> ------------------------------------------------------------------------
>
> ZFS Tunables (sysctl):
> kern.maxusers 1357
> vm.kmem_size 16622653440
> vm.kmem_size_scale 1
> vm.kmem_size_min 0
> vm.kmem_size_max 329853485875
> vfs.zfs.arc_max 6442450944
> vfs.zfs.arc_min 1073741824
> vfs.zfs.arc_meta_used 246149616
> vfs.zfs.arc_meta_limit 4294967296
> vfs.zfs.l2arc_write_max 8388608
> vfs.zfs.l2arc_write_boost 8388608
> vfs.zfs.l2arc_headroom 2
> vfs.zfs.l2arc_feed_secs 1
> vfs.zfs.l2arc_feed_min_ms 200
> vfs.zfs.l2arc_noprefetch 1
> vfs.zfs.l2arc_feed_again 1
> vfs.zfs.l2arc_norw 1
> vfs.zfs.anon_size 3833856
> vfs.zfs.anon_metadata_lsize 0
> vfs.zfs.anon_data_lsize 0
> vfs.zfs.mru_size 694739968
> vfs.zfs.mru_metadata_lsize 115806208
> vfs.zfs.mru_data_lsize 529095680
> vfs.zfs.mru_ghost_size 28662784
> vfs.zfs.mru_ghost_metadata_lsize 13669888
> vfs.zfs.mru_ghost_data_lsize 14992896
> vfs.zfs.mfu_size 289340928
> vfs.zfs.mfu_metadata_lsize 26335232
> vfs.zfs.mfu_data_lsize 249390592
> vfs.zfs.mfu_ghost_size 72080896
> vfs.zfs.mfu_ghost_metadata_lsize 41472
> vfs.zfs.mfu_ghost_data_lsize 72039424
> vfs.zfs.l2c_only_size 0
> vfs.zfs.dedup.prefetch 1
> vfs.zfs.nopwrite_enabled 1
> vfs.zfs.mdcomp_disable 0
> vfs.zfs.no_write_throttle 0
> vfs.zfs.write_limit_shift 3
> vfs.zfs.write_limit_min 33554432
> vfs.zfs.write_limit_max 2142888448
> vfs.zfs.write_limit_inflated 51429322752
> vfs.zfs.write_limit_override 0
> vfs.zfs.prefetch_disable 0
> vfs.zfs.zfetch.max_streams 8
> vfs.zfs.zfetch.min_sec_reap 2
> vfs.zfs.zfetch.block_cap 256
> vfs.zfs.zfetch.array_rd_sz 1048576
> vfs.zfs.top_maxinflight 32
> vfs.zfs.resilver_delay 2
> vfs.zfs.scrub_delay 4
> vfs.zfs.scan_idle 50
> vfs.zfs.scan_min_time_ms 1000
> vfs.zfs.free_min_time_ms 1000
> vfs.zfs.resilver_min_time_ms 3000
> vfs.zfs.no_scrub_io 0
> vfs.zfs.no_scrub_prefetch 0
> vfs.zfs.mg_alloc_failures 12
> vfs.zfs.write_to_degraded 0
> vfs.zfs.check_hostid 1
> vfs.zfs.recover 0
> vfs.zfs.deadman_synctime 1000
> vfs.zfs.deadman_enabled 1
> vfs.zfs.txg.synctime_ms 1000
> vfs.zfs.txg.timeout 5
> vfs.zfs.vdev.cache.max 16384
> vfs.zfs.vdev.cache.size 0
> vfs.zfs.vdev.cache.bshift 16
> vfs.zfs.vdev.trim_on_init 1
> vfs.zfs.vdev.max_pending 10
> vfs.zfs.vdev.min_pending 4
> vfs.zfs.vdev.time_shift 29
> vfs.zfs.vdev.ramp_rate 2
> vfs.zfs.vdev.aggregation_limit 131072
> vfs.zfs.vdev.read_gap_limit 32768
> vfs.zfs.vdev.write_gap_limit 4096
> vfs.zfs.vdev.bio_flush_disable 0
> vfs.zfs.vdev.bio_delete_disable 0
> vfs.zfs.vdev.trim_max_bytes 2147483648
> vfs.zfs.vdev.trim_max_pending 64
> vfs.zfs.zil_replay_disable 0
> vfs.zfs.cache_flush_disable 0
> vfs.zfs.zio.use_uma 0
> vfs.zfs.sync_pass_deferred_free 2
> vfs.zfs.sync_pass_dont_compress 5
> vfs.zfs.sync_pass_rewrite 2
> vfs.zfs.snapshot_list_prefetch 0
> vfs.zfs.super_owner 0
> vfs.zfs.debug 0
> vfs.zfs.version.ioctl 3
> vfs.zfs.version.acl 1
> vfs.zfs.version.spa 5000
> vfs.zfs.version.zpl 5
> vfs.zfs.trim.enabled 0
> vfs.zfs.trim.txg_delay 32
> vfs.zfs.trim.timeout 30
> vfs.zfs.trim.max_interval 1
>
> ------------------------------------------------------------------------
>
> -----
>
> Please tell me if you need anything else to diagnose the problem.
>
> Regards,
> --
> Yoshiaki Kasahara
> Research Institute for Information Technology, Kyushu University
> kasahara at nc.kyushu-u.ac.jp
I noticed that one of my snapshots still holds an older kernel (Aug
8th), and even booting that kernel didn't solve the problem. So maybe
the timing of the symptom is just a coincidence. Sorry for the
confusion.
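
For reference, one way to boot a kernel preserved in a snapshot is
roughly the following, with <snapname> standing for the actual
snapshot name:

% cp -Rp /.zfs/snapshot/<snapname>/boot/kernel /boot/kernel.aug08
% nextboot -k kernel.aug08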

Now I suspect that the dataset (the ZIL, maybe?) is somehow
corrupted. After I receive the new HDD for the backup2 pool, I'm
considering backing up the entire contents of zroot and
destroying/recreating zroot (when I have time).
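
The migration itself would be a recursive send/receive, something
like the following (untested, and "backup2/zroot-copy" is just a
placeholder target name):

% zfs create backup2/zroot-copy
% zfs snapshot -r zroot@migrate
% zfs send -R zroot@migrate | zfs receive -F -d -u backup2/zroot-copy

and then the reverse after recreating zroot. If anyone sees a problem
with that plan (or has a better idea), please let me know.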
Regards,
--
Yoshiaki Kasahara
Research Institute for Information Technology, Kyushu University
kasahara at nc.kyushu-u.ac.jp