zpool on Dell MD3000 causes frequent hangs
Thomas Johnson
tommyj27 at gmail.com
Fri May 22 19:10:17 UTC 2015
Hello,
I am trying to track down an ongoing issue that I've been having, and I
am looking for suggestions on a possible cause, or on how I might
troubleshoot further.
The issue seems to be related to a Dell MD3000 storage array, which
contains a zpool. It seems that the host attached to the array will
occasionally hang, usually during periods of high disk activity
(annoyingly, usually about 0300).
When the system hangs, I can ping the host and switch between virtual
consoles (but not interact with them). The system is otherwise
unresponsive, with no errors reported on the console or in the logs. The
only remedy I have found is to hard-reset the host.
I believe this issue is tied to the MD3000. I have tried swapping out SAS
cables, HBAs, the controller on the MD3000, and the host itself. I have
updated all the firmware I can find. Before I upgraded the host OS to
FreeBSD 10.1 (from 10.0) last month, I experienced hangs about once a
month. Since the upgrade, I have seen several events per week.
In addition to the MD3000, I have a set of USB drives that are used in a
rotation as offsite backups for the zpool. I have seen a number of hang
events during zfs send/receive transfers to the USB disk.
After the most recent hang, I removed two [consumer] SSDs from the pool
that were being used as cache devices. It is too early to tell if this
change had any impact.
Here is some of the pertinent output from the host. I can provide any other
information that would be helpful.
root@leopard:/home/tom-> uname -a
FreeBSD leopard 10.1-RELEASE-p9 FreeBSD 10.1-RELEASE-p9 #0 r281232: Tue Apr 7 17:38:04 CDT 2015 root@cheshire-b:/pkg/base/obj_10.1-RELEASE-p9/pkg/base/src_10.1-RELEASE-p9/sys/GENERIC amd64
root@leopard:/home/tom-> zpool list
NAME          SIZE  ALLOC   FREE  FRAG  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
backup       5.31T  3.61T  1.70T   22%         -  68%  1.00x  ONLINE  -
jumpdrive_f  2.72T  2.04T   693G   30%         -  75%  1.00x  ONLINE  -
root@leopard:/home/tom-> zpool status backup
  pool: backup
 state: ONLINE
  scan: scrub repaired 0 in 13h15m with 0 errors on Wed May 13 16:17:29 2015
config:

        NAME        STATE     READ WRITE CKSUM
        backup      ONLINE       0     0     0
          da0       ONLINE       0     0     0

errors: No known data errors
root@leopard:/home/tom-> zpool get all backup
NAME    PROPERTY                       VALUE                 SOURCE
backup  size                           5.31T                 -
backup  capacity                       68%                   -
backup  altroot                        -                     default
backup  health                         ONLINE                -
backup  guid                           12638712474922952450  default
backup  version                        -                     default
backup  bootfs                         -                     default
backup  delegation                     on                    default
backup  autoreplace                    off                   default
backup  cachefile                      -                     default
backup  failmode                       wait                  default
backup  listsnapshots                  off                   default
backup  autoexpand                     off                   default
backup  dedupditto                     0                     default
backup  dedupratio                     1.00x                 -
backup  free                           1.70T                 -
backup  allocated                      3.61T                 -
backup  readonly                       off                   -
backup  comment                        -                     default
backup  expandsize                     0                     -
backup  freeing                        0                     default
backup  fragmentation                  22%                   -
backup  leaked                         0                     default
backup  feature@async_destroy          enabled               local
backup  feature@empty_bpobj            active                local
backup  feature@lz4_compress           active                local
backup  feature@multi_vdev_crash_dump  enabled               local
backup  feature@spacemap_histogram     active                local
backup  feature@enabled_txg            active                local
backup  feature@hole_birth             active                local
backup  feature@extensible_dataset     enabled               local
backup  feature@embedded_data          active                local
backup  feature@bookmarks              enabled               local
backup  feature@filesystem_limits      enabled               local