stable/9 + ZFS IS NOT ready, me thinks
Dennis Glatting
freebsd at pki2.com
Thu Oct 25 15:34:13 UTC 2012
At least that is what I suspect.
As I have previously mentioned, I have five servers running stable/9
with ZFS. Four are AMD systems (similar but not identical) and the
fifth is Intel. The AMD systems are the workhorses.
The AMDs have a long history of stalling under load. Specifically, the
kernel, keyboard, display, and network I/O are still there, but the
disks are stalled across all volumes, arrays, and disks (e.g., if I
enter a command not on the disks, such as on a memory disk, and
statically linked, the command will run, otherwise the command DOES NOT
run).
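To make "statically linked and not on the disks" concrete: FreeBSD's
/rescue binaries are statically linked, so copying a few of them onto a
memory disk ahead of time gives tools that still run during a stall. A
rough sketch, with the size and mount point purely illustrative:

mc# mdconfig -a -t swap -s 512m        # create a swap-backed memory disk (prints e.g. md0)
mc# newfs /dev/md0                     # put a filesystem on it
mc# mount /dev/md0 /mnt                # mount it somewhere the stalled pools aren't
mc# cp /rescue/ls /rescue/ps /mnt/     # statically linked tools now live in RAM
mc# /mnt/ls /mnt                       # during a stall this runs; anything touching the pools does not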
Over the last week I changed operating systems on two of these systems.
System #1 I downgraded to stable/8. On System #3 I installed CentOS 6.3
with ZFS-on-Linux (ZoL). These two systems have been running the same
job (2d17h on the first and 3d on the second) without trouble.
Previously, System #1 would have stalled within 48 hours, typically in
less than 12, and System #3 would spontaneously reboot whenever I tried
to send a data set to it via "zfs send".
On System #1 I found one of the OS disks, part of a hardware RAID1
array, was toast. I found and replaced that disk before I installed
8.3. You can argue the problem with stable/9 was that disk, but I don't
believe it, because I have the SAME problem across all four systems.
When a new set of disks arrives I plan to re-introduce stable/9 to that
system to see if the faulting returns. Also, smartd says I need to
update the firmware in some of my disks, which I plan to do this
weekend (below).
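The firmware notices are easy enough to confirm by hand; the device
name here is just an example:

mc# smartctl -i /dev/da0               # identity block, including the Firmware Version field
mc# smartctl -H -A /dev/da0            # overall health plus the attributes smartd watches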
Under ZoL and 8.3 the systems are more responsive than under stable/9.
For example, an "ls" of the busy data set returns data MUCH more
quickly under ZoL and 8.3; under stable/9 it sputters out the data.
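That comparison is informal; timing a listing of the busy data set
(the path is a placeholder) is enough to show the difference:

mc# time ls /disk-1/busy > /dev/null   # wall-clock time for a directory listing under load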
Here is the current load on System #1:
mc# top
last pid: 53918;  load averages: 73.73, 73.08, 72.81    up 2+17:58:24  08:16:47
61 processes: 10 running, 51 sleeping
CPU: 11.4% user, 46.0% nice, 42.6% system, 0.1% interrupt, 0.0% idle
Mem: 702M Active, 1003M Inact, 35G Wired, 160K Cache, 88M Buf, 88G Free
ARC: 32G Total, 3594M MRU, 27G MFU, 32M Anon, 581M Header, 562M Other
Swap: 233G Total, 233G Free
mc# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
disk-1  16.2T  6.57T  9.68T    40%  1.33x  ONLINE  -
disk-2  3.62T  3.63G  3.62T     0%  1.00x  ONLINE  -
All of the data is going onto disk-1, which had under 10GB allocated
when I started the job.
Here is System #3, running the same job but with only 25% as many cores
as System #1:
[root at rotfl ~]# top
top - 08:19:13 up 3 days, 16:13,  7 users,  load average: 94.61, 94.57, 100.94
Tasks: 710 total,  10 running, 700 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.3%us,  4.4%sy, 82.2%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  65951592k total, 39561920k used, 26389672k free,   154372k buffers
Swap: 134217720k total,        0k used, 134217720k free,   377996k cached
[root at rotfl ~]# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
disk-1  16.2T  6.72T  9.53T    41%  1.00x  ONLINE  -
disk-2  1.81T  3.24G  1.81T     0%  1.00x  ONLINE  -
Like System #1, the data is going to disk-1, which also had less than
10GB allocated when the job started.
I am working on getting many TB of data off one of the remaining two
stable/9 systems for more experimentation, but the system stalls, which
makes the process a bit cumbersome. I strongly suspect a contributing
factor is the system cron scripts that run at night.
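Those nightly jobs are the periodic(8) scripts. For testing, the
disk-walking ones can be switched off in /etc/periodic.conf; the knobs
below are the usual suspects, not a claim that any particular one is
the culprit:

# /etc/periodic.conf
daily_status_security_enable="NO"      # nightly security scans walk the filesystems
weekly_locate_enable="NO"              # locate database rebuild walks everything
weekly_whatis_enable="NO"              # whatis/man database rebuild, more disk churn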
Finally, as I have also previously mentioned, I am NOT the only one
having this problem. One individual stated that he updated his BIOS,
controller firmware, and disk firmware, but that didn't help.
I am happy to work with folks knowledgeable about the relevant FreeBSD
components, but so far only one has stepped forward.