Machine stops for some seconds with ZFS
Attila Nagy
bra at fsn.hu
Wed Feb 3 09:49:01 UTC 2010
Hello,
After a long break, I've switched back to ZFS on my desktop. It runs
8-STABLE/amd64 with two SATA disks and a USB pendrive.
One partition from each disk is used for the zpool, each encrypted
with GELI, and the pendrive serves as L2ARC:
        NAME            STATE     READ WRITE CKSUM
        data            ONLINE       0     0     0
          mirror        ONLINE       0     0     0
            ad0s1d.eli  ONLINE       0     0     0
            ad1s1d.eli  ONLINE       0     0     0
        cache
          da0           ONLINE       0     0     0
Today, after 12 days of uptime, the machine froze. I could ping it
from a different machine and could even open a telnet connection to its
ssh port, but I never got the ssh banner.
Now I'm building a 9-CURRENT kernel and world to see whether the
problem persists there, and during the build I've noticed something
strange.
I build with -j4 (the machine has a single dual-core CPU), so the fans
scream throughout the build. But every few minutes (I couldn't
recognize any pattern) the machine goes completely silent (even
quieter than when idle), and everything halts.
During these halts, the top running on the machine keeps refreshing,
and I can type in pass-through ssh connections (that is, sessions where
I use this machine to reach other machines over ssh), but I can't open
new ssh connections to it, and I can't start anything new (for example,
from an already open shell).
ping keeps running without interruption during this, and top shows the following:
last pid: 36503; load averages: 1.59, 3.04, 3.01 up 0+00:49:53 10:32:10
97 processes: 1 running, 96 sleeping
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
Swap: 4096M Total, 4096M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1342 root 1 44 0 3204K 620K select 0 0:02 0.00% make
1424 root 1 44 0 3204K 1036K select 0 0:01 0.00% make
1280 root 1 44 0 12540K 1900K select 0 0:01 0.00% hald-addon-storage
1234 haldaemon 1 44 0 24116K 4464K select 0 0:01 0.00% hald
93600 root 1 44 0 3204K 1028K select 0 0:00 0.00% make
1260 root 1 44 0 19704K 2688K select 0 0:00 0.00% hald-addon-mouse-sy
15142 bra 1 44 0 9332K 2864K CPU0 0 0:00 0.00% top
1263 root 1 44 0 12540K 1896K cgticb 0 0:00 0.00% hald-addon-storage
94415 bra 1 44 0 37944K 4992K select 1 0:00 0.00% sshd
35837 root 1 44 0 5252K 2424K select 1 0:00 0.00% make
95361 bra 1 44 0 37944K 4992K select 1 0:00 0.00% sshd
35973 root 1 44 0 3204K 1772K select 0 0:00 0.00% make
608 root 1 44 0 6892K 1436K select 1 0:00 0.00% syslogd
96928 root 1 44 0 3204K 728K select 0 0:00 0.00% make
94369 root 1 51 0 37944K 4584K sbwait 0 0:00 0.00% sshd
82631 root 1 50 0 37944K 4584K sbwait 0 0:00 0.00% sshd
16304 root 1 44 0 37944K 4576K zio->i 1 0:00 0.00% sshd
951 _ntp 1 44 0 6876K 1692K select 0 0:00 0.00% ntpd
1238 root 1 76 0 16768K 2372K select 0 0:00 0.00% hald-runner
4916 root 1 44 0 3204K 728K select 1 0:00 0.00% make
95338 root 1 49 0 37944K 4584K sbwait 1 0:00 0.00% sshd
1259 root 1 44 0 10280K 2712K pause 1 0:00 0.00% csh
33357 bra 1 44 0 21596K 4004K select 0 0:00 0.00% ssh
16405 bra 1 44 0 37944K 5012K zio->i 0 0:00 0.00% sshd
1044 root 1 44 0 9104K 1796K kqread 0 0:00 0.00% master
34765 root 1 76 0 8260K 1764K wait 1 0:00 0.00% sh
82685 bra 1 44 0 37944K 4960K select 1 0:00 0.00% sshd
1065 postfix 1 44 0 9100K 1872K kqread 0 0:00 0.00% qmgr
1237 root 17 44 0 27460K 4124K waitvt 0 0:00 0.00% console-kit-daemon
95362 bra 1 44 0 10216K 2612K ttyin 0 0:00 0.00% bash
34764 root 1 44 0 3204K 852K select 0 0:00 0.00% make
1222 root 1 49 0 21672K 1896K wait 0 0:00 0.00% login
35728 root 1 44 0 3204K 860K select 0 0:00 0.00% make
1064 postfix 1 44 0 9104K 1772K zio->i 1 0:00 0.00% pickup
82696 bra 1 44 0 10216K 2596K wait 0 0:00 0.00% bash
94417 bra 1 44 0 10216K 2596K wait 1 0:00 0.00% bash
35455 root 1 44 0 3204K 744K select 0 0:00 0.00% make
35774 root 1 44 0 3204K 728K select 1 0:00 0.00% make
16409 bra 1 44 0 10216K 2592K ttyin 0 0:00 0.00% bash
1155 root 1 44 0 7948K 1604K nanslp 0 0:00 0.00% cron
1077 messagebus 1 53 0 8092K 2060K select 0 0:00 0.00% dbus-daemon
1149 root 1 44 0 26012K 3960K select 1 0:00 0.00% sshd
35729 root 1 76 0 8260K 1760K wait 0 0:00 0.00% sh
4921 root 1 57 0 8260K 1748K wait 0 0:00 0.00% sh
825 root 1 76 0 39212K 2372K lockf 1 0:00 0.00% saslauthd
35460 root 1 76 0 8260K 1748K wait 0 0:00 0.00% sh
34761 root 1 48 0 8260K 1740K wait 1 0:00 0.00% sh
96923 root 1 50 0 8260K 1740K wait 0 0:00 0.00% sh
As you can see, top reports the machine as 100% idle while a make -j4
buildworld is running. This lasts for a few seconds (10-20), then
everything goes back to normal: the fans start screaming again, the
build continues, and I can use the machine.
This occasional halt is new to me, but I've only just switched to ZFS on
my desktop; on a server it's harder to notice unless you use it for
interactive sessions. The final freeze, however, I have seen on more
than one server.
How could I help debug these halts, and the final freeze?
Thanks,