ZFS/istgt lockup
John
jwd at FreeBSD.org
Wed Oct 3 03:27:38 UTC 2012
Hi Folks,
I've been chasing a problem that I'm not sure originates
on the BSD side, but either way the system shouldn't lock up and require
a power cycle to reboot.
The config: I have a FreeBSD system running 9.1-RC handing out a
36TB volume to a Linux RHEL 6.1 system. The RHEL 6.1 system is
doing heavy I/O & number crunching. Many hours into the job stream
the kernel becomes quite unhappy:
kernel: __ratelimit: 27665 callbacks suppressed
kernel: swapper: page allocation failure. order:1, mode:0x4020
kernel: Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.0.15.el6.x86_64 #1
kernel: Call Trace:
kernel: <IRQ> [<ffffffff811201e6>] ? __alloc_pages_nodemask+0x716/0x8b0
kernel: [<ffffffff8115464a>] ? alloc_pages_current+0xaa/0x110
kernel: [<ffffffffa0115cd5>] ? refill_fl+0x3d5/0x4a0 [cxgb3]
kernel: [<ffffffff814200bd>] ? napi_frags_finish+0x6d/0xb0
kernel: [<ffffffffa0116d33>] ? process_responses+0x653/0x1450 [cxgb3]
kernel: [<ffffffff810e7f62>] ? ring_buffer_lock_reserve+0xa2/0x160
kernel: [<ffffffffa0117b6c>] ? napi_rx_handler+0x3c/0x90 [cxgb3]
kernel: [<ffffffff814225a3>] ? net_rx_action+0x103/0x2f0
kernel: [<ffffffff8106f717>] ? __do_softirq+0xb7/0x1e0
kernel: [<ffffffff810d69d6>] ? handle_IRQ_event+0xf6/0x170
kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
kernel: [<ffffffff8106f505>] ? irq_exit+0x85/0x90
kernel: [<ffffffff814e3505>] ? do_IRQ+0x75/0xf0
kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
kernel: <EOI> [<ffffffff810362ab>] ? native_safe_halt+0xb/0x10
kernel: [<ffffffff81100826>] ? ftrace_raw_event_power_start+0x16/0x20
kernel: [<ffffffff810142fd>] ? default_idle+0x4d/0xb0
kernel: [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
kernel: [<ffffffff814d493c>] ? start_secondary+0x202/0x245
On the FreeBSD side, the istgt daemon appears to see that one of the
connection threads is down and attempts to restart it. At this point,
the istgt process size starts to grow.
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 1224 0.0 0.4 8041092 405472 v0- DL 4:59PM 15:28.72 /usr/local/bin/istgt
root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 63:18.34 /usr/local/bin/istgt
root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 61:13.80 /usr/local/bin/istgt
root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 0:00.00 /usr/local/bin/istgt
There are more than 1400 threads reported.
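For the record, the thread count came from procstat. A minimal sketch of the counting pipeline (PID 1224 assumed; a canned sample stands in for the live procstat output here so the pipeline is runnable anywhere):

```shell
#!/bin/sh
# On the live system this would be:
#   procstat -t 1224 | awk 'NR > 1' | wc -l
# i.e. skip the header line, then count one row per thread.
count_threads() {
    awk 'NR > 1' | wc -l | tr -d ' '
}

# Simulated procstat -t output (header + three threads):
count_threads <<'EOF'
  PID    TID COMM                TDNAME              CPU  PRI STATE
 1224 100959 istgt               sigthread             0  120 sleep
 1224 100960 istgt               luthread #1           1  120 sleep
 1224 103533 istgt               sendthread #1493      2  120 sleep
EOF
# prints 3
```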
Also of interest, netstat shows:
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54076 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33345 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54074 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33343 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54072 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33341 CLOSED
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33339 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54070 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.53806 CLOSE_WAIT
There are more than 1400 sockets in the CLOSE* state. What would
prevent these sockets from cleaning up in a reasonable timeframe?
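For anyone reproducing this, a quick way to tally the stuck sockets by state (a sketch; on FreeBSD the live input would come from `netstat -an -p tcp`, simulated here with a canned sample so the pipeline is runnable anywhere):

```shell
#!/bin/sh
# Tally TCP sockets whose state starts with CLOSE (CLOSE_WAIT, CLOSED,
# CLOSING). On the live system:
#   netstat -an -p tcp | awk '$NF ~ /^CLOSE/ {n[$NF]++} END {for (s in n) print s, n[s]}'
tally_close_states() {
    awk '$NF ~ /^CLOSE/ { n[$NF]++ } END { for (s in n) print s, n[s] }' | sort
}

# Simulated netstat output:
tally_close_states <<'EOF'
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54076 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.33345 CLOSED
tcp4 0 0 10.59.6.12.5010 10.59.25.113.54074 CLOSE_WAIT
tcp4 0 0 10.62.6.12.5010 10.62.25.113.22 ESTABLISHED
EOF
# prints:
# CLOSED 1
# CLOSE_WAIT 2
```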
Both sides of the mpio connection appear to be attempting reconnects.
An attempt to gracefully kill istgt fails. A kill -9 does not clean
things up either.
A procstat -kk 1224 after the kill -9 shows:
PID TID COMM TDNAME KSTACK
1224 100959 istgt sigthread mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x4ff zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 vn_open_cred+0x3cb
1224 100960 istgt luthread #1 mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 bwait+0x64 physio+0x246 devfs_write_f+0x8d dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
1224 103533 istgt sendthread #1493 mi_switch+0x186 thread_suspend_switch+0xc9 thread_single+0x1b2 exit1+0x72 sigexit+0x7c postsig+0x3a4 ast+0x26c doreti_ast+0x1f
An attempt to forcefully export the pool also hangs. A procstat of
the zpool process shows:
PID TID COMM TDNAME KSTACK
4427 100991 zpool - mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 dbuf_read+0x30b dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e dsl_dir_open_spa+0x121 dsl_dataset_hold+0x3b dmu_objset_hold+0x23 zfs_ioc_objset_stats+0x2b zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7
If anyone has any ideas, please let me know. I know I've left a lot
of config information out in an attempt to keep the email shorter.
Random comments:
This happens with or without multipathd enabled on the Linux client.
If I catch the istgt daemon while it's creating threads and kill it,
the system will not lock up.
I see no errors in the istgt log file. One of my next things to try
is to enable all debugging... The amount of debugging data captured
is quite large :-(
I am using Chelsio 10G cards on both client and server, which have
been rock solid in all other cases.
Thoughts welcome!
Thanks,
John
More information about the freebsd-fs mailing list