ZFS/istgt lockup
John
jwd at FreeBSD.org
Fri Oct 5 13:13:56 UTC 2012
Copying this reply to -scsi. Not sure if it's more of a ZFS issue
or an istgt one... more below...
----- John's Original Message -----
> Hi Folks,
>
> I've been chasing a problem that I'm not quite sure originates
> on the BSD side, but the system shouldn't lock up and require a power
> cycle to reboot.
>
> The config: I have a FreeBSD system running 9.1-RC handing out a
> 36TB volume to a Linux RHEL 6.1 system. The RHEL 6.1 system is
> doing heavy I/O & number crunching. Many hours into the job stream,
> the kernel becomes quite unhappy:
>
> kernel: __ratelimit: 27665 callbacks suppressed
> kernel: swapper: page allocation failure. order:1, mode:0x4020
> kernel: Pid: 0, comm: swapper Tainted: G ---------------- T 2.6.32-131.0.15.el6.x86_64 #1
> kernel: Call Trace:
> kernel: <IRQ> [<ffffffff811201e6>] ? __alloc_pages_nodemask+0x716/0x8b0
> kernel: [<ffffffff8115464a>] ? alloc_pages_current+0xaa/0x110
> kernel: [<ffffffffa0115cd5>] ? refill_fl+0x3d5/0x4a0 [cxgb3]
> kernel: [<ffffffff814200bd>] ? napi_frags_finish+0x6d/0xb0
> kernel: [<ffffffffa0116d33>] ? process_responses+0x653/0x1450 [cxgb3]
> kernel: [<ffffffff810e7f62>] ? ring_buffer_lock_reserve+0xa2/0x160
> kernel: [<ffffffffa0117b6c>] ? napi_rx_handler+0x3c/0x90 [cxgb3]
> kernel: [<ffffffff814225a3>] ? net_rx_action+0x103/0x2f0
> kernel: [<ffffffff8106f717>] ? __do_softirq+0xb7/0x1e0
> kernel: [<ffffffff810d69d6>] ? handle_IRQ_event+0xf6/0x170
> kernel: [<ffffffff8100c2cc>] ? call_softirq+0x1c/0x30
> kernel: [<ffffffff8100df05>] ? do_softirq+0x65/0xa0
> kernel: [<ffffffff8106f505>] ? irq_exit+0x85/0x90
> kernel: [<ffffffff814e3505>] ? do_IRQ+0x75/0xf0
> kernel: [<ffffffff8100bad3>] ? ret_from_intr+0x0/0x11
> kernel: <EOI> [<ffffffff810362ab>] ? native_safe_halt+0xb/0x10
> kernel: [<ffffffff81100826>] ? ftrace_raw_event_power_start+0x16/0x20
> kernel: [<ffffffff810142fd>] ? default_idle+0x4d/0xb0
> kernel: [<ffffffff81009e96>] ? cpu_idle+0xb6/0x110
> kernel: [<ffffffff814d493c>] ? start_secondary+0x202/0x245
>
> On the FreeBSD side, the istgt daemon appears to see that one of the
> connection threads is down and attempts to restart it. At this point,
> the istgt process size starts to grow.
>
> USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
> root 1224 0.0 0.4 8041092 405472 v0- DL 4:59PM 15:28.72 /usr/local/bin/istgt
> root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 63:18.34 /usr/local/bin/istgt
> root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 61:13.80 /usr/local/bin/istgt
> root 1224 0.0 0.4 8041092 405472 v0- IL 4:59PM 0:00.00 /usr/local/bin/istgt
>
> There are more than 1400 threads reported.
>
> Also of interest, netstat shows:
>
> tcp4 0 0 10.59.6.12.5010 10.59.25.113.54076 CLOSE_WAIT
> tcp4 0 0 10.60.6.12.5010 10.60.25.113.33345 CLOSED
> tcp4 0 0 10.59.6.12.5010 10.59.25.113.54074 CLOSE_WAIT
> tcp4 0 0 10.60.6.12.5010 10.60.25.113.33343 CLOSED
> tcp4 0 0 10.59.6.12.5010 10.59.25.113.54072 CLOSE_WAIT
> tcp4 0 0 10.60.6.12.5010 10.60.25.113.33341 CLOSED
> tcp4 0 0 10.60.6.12.5010 10.60.25.113.33339 CLOSED
> tcp4 0 0 10.59.6.12.5010 10.59.25.113.54070 CLOSE_WAIT
> tcp4 0 0 10.60.6.12.5010 10.60.25.113.53806 CLOSE_WAIT
>
> There are more than 1400 sockets in the CLOSE* state. What would
> prevent these sockets from cleaning up in a reasonable timeframe?
> Both sides of the mpio connection appear to be attempting reconnects.
>
> An attempt to gracefully kill istgt fails. A kill -9 does not clean
> things up either.
>
> A procstat -kk 1224 after the kill -9 shows:
>
> PID TID COMM TDNAME KSTACK
> 1224 100959 istgt sigthread mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 zio_wait+0x61 dbuf_read+0x5e5 dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e zfs_dirent_lock+0x4ff zfs_dirlook+0x69 zfs_lookup+0x26b zfs_freebsd_lookup+0x81 vfs_cache_lookup+0xf8 VOP_LOOKUP_APV+0x40 lookup+0x464 namei+0x4e9 vn_open_cred+0x3cb
> 1224 100960 istgt luthread #1 mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 bwait+0x64 physio+0x246 devfs_write_f+0x8d dofilewrite+0x8b kern_writev+0x6c sys_write+0x64 amd64_syscall+0x546 Xfast_syscall+0xf7
> 1224 103533 istgt sendthread #1493 mi_switch+0x186 thread_suspend_switch+0xc9 thread_single+0x1b2 exit1+0x72 sigexit+0x7c postsig+0x3a4 ast+0x26c doreti_ast+0x1f
>
>
> An attempt to forcefully export the pool hangs also. A procstat
> shows:
>
> PID TID COMM TDNAME KSTACK
> 4427 100991 zpool - mi_switch+0x186 sleepq_wait+0x42 _cv_wait+0x121 dbuf_read+0x30b dmu_buf_hold+0xe0 zap_lockdir+0x58 zap_lookup_norm+0x45 zap_lookup+0x2e dsl_dir_open_spa+0x121 dsl_dataset_hold+0x3b dmu_objset_hold+0x23 zfs_ioc_objset_stats+0x2b zfsdev_ioctl+0xe6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7
>
>
>
> If anyone has any ideas, please let me know. I know I've left a lot
> of config information out in an attempt to keep the email shorter.
>
> Random comments:
>
> This happens with or without multipathd enabled on the Linux client.
>
> If I catch the istgt daemon while it's creating threads and kill it,
> the system will not lock up.
>
> I see no errors in the istgt log file. One of my next things to try
> is to enable all debugging... The amount of debugging data captured
> is quite large :-(
>
> I am using Chelsio 10G cards on both the client and the server, which
> have been rock solid in all other cases.
>
> Thoughts welcome!
>
> Thanks,
> John
Hi Folks,
I've managed to replicate this problem once. Basically, it appears
the Linux client sends an abort, which is processed here:
istgt_iscsi_op_task:

    switch (function) {
    case ISCSI_TASK_FUNC_ABORT_TASK:
        ISTGT_LOG("ABORT_TASK\n");
        SESS_MTX_LOCK(conn);
        rc = istgt_lu_clear_task_ITLQ(conn, conn->sess->lu, lun,
            ref_CmdSN);
        SESS_MTX_UNLOCK(conn);
        if (rc < 0) {
            ISTGT_ERRLOG("LU reset failed\n");
        }
        istgt_clear_transfer_task(conn, ref_CmdSN);
        break;
At this point, the queue depth is 62, and there appears to be one thread
in the ZFS code performing a read.
No other processing occurs after this point. A "zfs list" hangs, the
pool cannot be exported, and the istgt daemon cannot be fully killed. A
reboot requires a power reset (i.e., reboot hangs after flushing buffers).
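My working theory, and it is only a theory at this point, is that the
abort path has to rendezvous with the LU worker thread so the outstanding
task can be cleared, while that worker is parked in zio_wait() inside ZFS
and never comes back, so nothing behind the abort ever runs. The sketch
below is a plain userland illustration of that pattern, not istgt's
actual code; the thread names and the condition-variable handshake are
made up for the example:

/*
 * Illustration only -- this is NOT istgt code.  It shows the generic
 * pattern I suspect: an "abort" that must rendezvous with a worker
 * thread hangs forever because the worker is itself stuck waiting on
 * an I/O that never completes (zio_wait() in our case, simulated here
 * with a condition variable that is never signalled).
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lu_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t io_done = PTHREAD_COND_INITIALIZER;   /* never signalled */
static pthread_cond_t task_cleared = PTHREAD_COND_INITIALIZER;
static int task_outstanding = 1;

/* LU worker: owns one outstanding task and blocks waiting for "I/O". */
static void *
lu_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lu_mtx);
    /* Stand-in for zio_wait(): this I/O never completes, so the worker
     * never gets to clear the task and signal the abort path. */
    for (;;)
        pthread_cond_wait(&io_done, &lu_mtx);
    /* not reached: on completion it would set task_outstanding = 0 and
     * signal task_cleared. */
}

int
main(void)
{
    pthread_t tid;

    pthread_create(&tid, NULL, lu_worker, NULL);
    sleep(1);

    /* "ABORT_TASK": wait for the worker to drop the task.  The worker
     * never wakes up, so this thread wedges here and every request
     * queued behind the abort stalls with it. */
    pthread_mutex_lock(&lu_mtx);
    printf("abort: waiting for the worker to clear the task...\n");
    while (task_outstanding)
        pthread_cond_wait(&task_cleared, &lu_mtx);
    pthread_mutex_unlock(&lu_mtx);

    printf("abort: done (never reached)\n");
    return (0);
}

If that is what is happening, it would also explain why kill -9 doesn't
help: the stuck thread is sleeping inside the kernel, and the process
can't finish exiting until it returns (which matches the exit1 /
thread_single frames in the procstat output above).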
The only thing that does appear to be happening is a growing list
of connections:
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56577 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56576 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56575 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56574 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56573 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56572 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56571 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56570 CLOSE_WAIT
tcp4 0 0 10.60.6.12.5010 10.60.25.113.56569 CLOSE_WAIT
Currently there are about 390, and the number is slowly going up. This
implies to me that there is some sort of reconnect occurring that is failing.
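As an aside on the CLOSE_WAIT state itself (this also answers the
question in my original mail): CLOSE_WAIT means the peer has already sent
its FIN and the local kernel is simply waiting for istgt to close() its
half of the connection. Those sockets will sit there until the owning
process closes them or exits, and since istgt's threads are wedged,
nothing ever does. A trivial, istgt-independent demonstration of the
state; port 5010 is just the portal port from the netstat output above:

/*
 * Demonstration only: a listener that accepts connections and then
 * deliberately never close()s them.  Once the peer disconnects, the
 * local socket sits in CLOSE_WAIT until this process closes it or
 * exits -- the same state the wedged istgt leaves behind.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    struct sockaddr_in sin;
    int s, one = 1;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) {
        perror("socket");
        return (1);
    }
    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(5010);     /* same port as the iSCSI portal above */

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(s, 128) < 0) {
        perror("bind/listen");
        return (1);
    }

    for (;;) {
        int c = accept(s, NULL, NULL);

        if (c >= 0)
            printf("accepted fd %d, never closing it\n", c);
        /* no close(c): after the peer hangs up, netstat shows this
         * socket in CLOSE_WAIT. */
    }
}

Connect and disconnect a few clients against it (e.g. with nc) and
netstat -an will show the accepted sockets parked in CLOSE_WAIT for as
long as the program keeps running.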
On the client side, I think the problem is related to a Chelsio N320
10G NIC which is showing RX overflows. After about 40000 overflows had
accumulated, the ABORT was received on the server side. I've never seen a
Chelsio card have overflow problems before. The server is using the same
model Chelsio card with no issues.
Again, any thoughts/comments are welcome!
Thanks,
John