[Bug 244792] [iscsi] ctladm islist leads to kernel panic if target ctl(4) port is disabled
bugzilla-noreply at freebsd.org
bugzilla-noreply at freebsd.org
Fri Mar 13 12:28:51 UTC 2020
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244792
Bug ID: 244792
Summary: [iscsi] ctladm islist leads to kernel panic if target
ctl(4) port is disabled
Product: Base System
Version: CURRENT
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs at FreeBSD.org
Reporter: aleksandr.fedorov at itglobal.com
Created attachment 212383
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=212383&action=edit
iscsi_ioctl_list panic + debug info
I found an issue which leads to kernel panic.
Test setup:
Machine 1 - ISCSI target.
Machine 2 - ISCSI initiator.
Disable ctl(4) port on target:
Machine 1# ctladm port -o off -p 3
Front End Ports disabled
After that, initiator trying to reconnect:
Machine 2# dmesg
...
(da23:iscsi4:0:0:5): Periph destroyed
(da22:iscsi4:0:0:4): Periph destroyed
(da19:iscsi4:0:0:3): Periph destroyed
(da17:iscsi4:0:0:2): Periph destroyed
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error;
reconnecting
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error;
reconnecting
...
If I try to list iscsi sessions on target side - kernel panics.
Machine 1# ctladm islist
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 11
fault virtual address = 0x17c
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff831bb8c3
stack pointer = 0x28:0xfffffe01c358f780
frame pointer = 0x28:0xfffffe01c358f810
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 27739 (ctladm)
trap number = 12
panic: page fault
cpuid = 11
time = 1583839216
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c358f3e0
vpanic() at vpanic+0x185/frame 0xfffffe01c358f440
panic() at panic+0x43/frame 0xfffffe01c358f4a0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe01c358f500
trap_pfault() at trap_pfault+0x99/frame 0xfffffe01c358f580
trap() at trap+0x2a7/frame 0xfffffe01c358f6b0
calltrap() at calltrap+0x8/frame 0xfffffe01c358f6b0
--- trap 0xc, rip = 0xffffffff831bb8c3, rsp = 0xfffffe01c358f780, rbp =
0xfffffe01c358f810 ---
cfiscsi_ioctl() at cfiscsi_ioctl+0x753/frame 0xfffffe01c358f810
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe01c358f860
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe01c358f970
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe01c358f990
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe01c358f9f0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe01c358fac0
amd64_syscall() at amd64_syscall+0x168/frame 0xfffffe01c358fbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01c358fbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c19ba, rsp =
0x7fffffffe448, rbp = 0x7fffffffeab0 ---
KDB: enter: panic
You can see full output with debug in attachment.
The panic occurs in function cfiscsi_ioctl_list()
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1718
Due the fact that cs->cs_target pointer is NULL (see attachment).
I add some checks to prevent panic:
Machine 1# ctladm islist
ID Portal Initiator name Target name
1 192.168.101.5 iqn.1994-09.org.freebsd:q1u005.z.vstack.com
iqn.2018-11.com.vstack:target4
3 192.168.101.4 iqn.1994-09.org.freebsd:q1u004.z.vstack.com
iqn.2018-11.com.vstack:target3
4 192.168.101.3 iqn.1994-09.org.freebsd:q1u003.z.vstack.com
iqn.2018-11.com.vstack:target2
74 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
106 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
124 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
130 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
147 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
259 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
330 192.168.101.2 iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
root at q1u001:~ # ps -l -p 0 -HSwww | grep cfiscsimt
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
0 0 0 0 -16 0 0 12656 cfiscsi DLs - 0:00.00
[kernel/cfiscsimt]
As you can see, there are many partially initialized session and corresponding
maintenance threads:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1161.
After some investigation I found that cs->cs_target == NULL because session
doesn't terminated correctly.
A new session is created in the cfiscsi_ioctl_handoff() function:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1490
Find target:
...
1505 ct = cfiscsi_target_find(softc, cihp->target_name,
1506 cihp->portal_group_tag);
1507 if (ct == NULL) {
1508 ci->status = CTL_ISCSI_ERROR;
1509 snprintf(ci->error_str, sizeof(ci->error_str),
1510 "%s: target not found", __func__);
1511 return;
1512 }
...
Create new session: allocate struct 'cs', start manteinance thread.
1539 cs = cfiscsi_session_new(softc, cihp->offload);
1540 if (cs == NULL) {
1541 ci->status = CTL_ISCSI_ERROR;
1542 snprintf(ci->error_str, sizeof(ci->error_str),
1543 "%s: cfiscsi_session_new failed",
__func__);
1544 cfiscsi_target_release(ct);
1545 return;
1546 }
...
Check if target port is online. In our case target port is offline
(ct->ct_online == 0).
1583 if (ct->ct_online == 0) {
1584 mtx_unlock(&softc->lock);
1585 cs->cs_handoff_in_progress = false;
Terminate session: Send cv_signal() to mantainance thread, deallocate struct,
etc.
1586 cfiscsi_session_terminate(cs);
1587 cfiscsi_target_release(ct);
1588 ci->status = CTL_ISCSI_ERROR;
1589 snprintf(ci->error_str, sizeof(ci->error_str),
1590 "%s: port offline", __func__);
1591 return;
1592 }
The main problem is that mantainance thread not always receive cv_signal() and
stuck in cv_wait().
So we have many partially initilized sessions and mantanance threads.
I see the following problems:
1. Flags cs->cs_handoff_in_progress and cs->cs_terminating which used by
mantanance thread is changed without the lock.
2. As I understand cv_signal() must be called under lock:
390 /*
391 * Signal a condition variable, wakes up one waiting thread. Will also
wakeup
392 * the swapper if the process is not in memory, so that it can bring the
393 * sleeping process in. Note that this may also result in additional
threads
394 * being made runnable. Should be called with the same mutex as was passed
to
395 * cv_wait held.
396 */
397 void
398 cv_signal(struct cv *cvp)
With next patch I can't reproduce the panic:
diff --git a/sys/cam/ctl/ctl_frontend_iscsi.c
b/sys/cam/ctl/ctl_frontend_iscsi.c
index d5be20c2a215..1b7837aa8355 100644
--- a/sys/cam/ctl/ctl_frontend_iscsi.c
+++ b/sys/cam/ctl/ctl_frontend_iscsi.c
@@ -1582,8 +1582,10 @@ cfiscsi_ioctl_handoff(struct ctl_iscsi *ci)
mtx_lock(&softc->lock);
if (ct->ct_online == 0) {
mtx_unlock(&softc->lock);
+ CFISCSI_SESSION_LOCK(cs);
cs->cs_handoff_in_progress = false;
cfiscsi_session_terminate(cs);
+ CFISCSI_SESSION_UNLOCK(cs);
cfiscsi_target_release(ct);
ci->status = CTL_ISCSI_ERROR;
snprintf(ci->error_str, sizeof(ci->error_str),
3. Why wee need at all to start the mantanance thread and than if ct->ct_online
== 0 immidiatelly destroy it.
Can we check ct->ct_online early?
--
You are receiving this mail because:
You are the assignee for the bug.
More information about the freebsd-bugs
mailing list