[Bug 244792] [iscsi] ctladm islist leads to kernel panic if target ctl(4) port is disabled

bugzilla-noreply at freebsd.org bugzilla-noreply at freebsd.org
Fri Mar 13 12:28:51 UTC 2020


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244792

            Bug ID: 244792
           Summary: [iscsi] ctladm islist leads to kernel panic if target
                    ctl(4) port is disabled
           Product: Base System
           Version: CURRENT
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs at FreeBSD.org
          Reporter: aleksandr.fedorov at itglobal.com

Created attachment 212383
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=212383&action=edit
iscsi_ioctl_list panic + debug info

I found an issue that leads to a kernel panic.

Test setup:

Machine 1 - iSCSI target.
Machine 2 - iSCSI initiator.

Disable the ctl(4) port on the target:
Machine 1# ctladm port -o off -p 3
Front End Ports disabled

After that, the initiator keeps trying to reconnect:

Machine 2# dmesg
...
(da23:iscsi4:0:0:5): Periph destroyed
(da22:iscsi4:0:0:4): Periph destroyed
(da19:iscsi4:0:0:3): Periph destroyed
(da17:iscsi4:0:0:2): Periph destroyed
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
WARNING: 192.168.101.1 (iqn.2018-11.com.vstack:target1): connection error; reconnecting
...

If I then try to list the iSCSI sessions on the target side, the kernel panics.

Machine 1# ctladm islist

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 11
fault virtual address   = 0x17c
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff831bb8c3
stack pointer           = 0x28:0xfffffe01c358f780
frame pointer           = 0x28:0xfffffe01c358f810
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 27739 (ctladm)
trap number             = 12
panic: page fault
cpuid = 11
time = 1583839216
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01c358f3e0
vpanic() at vpanic+0x185/frame 0xfffffe01c358f440
panic() at panic+0x43/frame 0xfffffe01c358f4a0
trap_fatal() at trap_fatal+0x386/frame 0xfffffe01c358f500
trap_pfault() at trap_pfault+0x99/frame 0xfffffe01c358f580
trap() at trap+0x2a7/frame 0xfffffe01c358f6b0
calltrap() at calltrap+0x8/frame 0xfffffe01c358f6b0
--- trap 0xc, rip = 0xffffffff831bb8c3, rsp = 0xfffffe01c358f780, rbp = 0xfffffe01c358f810 ---
cfiscsi_ioctl() at cfiscsi_ioctl+0x753/frame 0xfffffe01c358f810
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe01c358f860
vn_ioctl() at vn_ioctl+0x132/frame 0xfffffe01c358f970
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe01c358f990
kern_ioctl() at kern_ioctl+0x295/frame 0xfffffe01c358f9f0
sys_ioctl() at sys_ioctl+0x15c/frame 0xfffffe01c358fac0
amd64_syscall() at amd64_syscall+0x168/frame 0xfffffe01c358fbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe01c358fbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8004c19ba, rsp = 0x7fffffffe448, rbp = 0x7fffffffeab0 ---
KDB: enter: panic

The full output, with debug information, is in the attachment.

The panic occurs in the cfiscsi_ioctl_list() function:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1718

It happens because the cs->cs_target pointer is NULL (see attachment).
After I added some checks to prevent the panic, the listing works:

Machine 1# ctladm islist
  ID Portal           Initiator name                              Target name
   1 192.168.101.5    iqn.1994-09.org.freebsd:q1u005.z.vstack.com iqn.2018-11.com.vstack:target4
   3 192.168.101.4    iqn.1994-09.org.freebsd:q1u004.z.vstack.com iqn.2018-11.com.vstack:target3
   4 192.168.101.3    iqn.1994-09.org.freebsd:q1u003.z.vstack.com iqn.2018-11.com.vstack:target2
  74 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 106 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 124 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 130 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 147 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 259 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
 330 192.168.101.2    iqn.1994-09.org.freebsd:q1u002.z.vstack.com none
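
The fault address 0x17c is small, which is what one would expect when a field
is read at an offset from a NULL struct pointer. The kind of check described
above can be modelled in userland as follows. This is only a sketch: the
struct layouts and the names session_target_name() and session_format_row()
are hypothetical and are not the code from the attachment. It falls back to
"none" when the target pointer is NULL, matching the listing above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct target {
	const char *name;
};

struct session {
	unsigned int id;
	struct target *target;	/* NULL while the session is only partially set up */
};

/* Return a printable target name, falling back to "none" for a NULL target. */
static const char *
session_target_name(const struct session *cs)
{
	if (cs == NULL || cs->target == NULL || cs->target->name == NULL)
		return ("none");
	return (cs->target->name);
}

/* Format one listing row into buf; cs must not be NULL here. */
static int
session_format_row(const struct session *cs, char *buf, size_t len)
{
	return (snprintf(buf, len, "%4u %s", cs->id, session_target_name(cs)));
}
```

Such a check only hides the symptom; the sessions without a target still exist.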

root@q1u001:~ # ps -l -p 0 -HSwww | grep cfiscsimt
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]
  0   0    0   0 -16  0   0 12656 cfiscsi    DLs   -      0:00.00 [kernel/cfiscsimt]

As you can see, there are many partially initialized sessions and corresponding
maintenance threads:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1161

After some investigation I found that cs->cs_target == NULL because the session
is not terminated correctly.

A new session is created in the cfiscsi_ioctl_handoff() function:
https://svnweb.freebsd.org/base/head/sys/cam/ctl/ctl_frontend_iscsi.c?revision=358333&view=markup#l1490

Find the target:
...
1505            ct = cfiscsi_target_find(softc, cihp->target_name,
1506                cihp->portal_group_tag);
1507            if (ct == NULL) {
1508                    ci->status = CTL_ISCSI_ERROR;
1509                    snprintf(ci->error_str, sizeof(ci->error_str),
1510                        "%s: target not found", __func__);
1511                    return;
1512            }
...

Create a new session: allocate struct 'cs' and start the maintenance thread.

1539                    cs = cfiscsi_session_new(softc, cihp->offload);
1540                    if (cs == NULL) {
1541                            ci->status = CTL_ISCSI_ERROR;
1542                            snprintf(ci->error_str, sizeof(ci->error_str),
1543                                "%s: cfiscsi_session_new failed", __func__);
1544                            cfiscsi_target_release(ct);
1545                            return;
1546                    }
...

Check whether the target port is online. In our case the target port is offline
(ct->ct_online == 0).

1583            if (ct->ct_online == 0) {
1584                    mtx_unlock(&softc->lock);
1585                    cs->cs_handoff_in_progress = false;

Terminate the session: send cv_signal() to the maintenance thread, deallocate
the struct, etc.

1586                    cfiscsi_session_terminate(cs);
1587                    cfiscsi_target_release(ct);
1588                    ci->status = CTL_ISCSI_ERROR;
1589                    snprintf(ci->error_str, sizeof(ci->error_str),
1590                        "%s: port offline", __func__);
1591                    return;
1592            }

The main problem is that the maintenance thread does not always receive the
cv_signal() and stays stuck in cv_wait().
So we end up with many partially initialized sessions and maintenance threads.

I see the following problems:

1. The flags cs->cs_handoff_in_progress and cs->cs_terminating, which are used
by the maintenance thread, are changed without holding the lock.

2. As I understand it, cv_signal() must be called with the lock held:

390 /*
391  * Signal a condition variable, wakes up one waiting thread.  Will also wakeup
392  * the swapper if the process is not in memory, so that it can bring the
393  * sleeping process in.  Note that this may also result in additional threads
394  * being made runnable.  Should be called with the same mutex as was passed to
395  * cv_wait held.
396  */
397 void
398 cv_signal(struct cv *cvp)

With the following patch I can no longer reproduce the panic:

diff --git a/sys/cam/ctl/ctl_frontend_iscsi.c b/sys/cam/ctl/ctl_frontend_iscsi.c
index d5be20c2a215..1b7837aa8355 100644
--- a/sys/cam/ctl/ctl_frontend_iscsi.c
+++ b/sys/cam/ctl/ctl_frontend_iscsi.c
@@ -1582,8 +1582,10 @@ cfiscsi_ioctl_handoff(struct ctl_iscsi *ci)
        mtx_lock(&softc->lock);
        if (ct->ct_online == 0) {
                mtx_unlock(&softc->lock);
+               CFISCSI_SESSION_LOCK(cs);
                cs->cs_handoff_in_progress = false;
                cfiscsi_session_terminate(cs);
+               CFISCSI_SESSION_UNLOCK(cs);
                cfiscsi_target_release(ct);
                ci->status = CTL_ISCSI_ERROR;
                snprintf(ci->error_str, sizeof(ci->error_str),
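
The locking pattern behind points 1 and 2 can be illustrated with a userland
model built on POSIX threads. This is only a sketch under assumptions: the
struct and the functions session_init(), maintenance_thread() and
session_terminate() are hypothetical stand-ins, and pthread mutexes and
condition variables replace the kernel primitives. The point it shows is that
the flags are written, and the signal is sent, while holding the same lock the
waiter uses, so the waiter cannot check the flags, miss the update, and then
sleep forever.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userland model of a session with a maintenance thread. */
struct session {
	pthread_mutex_t lock;
	pthread_cond_t  maintenance_cv;
	bool            handoff_in_progress;
	bool            terminating;
	bool            deleted;	/* set by the maintenance thread on exit */
};

static void
session_init(struct session *cs)
{
	pthread_mutex_init(&cs->lock, NULL);
	pthread_cond_init(&cs->maintenance_cv, NULL);
	cs->handoff_in_progress = true;
	cs->terminating = false;
	cs->deleted = false;
}

/* Sleep until termination is requested and the handoff is finished. */
static void *
maintenance_thread(void *arg)
{
	struct session *cs = arg;

	pthread_mutex_lock(&cs->lock);
	/*
	 * The predicate is re-checked in a loop.  Because writers hold
	 * cs->lock too, no update can slip in between check and wait.
	 */
	while (!cs->terminating || cs->handoff_in_progress)
		pthread_cond_wait(&cs->maintenance_cv, &cs->lock);
	cs->deleted = true;
	pthread_mutex_unlock(&cs->lock);
	return (NULL);
}

/* Request termination: update both flags and signal under the lock. */
static void
session_terminate(struct session *cs)
{
	pthread_mutex_lock(&cs->lock);
	cs->handoff_in_progress = false;
	cs->terminating = true;
	pthread_cond_signal(&cs->maintenance_cv);
	pthread_mutex_unlock(&cs->lock);
}
```

A caller would run session_init(), start maintenance_thread() with
pthread_create(), call session_terminate(), and then pthread_join() the thread;
in this model the join returns whether the thread was already waiting or had
not yet reached the wait.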

3. Why do we need to start the maintenance thread at all, only to destroy it
immediately if ct->ct_online == 0?
Can we check ct->ct_online earlier?
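
The idea in point 3 can be sketched as a reordering in which the offline check
comes before any per-session setup. This is a simplified model under
assumptions, not a proposed kernel patch: the types and the functions
session_new() and handoff() are hypothetical, and session_new() only counts
how often a session (and its thread) would have been created.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; names do not match the kernel code. */
struct target {
	bool online;
};

struct handoff_stats {
	int sessions_created;	/* times a session and its thread were set up */
};

enum handoff_status { HANDOFF_OK, HANDOFF_PORT_OFFLINE };

/* Stand-in for session allocation plus maintenance thread startup. */
static void
session_new(struct handoff_stats *st)
{
	st->sessions_created++;
}

static enum handoff_status
handoff(const struct target *ct, struct handoff_stats *st)
{
	/* Reject the handoff before any per-session resources exist. */
	if (!ct->online)
		return (HANDOFF_PORT_OFFLINE);
	session_new(st);
	return (HANDOFF_OK);
}
```

In the real code the port state can presumably change between an early check
and the later one made under softc->lock, so an early check would likely
reduce the number of sessions created for offline ports without replacing the
existing check.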
