isp and scsi_target

Wed Apr 21 02:18:01 UTC 2010

I can provide SSH access to the hardware if you like.

Erich M. Jenkins
Fuujin Group Limited

"You should never, never doubt what no one is sure about."
-- Gene Wilder

Matthew Jacob wrote:
> What a mess.
> 
> I need to look at this in detail. The stuff was working (sort of) in 
> RELENG_8, but got very little testing otherwise.
>
>> We're trying to get an emulated disk to show up on 7.3-RELEASE and are
>> not having much luck. This is a point-to-point connection between a pair
>> of QLogic cards (pciconf output below). There is no FC switch between the
>> machines, and both cards were reset to their factory BIOS defaults before
>> testing. The moment I rescan the bus on the initiator, the target
>> machine panics and dumps core. The initiator hangs until its FC card
>> resets, then returns to the prompt (wedged?).
>>
>> Here's the card (the same in both machines, though on a different SCSI bus):
>>
>> isp0@pci0:5:1:0:        class=0x0c0400 card=0x00091077 chip=0x23001077 rev=0x01 hdr=0x00
>>     vendor     = 'QLogic Corporation'
>>     device     = 'QLA2300 SANblade 2300 64-bit FC-AL Adapter'
>>     class      = serial bus
>>     subclass   = Fibre Channel
>>
>>
>> I get tons of debugging output on the target machine when launching 
>> scsi_target with the following command:
>>
>> test001# scsi_target -d 3:0:0 /usr/home/testuser/target0
>>
>> Here's a snippet of the debugging output on the target machine after
>> running the above command (it goes on for pages):
>>
>> scsi_target: sending ccb (0x332)
>> scsi_target: sending ccb (0x334)
>> scsi_target: sending ccb (0x332)
>> scsi_target: sending ccb (0x334)
>> scsi_target: main loop beginning
>>
>> Then this, when the initiator rescans the bus, just before the target panics:
>>
>> scsi_target: read ready
>> scsi_target: event -1 done
>> scsi_target: Working on ATIO 0x2825c200
>> scsi_target: tcmd_handle atio 0x2825c200 ctio 0x2825e0c0 atioflags 0x8000
>>
>> And this in the log on the initiator when it comes back up:
>>
>> isp0: bad pdb (110) @ handle 0x1
>> isp0: 0: hdl 0x1 PROB al1 tgt   0  TGT 0x0000e8 => UNK 0x000000; WWNN 0x200000e08b08f56d WWPN 0x210000e08b08f56d
>>
>>
>> Here's the relevant kernel info on the target:
>>
>> # ISP SCSI Controllers
>> device          isp             # Qlogic family
>> device          ispfw           # Firmware for QLogic HBAs
>> options         ISP_TARGET_MODE # Qlogic family target mode
>> device          targ
>> device          targbh
>> options         CAMDEBUG
>> options         VFS_AIO
>>
>> /boot/device.hints on the target:
>>
>> hint.isp.0.fullduplex="1"
>> hint.isp.0.topology="nport-only"
>> hint.isp.0.role="target"
>>
>> Here's the relevant kernel info on the initiator:
>>
>> # ISP SCSI Controllers
>> device          isp             # Qlogic family
>> device          ispfw           # Firmware for QLogic HBAs
>> device          targ
>> device          targbh
>> options         CAMDEBUG
>> options         VFS_AIO
>>
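A set difference over the non-comment lines of the two kernel config excerpts quoted above is a quick way to confirm what actually differs between them. This is only a sketch, not part of any FreeBSD tool: the config text is pasted from this message and the helper name `config_lines` is hypothetical.

```python
# Hypothetical helper: compare the two kernel config excerpts quoted in this thread.
TARGET_CONF = """\
# ISP SCSI Controllers
device          isp             # Qlogic family
device          ispfw           # Firmware for QLogic HBAs
options         ISP_TARGET_MODE # Qlogic family target mode
device          targ
device          targbh
options         CAMDEBUG
options         VFS_AIO
"""

INITIATOR_CONF = """\
# ISP SCSI Controllers
device          isp             # Qlogic family
device          ispfw           # Firmware for QLogic HBAs
device          targ
device          targbh
options         CAMDEBUG
options         VFS_AIO
"""


def config_lines(text):
    """Return a set of (keyword, name) pairs, ignoring comments and blank lines."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop full-line and trailing comments
        if not line:
            continue
        keyword, name = line.split()[:2]
        entries.add((keyword, name))
    return entries


only_target = config_lines(TARGET_CONF) - config_lines(INITIATOR_CONF)
only_initiator = config_lines(INITIATOR_CONF) - config_lines(TARGET_CONF)
print("only on target:", sorted(only_target))
print("only on initiator:", sorted(only_initiator))
```

Run against the excerpts above, it reports `ISP_TARGET_MODE` as the only entry unique to the target kernel; `targ` and `targbh` appear in both.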
>> /boot/device.hints on the initiator:
>>
>> hint.isp.0.fullduplex="1"
>> hint.isp.0.topology="nport-only"
>> hint.isp.0.role="initiator"
>> hint.isp.0.iid="4"
>>
>>
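The device.hints entries use the `hint.<driver>.<unit>.<variable>="<value>"` form described in device.hints(5). A minimal sketch that parses both hint sets quoted above and prints only the variables whose values differ; `parse_hints` is a hypothetical helper and the input is copied from this message.

```python
import re

# hint.<driver>.<unit>.<variable>="<value>"
HINT_RE = re.compile(r'^hint\.(\w+)\.(\d+)\.(\w+)="([^"]*)"$')

TARGET_HINTS = '''\
hint.isp.0.fullduplex="1"
hint.isp.0.topology="nport-only"
hint.isp.0.role="target"
'''

INITIATOR_HINTS = '''\
hint.isp.0.fullduplex="1"
hint.isp.0.topology="nport-only"
hint.isp.0.role="initiator"
hint.isp.0.iid="4"
'''


def parse_hints(text):
    """Map (driver, unit, variable) -> value for every well-formed hint line."""
    hints = {}
    for line in text.splitlines():
        m = HINT_RE.match(line.strip())
        if m:
            driver, unit, var, value = m.groups()
            hints[(driver, int(unit), var)] = value
    return hints


tgt = parse_hints(TARGET_HINTS)
ini = parse_hints(INITIATOR_HINTS)
for key in sorted(set(tgt) | set(ini)):
    if tgt.get(key) != ini.get(key):
        # Print the hint name, then the target value, then the initiator value.
        print("hint.%s.%d.%s" % key, tgt.get(key), ini.get(key))
```

For the hints above it flags `role` (target vs. initiator) and `iid` (set only on the initiator); `fullduplex` and `topology` match on both sides.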
>> I'm seeing this in the syslog on the initiator:
>>
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
>> Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
>>
>>
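In every `bad underrun` line above, `resid` equals `count`, which would mean none of the 36 requested bytes were transferred. 36 bytes is also the minimum length of standard SCSI INQUIRY data, so these are plausibly the INQUIRY commands issued by the rescan, although the log itself does not say so. A hedged sketch that tallies a few of these lines; it assumes the `0.5`-style field is target.lun, and `summarize_underruns` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the isp0 warning format seen in the initiator's syslog in this thread.
UNDERRUN_RE = re.compile(
    r"bad underrun for (\d+)\.(\d+) \(count (\d+), resid (\d+), status ([^)]*)\)"
)

SYSLOG = """\
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.5 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.6 (count 36, resid 36, status not marked)
Apr 20 22:21:28 test002 kernel: isp0: bad underrun for 0.7 (count 36, resid 36, status not marked)
"""


def summarize_underruns(text):
    """Count warnings per (target, lun) and collect the byte counts actually moved."""
    hits = Counter()
    moved = set()
    for m in UNDERRUN_RE.finditer(text):
        tgt, lun, count, resid = (int(g) for g in m.groups()[:4])
        hits[(tgt, lun)] += 1
        moved.add(count - resid)  # bytes transferred = requested - residual
    return hits, moved


hits, moved = summarize_underruns(SYSLOG)
print(dict(hits))
print(moved)
```

For these lines every transfer size comes out as 0, consistent with the initiator getting no data back from the target during the probe.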
>> Here's the backtrace from the core dump after the panic, which looks
>> pretty useless as far as I can tell (I'd _love_ to be wrong!!):
>>
>> test001# kgdb kernel.debug /var/crash/vmcore.0
>>
>> Unread portion of the kernel message buffer:
>> (targ0:isp0:0:0:0): targdone 0xc7b7b400
>> (targ0:isp0:0:0:0): targread
>> (targ0:isp0:0:0:0): targread ccb 0xc7b7b400 (0x2825c200)
>> (targ0:isp0:0:0:0): targreturnccb 0xc7b7b400
>> cam_debug: targfreeccb descr 0xc7b80060 and
>> cam_debug: freeing ccb 0xc7b7b400
>> (targ0:isp0:0:0:0): write - uio_resid 4
>> (targ0:isp0:0:0:0): Sending queued ccb 0x933 (0x2825e0c0)
>> (targ0:isp0:0:0:0): targstart 0xc73bd400
>> (targ0:isp0:0:0:0): sendccb 0xc73bd400
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 4; apic id = 04
>> fault virtual address   = 0x4
>> fault code              = supervisor read, page not present
>> instruction pointer     = 0x20:0xc04f0a66
>> stack pointer           = 0x28:0xc6fe5900
>> frame pointer           = 0x28:0xc6fe5950
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, def32 1, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 639 (scsi_target)
>> trap number             = 12
>> panic: page fault
>> cpuid = 4
>> Uptime: 51s
>> Physical memory: 3767 MB
>> Dumping 102 MB: 87 71 55 39 23 7
>>
>> Reading symbols from /boot/kernel/ispfw.ko...Reading symbols from /boot/kernel/ispfw.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/ispfw.ko
>> Reading symbols from /boot/kernel/acpi.ko...Reading symbols from /boot/kernel/acpi.ko.symbols...done.
>> done.
>> Loaded symbols for /boot/kernel/acpi.ko
>> #0  doadump () at pcpu.h:196
>> 196             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
>> (kgdb) bt
>> #0  doadump () at pcpu.h:196
>> #1  0xc05c4e87 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
>> #2  0xc05c5159 in panic (fmt=Variable "fmt" is not available.
>> ) at /usr/src/sys/kern/kern_shutdown.c:574
>> #3  0xc08258bc in trap_fatal (frame=0xc6fe58c0, eva=4) at /usr/src/sys/i386/i386/trap.c:950
>> #4  0xc0825b20 in trap_pfault (frame=0xc6fe58c0, usermode=0, eva=4) at /usr/src/sys/i386/i386/trap.c:863
>> #5  0xc08264d9 in trap (frame=0xc6fe58c0) at /usr/src/sys/i386/i386/trap.c:541
>> #6  0xc080a1db in calltrap () at /usr/src/sys/i386/i386/exception.s:166
>> #7  0xc04f0a66 in isp_pci_dmasetup (isp=0xc71de000, csio=0xc73bd400, rq=0xc6fe59c4, nxtip=0xc6fe5a0c, optr=1) at /usr/src/sys/dev/isp/isp_pci.c:2781
>> #8  0xc04e96a1 in isp_action (sim=0xc7198e00, ccb=0xc73bd400) at /usr/src/sys/dev/isp/isp_freebsd.c:1373
>> #9  0xc0449104 in xpt_run_dev_sendq (bus=0xc71d65c0) at /usr/src/sys/cam/cam_xpt.c:3894
>> #10 0xc04495ce in xpt_action (start_ccb=0xc73bd400) at /usr/src/sys/cam/cam_xpt.c:3056
>> #11 0xc0466ee6 in targsendccb (softc=0xc744ee00, ccb=0xc73bd400, descr=0xc7b80020) at /usr/src/sys/cam/scsi/scsi_target.c:787
>> #12 0xc0467027 in targstart (periph=0xc71cc700, start_ccb=0xc73bd400) at /usr/src/sys/cam/scsi/scsi_target.c:654
>> #13 0xc044dd1d in xpt_run_dev_allocq (bus=0xc71d65c0) at /usr/src/sys/cam/cam_xpt.c:3765
>> #14 0xc044e0ad in xpt_schedule (perph=0xc71cc700, new_priority=1) at /usr/src/sys/cam/cam_xpt.c:3665
>> #15 0xc04684f4 in targwrite (dev=0xc7681000, uio=0xc6fe5c60, ioflag=0) at /usr/src/sys/cam/scsi/scsi_target.c:599
>> #16 0xc0586359 in giant_write (dev=0xc7681000, uio=0xc6fe5c60, ioflag=0) at /usr/src/sys/kern/kern_conf.c:434
>> #17 0xc054cbde in devfs_write_f (fp=0xc7631b94, uio=0xc6fe5c60, cred=0xc7681600, flags=0, td=0xc7889240) at /usr/src/sys/fs/devfs/devfs_vnops.c:1446
>> #18 0xc05ff917 in dofilewrite (td=0xc7889240, fd=4, fp=0xc7631b94, auio=0xc6fe5c60, offset=-1, flags=0) at file.h:257
>> #19 0xc05ffbf8 in kern_writev (td=0xc7889240, fd=4, auio=0xc6fe5c60) at /usr/src/sys/kern/sys_generic.c:402
>> #20 0xc05ffc6f in write (td=0xc7889240, uap=0xc6fe5cfc) at /usr/src/sys/kern/sys_generic.c:318
>> #21 0xc0825e75 in syscall (frame=0xc6fe5d38) at /usr/src/sys/i386/i386/trap.c:1101
>> #22 0xc080a240 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:262
>> #23 0x00000033 in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>> (kgdb)
>>
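The panic reports `fault virtual address = 0x4` for a supervisor read in `isp_pci_dmasetup` (frame #7). A kernel-mode read from such a small address is commonly the signature of a NULL structure pointer being dereferenced at a member 4 bytes from its start. The struct below is purely hypothetical (it is not the real CAM or isp layout); it only illustrates that arithmetic with `ctypes`, using fixed 32-bit fields to match the 4-byte pointers of this i386 machine.

```python
import ctypes


class HypotheticalDescriptor(ctypes.Structure):
    """Illustrative layout only; NOT a real CAM or isp(4) structure."""

    _fields_ = [
        ("first", ctypes.c_uint32),   # offset 0
        ("second", ctypes.c_uint32),  # offset 4: a pointer-sized slot on i386
    ]


NULL = 0
FAULT_VA = 0x4  # "fault virtual address = 0x4" from the panic message

# Reading ptr->second through a NULL ptr touches address NULL + offsetof(second).
computed = NULL + HypotheticalDescriptor.second.offset
print(hex(computed), computed == FAULT_VA)
```

If that reading is right, the useful next step would be finding which pointer used around isp_pci.c:2781 is NULL at that point; the sketch cannot say which one.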
>> The platform is a pair of HP DL580-G3 servers, each with four 2.8GHz Xeon
>> CPUs and 4 GB of RAM (x86-32/i386, not x86-64/amd64). I've tried this
>> with and without the device.hints options; every attempt results in a
>> core dump on the target and a hang on the initiator until the card in
>> the target gets reset on reboot.
>>
>> Any thoughts would be great. I'd like to get a SQL server up on these 
>> FC cards. I understand I could use iSCSI, but the powers that be have 
>> requested FC.
>>
> 
> _______________________________________________
> freebsd-scsi at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe at freebsd.org"