isp driver + clustered NetApp failover = strangeness
Tim Spencer
tspencer at hungry.com
Sun May 29 14:09:58 PDT 2005
Hey there!
I've got a pair of NetApp 940c heads that are exporting LUNs out
to a bunch of FreeBSD hosts with qla2312 cards in them over a Brocade
2850 FC switch. Everything works great until I test out standby
cluster failover on the NetApps. To quote NetApp's manual:
"Port A on each target HBA operates as the active port, and Port B
operates as a standby port. When the cluster is in normal operation,
Port A provides access to local LUNs, and Port B is not available to
the initiator. When one filer fails, Port B on the partner filer
becomes active and provides access to the LUNs on the failed filer.
The Port B assumes the WWPN of the Port A on the failed partner."
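As a rough illustration of the takeover behaviour the manual describes, here is a minimal Python sketch. It is only a toy model of the quoted text, not NetApp's implementation: the `Filer` class, the `take_over` and `visible_targets` helpers, and the WWPN strings are all hypothetical names made up for the example. It shows why an initiator that knows only one WWPN per LUN set might expect to keep working: the same WWPN stays reachable, just from a different physical port.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Filer:
    name: str
    port_a_wwpn: str                   # Port A: active in normal operation
    port_b_wwpn: Optional[str] = None  # Port B: standby (None) until a takeover
    failed: bool = False


def take_over(failed: Filer, partner: Filer) -> None:
    """Partner's Port B assumes the WWPN of the failed filer's Port A."""
    failed.failed = True
    partner.port_b_wwpn = failed.port_a_wwpn


def visible_targets(filers: List[Filer]) -> Dict[str, Tuple[str, str]]:
    """Map each reachable WWPN to the (filer, port) currently serving it."""
    targets: Dict[str, Tuple[str, str]] = {}
    for f in filers:
        if f.failed:
            continue  # a failed head presents no ports
        targets[f.port_a_wwpn] = (f.name, "A")
        if f.port_b_wwpn is not None:
            targets[f.port_b_wwpn] = (f.name, "B")
    return targets


# Placeholder WWPNs, purely illustrative.
filer1 = Filer("filer1", "wwpn-filer1-a")
filer2 = Filer("filer2", "wwpn-filer2-a")

before = visible_targets([filer1, filer2])
take_over(filer1, filer2)
after = visible_targets([filer1, filer2])

print(before)
print(after)
```

In this model the set of visible WWPNs is identical before and after the takeover; only the port behind one of them changes. What the model leaves out is exactly the open question: what happens to commands that are in flight when the WWPN moves.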
So, to me, it sounds like this _should_ work for our FreeBSD
hosts, which don't support multipathing, and thus must use this sort
of failover. My expectation was that when the failover happens, the
WWPN moves over to port B on the other head, perhaps a link reset
happens or something, and everything keeps going. Well, it turns out
that this is only partly true. If there is no I/O happening during the
swap, then everything does seem to work out fine. But if there is I/O
going on, then things quickly go downhill. I see this:
May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:35:58 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
May 28 19:36:50 toc2-db1 /kernel: (da0:isp0:0:1:0): isp0: watchdog timeout for handle 0x1f3
After this, sometimes the system locks up completely, and
sometimes the system is operational, but anything that has to do with
the filesystem in question hangs, etc.
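For anyone collecting more data on this, a small Python sketch along the following lines could pull the relevant kernel messages out of a syslog file and report how long it takes to go from the first "Invalidating pack" to the watchdog timeout. This is only a sketch under assumptions: the regular expression is written against the three log lines quoted above (with the wrapped watchdog line joined back together), the year is supplied by hand because syslog timestamps omit it, and the function names are made up for the example.

```python
import re
from datetime import datetime

# Matches syslog-style kernel lines such as:
#   May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+/kernel:\s+"
    r"\((?P<periph>[^)]*)\):\s+(?P<msg>.*)$"
)


def parse_events(lines, year=2005):
    """Return (timestamp, host, peripheral, message) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a CAM peripheral message in the expected format
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["host"], m["periph"], m["msg"]))
    return events


def seconds_to_watchdog(events):
    """Seconds from the first 'Invalidating pack' to the first watchdog timeout."""
    first_inval = next((e[0] for e in events if "Invalidating pack" in e[3]), None)
    watchdog = next((e[0] for e in events if "watchdog" in e[3]), None)
    if first_inval is None or watchdog is None:
        return None
    return (watchdog - first_inval).total_seconds()


# The three lines from the report, with the wrapped line rejoined.
SAMPLE = [
    "May 28 19:35:56 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack",
    "May 28 19:35:58 toc2-db1 /kernel: (da0:isp0:0:1:0): Invalidating pack",
    "May 28 19:36:50 toc2-db1 /kernel: (da0:isp0:0:1:0): isp0: watchdog timeout for handle 0x1f3",
]

events = parse_events(SAMPLE)
print(seconds_to_watchdog(events))  # 54.0 for the sample above
```

Run against a full /var/log/messages from several failover tests, the same delta would show whether the gap before the watchdog fires is consistent from one test to the next.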
So here's my question: Is this something that we can make
work? I really don't know all that much about the lower levels of
how Fibre Channel and the isp driver work, but it sounds like this
ought to work. Is there anybody out there who knows more about the
driver who might be willing to work on this? I can't guarantee
anything, but our company does support FreeBSD development, and we
might be able to swing some cash towards somebody who would be able
to make this work. Is there anything else that I can include to help
figure out what is going wrong? Below, I include dmesg from one of
the hosts so you can see what sort of system is running this, but if
you've got more things that I can do to diagnose this, let me know.
Thanks, and have fun!
-tspencer
: toc2-db2 []$; dmesg
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights
reserved.
FreeBSD 4.11-STABLE #0: Wed May 25 05:39:38 GMT 2005
root@:/usr/src/sys/compile/BSD4.11.GODSPEED-SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2786.13-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0xf29 Stepping = 9
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Hyperthreading: 2 logical CPUs
real memory = 3221094400 (3145600K bytes)
avail memory = 3134447616 (3060984K bytes)
Changing APIC ID for IO APIC #0 from 0 to 8 on chip
Changing APIC ID for IO APIC #1 from 0 to 9 on chip
Changing APIC ID for IO APIC #2 from 0 to 10 on chip
Programming 16 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
Programming 16 pins in IOAPIC #1
Programming 16 pins in IOAPIC #2
FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs
cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000
cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000
cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000
cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000
io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000
io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000
io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000
Preloaded elf kernel "kernel" at 0x9f3d2000.
Warning: Pentium 4 CPU: PSE disabled
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 9 entries at 0x9f0fc410
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 3 -> irq 2
IOAPIC #1 intpin 7 -> irq 7
IOAPIC #1 intpin 11 -> irq 10
pci0: <PCI bus> on pcib0
pci0: <unknown card> (vendor=0x1028, dev=0x000c) at 4.0 irq 2
pci0: <unknown card> (vendor=0x1028, dev=0x0008) at 4.1 irq 7
pci0: <unknown card> (vendor=0x1028, dev=0x000d) at 4.2 irq 10
pci0: <ATI Mach64-GR graphics accelerator> at 14.0
atapci0: <ServerWorks CSB5 ATA100 controller> port 0x8b0-0x8bf,
0x8d8-0x8db,0x8d0-0x8d7,0x8c8-0x8cb,0x8c0-0x8c7 at device 15.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <OHCI USB controller> at 15.2 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0225)> at device 15.3
on pci0
isa0: <ISA bus> on isab0
pcib1: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 4 -> irq 11
pci1: <PCI bus> on pcib1
fxp0: <Intel 82550 Pro/100 Ethernet> port 0xdcc0-0xdcff mem
0xfcf00000-0xfcf1ffff,0xfcf20000-0xfcf20fff irq 11 at device 8.0 on pci1
fxp0: Ethernet address 00:0e:0c:62:9e:17
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib2: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 8 -> irq 13
pci2: <PCI bus> on pcib2
isp0: <Qlogic ISP 2312 PCI FC-AL Adapter> port 0xcc00-0xccff mem 0xfcd00000-0xfcd00fff irq 13 at device 6.0 on pci2
isp0: bad execution throttle of 0- using 16
pcib3: <Host to PCI bridge> on motherboard
IOAPIC #1 intpin 12 -> irq 16
IOAPIC #1 intpin 13 -> irq 17
pci3: <PCI bus> on pcib3
bge0: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem
0xfcb10000-0xfcb1ffff irq 16 at device 6.0 on pci3
bge0: Ethernet address: 00:11:43:34:7b:3f
miibus1: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus1
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
1000baseTX-FDX, auto
bge1: <Broadcom BCM5703 Gigabit Ethernet, ASIC rev. 0x1002> mem
0xfcb00000-0xfcb0ffff irq 17 at device 8.0 on pci3
bge1: Ethernet address: 00:11:43:34:7b:40
miibus2: <MII bus> on bge1
brgphy1: <BCM5703 10/100/1000baseTX PHY> on miibus2
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX,
1000baseTX-FDX, auto
pcib4: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
IOAPIC #1 intpin 14 -> irq 18
pci4: <PCI bus> on pcib4
pcib8: <PCI to PCI bridge (vendor=8086 device=0309)> at device 8.0 on
pci4
pci5: <PCI bus> on pcib8
aac0: <Dell PERC 3/Di> mem 0xf0000000-0xf7ffffff irq 18 at device 8.1
on pci4
aac0: i960RX 100MHz, 118MB cache memory, optional battery present
aac0: Kernel 2.8-0, Build 6089, S/N 74a1d3
aac0: Supported
Options=275c<WCACHE,DATA64,HOSTTIME,WINDOW4GB,SOFTERR,NORECOND,SGMAP64>
pcib5: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci6: <PCI bus> on pcib5
pcib6: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci7: <PCI bus> on pcib6
pcib7: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci8: <PCI bus> on pcib7
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,
0xc9800-0xcd7ff,0xcd800-0xcefff,0xec000-0xeffff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on
isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0
intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging limited to 100 packets/entry by default
ata0-slave: ATAPI identify retries exceeded
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
acd0: CDROM <TEAC CD-ROM CD-224E> at ata0-master PIO4
aacd0: <RAID 0/1> on aac0
aacd0: 139997MB (286714368 sectors)
Mounting root from ufs:/dev/aacd0s1a
da0 at isp0 bus 0 target 0 lun 0
da0: <NETAPP LUN 0.2> Fixed Direct Access SCSI-4 device
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 817152MB (1673527296 512 byte sectors: 255H 63S/T 38636C)
WARNING: / was not properly dismounted
bge0: gigabit link up
ohci0: <OHCI (generic) USB controller> mem 0xfe100000-0xfe100fff irq
5 at device 15.2 on pci0
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: 4 ports with 4 removable, self powered
: toc2-db2 []$;