coredump when loading cxgb after boot with routing daemon already running (RELENG11)

Wed Jan 4 19:15:42 UTC 2017

On 1/4/2017 2:07 PM, Navdeep Parhar wrote:
> What source line in releng-11 does ifioctl+0x6dd correspond to?
> 
> (kgdb) l *(ifioctl+0x6dd)
> 
> This might be a race where the ifnet is being created or coming up and
> zebra pokes it in some way before it's fully ready.  If that's the
> case it could affect any ifnet.

Hi Navdeep,
	Thanks for looking. Yes, I just tried it with igb and got a similar panic.

igb0: <Intel(R) PRO/1000 Network Connection, Version - 2.5.3-k> port
0xc000-0xc01f mem 0xf7200000-0xf727ffff,0xf7280000-0xf7283fff irq 17 at
device 0.0 on pci4
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: 00:25:90:47:b5:d8

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x0
fault code              = supervisor read instruction, page not present
instruction pointer     = 0x20:0x0
stack pointer           = 0x28:0xfffffe085d4d1728
frame pointer           = 0x28:0xfffffe085d4d1750
igb0: Bound queue 0 to cpu 0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 846 (zebra)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff806efae7 at kdb_backtrace+0x67
#1 0xffffffff806a6006 at vpanic+0x186
#2 0xffffffff806a5e73 at panic+0x43
#3 0xffffffff80989622 at trap_fatal+0x322
#4 0xffffffff809897ec at trap_pfault+0x1bc
#5 0xffffffff80988ea0 at trap+0x280
#6 0xffffffff8096dab1 at calltrap+0x8
#7 0xffffffff807aa79d at ifioctl+0x6dd
#8 0xffffffff8070d876 at kern_ioctl+0x346
#9 0xffffffff8070d47f at sys_ioctl+0x13f
#10 0xffffffff80989fae at amd64_syscall+0x50e
#11 0xffffffff8096dd9b at Xfast_syscall+0xfb
Uptime: 1m9s
Dumping 1267 out of 32675 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91%
Dump complete

(kgdb) l *(ifioctl+0x6dd)
0xffffffff807b90fd is in ifioctl (/usr/src/sys/net/if.c:2655).
2650            case SIOCGIFMEDIA:
2651            case SIOCGIFXMEDIA:
2652            case SIOCGIFGENERIC:
2653                    if (ifp->if_ioctl == NULL)
2654                            return (EOPNOTSUPP);
2655                    error = (*ifp->if_ioctl)(ifp, cmd, data);
2656                    break;
2657
2658            case SIOCSIFLLADDR:
2659                    error = priv_check(td, PRIV_NET_SETLLADDR);
Current language:  auto; currently minimal
(kgdb)

> 
> Regards,
> Navdeep
> 
> 
> 
> On Wed, Jan 4, 2017 at 11:00 AM, Mike Tancsa <mike at sentex.net> wrote:
>> I ran into a strange problem when manually loading a network driver
>> after a RELENG_11 box starts up with a routing daemon already running.
>>
>> If I have zebra running (just a few static routes) and then try and do a
>> kldload if_cxgb, the box panics.  If I boot the box, load the nic's
>> driver and then start zebra, all is fine.
>>
>> At first, I thought it might be a firmware issue, but I updated the
>> NIC's firmware and saw the same behaviour.  I am not sure if this is
>> specific to the Chelsio or if any kldload of a NIC driver will do.
>>
>>
>>
>> cxgbc0: <Chelsio T310, 1 port> mem
>> 0xf7081000-0xf7081fff,0xf6800000-0xf6ffffff,0xf7080000-0xf7080fff irq 16
>> at device 0.0 on pci5
>> cxgbc0: PCIe x4 Link, expect reduced performance
>> cxgbc0: using MSI-X interrupts (5 vectors)
>> cxgbc0: firmware needs to be updated to version 7.11.0
>> Jan  4 13:03:02 cxgbc0: Firmware Version 5.0.0
>> cxgb0: <Port 0 10GBASE-SR> on cxgbc0
>> cxgb0: Using defaults for TSO: 65518/35/2048
>> cxgb0: Ethernet address: 00:07:43:07:9e:14
>>
>> found old FW minor version(5.0), driver compiled for version 7.11
>> offsite2 kernel: Fatal trap 12: page fault while in kernel mode
>> cpuid = 2; apic id = 04
>> fault virtual address   = 0x0
>> fault code              = supervisor read instruction, page not present
>> instruction pointer     = 0x20:0x0
>> stack pointer           = 0x28:0xfffffe085d2df728
>> frame pointer           = 0x28:0xfffffe085d2df750
>> code segment            = base 0x0, limit 0xfffff, type 0x1b
>>                         = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags        = interrupt enabled, resume, IOPL = 0
>> current process         = 420 (zebra)
>> trap number             = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff806fe447 at kdb_backtrace+0x67
>> #1 0xffffffff806b4966 at vpanic+0x186
>> #2 0xffffffff806b47d3 at panic+0x43
>> #3 0xffffffff80997f82 at trap_fatal+0x322
>> #4 0xffffffff8099814c at trap_pfault+0x1bc
>> #5 0xffffffff80997800 at trap+0x280
>> #6 0xffffffff8097c411 at calltrap+0x8
>> #7 0xffffffff807b90fd at ifioctl+0x6dd
>> #8 0xffffffff8071c1d6 at kern_ioctl+0x346
>> #9 0xffffffff8071bddf at sys_ioctl+0x13f
>> #10 0xffffffff8099890e at amd64_syscall+0x50e
>> #11 0xffffffff8097c6fb at Xfast_syscall+0xfb
>> Uptime: 3m9s
>> Dumping 1635 out of 32675 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
>> --
>> -------------------
>> Mike Tancsa, tel +1 519 651 3400
>> Sentex Communications, mike at sentex.net
>> Providing Internet services since 1994 www.sentex.net
>> Cambridge, Ontario Canada   http://www.tancsa.com/
> 
> 

-- 
-------------------
Mike Tancsa, tel +1 519 651 3400
Sentex Communications, mike at sentex.net
Providing Internet services since 1994 www.sentex.net
Cambridge, Ontario Canada   http://www.tancsa.com/