POWER9 NICs failing at 100Gbps
Date: Tue, 02 May 2023 17:45:22 UTC
Hello Everyone,

We've been testing FreeBSD 13.2 PowerPC64LE with an LC922 and a Raptor with 100Gbps Chelsio T6 and Mellanox ConnectX-6 NICs, but we get NIC failures once we saturate either NIC. We can trigger this bug instantly with a few iperf3 instances running simultaneously. I've included the log below for the Chelsio NIC and I'm wondering if this is a known issue?

cc0: link state changed to UP
t6nex0: command 0x16 in mbox 4 timed out (0x4014c010).
t6nex0: mbox 4 cmdsent 16a0094400000001 2328f70000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
t6nex0: mbox 4 current 16a0094400000001 2328f70000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
t6nex0: encountered fatal error, adapter stopped (1).
cc0: set_rxmode (1) failed: 60
t6nex0: CIM debug regs1 00000000 00000000 00000000 00000000 00000000
t6nex0: CIM debug regs2 00000000 00000000 00000000 00000000 00330000
t6nex0: CIM LA dump follows.
Status   Inst     Data     PC       LS0Stat  LS0Addr  LS0Data  LS1Stat  LS1Addr  LS1Data
3c       00003003 1fffeedf 1fffeedf 00a00028 1fff0850 1fff3400 00b00020 1ffce2e8 00000000
3c       00003008 1fffeee2 1fffeee2 00a00028 1fff06a4 1ffce200 00b00020 1ffce2e8 00000000
3c       00003008 1fffeeea 1fffeeea 00a00028 1fff084c 1fff2f0c 00b00020 1ffce2e8 00000000
3c       00003008 1fffeef2 1fffeef2 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c       00003002 1fffeefa 1fffeefa 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c       00003002 1fffeefc 1fffeefc 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c       00003008 1fffeefe 1fffeefe 00a00005 1fff328b 0000000f 00b00025 1ffce2e8 00000000
....
t6nex0: device log follows.
....
46 2968294087 NOTICE PORT port[0:0x11:0x0b]: l1cfg, 1G/10G can't be advertised for this port type. mcaps 0x339f007e acaps 0x20970078 rcaps 0xb3007e
47 2968386457 INFO PORT port_link_state_handler[0] powering up
48 2968386460 INFO PORT port[0] update (flowcid 40236 rc 0)
49 2968685971 INFO PORT bean_fsm[0] : state START (count = 1)
50 2968695782 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x4, lanes 0xf, fec 0x800000
51 2968696059 INFO PORT bean_fsm[0] : entering state BASEP_HANDLE
52 2969235973 INFO PORT bean_fsm[0] : entering state NXP_HANDLE
53 2969245973 INFO PORT bean_fsm[0] : entering state EXT_NXP_HANDLE
54 2969255973 INFO PORT consortium_fec[0]: local 0x7, remote 0x3, negotiated 0x800000
55 2969255973 INFO PORT bean_fsm[0] : entering state WAIT_FOR_NULL_PAGE
56 2969285973 INFO PORT bean_fsm[0] : entering state WAIT_COMPLETE
57 2969285974 INFO PORT bean_fsm[0] : tech ability local 0x710, remote 0x715 cr-s 0, local fec_ability 0x1
58 2969285974 INFO PORT bean_fsm[0] : IEEE speed 0x40, FEC remote 0x4, negotiated 0x800000
59 2969285975 INFO PORT bean_fsm[0] : state DONE
60 2969285976 INFO PORT bean_fsm[0] : FEC local 0x1, negotiated 0x800000
61 2969286976 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x40, lanes 0xf, fec 0x800000
62 2969287972 INFO PORT port[0] negotiated speed 0x40, lanes 0xf:0xf, fec 0x800000
63 2969287974 INFO PORT aec_fsm[0] : state START (sigdet 0xf)
64 2969288111 INFO PORT aec_fsm[0] : transitioning to TRAINING
65 2969651045 INFO PORT aec_fsm[0] : TRAINING_COMPLETE
66 2969651046 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
67 2969651046 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
68 2969651047 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
69 2969651047 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
70 2969651905 INFO PORT aec_fsm[0] : Remote fault while waiting for link status 0x29
71 2975239314 INFO PORT aec_fsm[0]: aec training completed, link timed out lstatus 0x5
72 2975239314 INFO PORT aec_fsm[0] Link timed out after training complete, Link Status 0x5
73 2975335992 INFO PORT bean_fsm[0] : state START (count = 1)
74 2975345863 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x4, lanes 0xf, fec 0x800000
75 2975346140 INFO PORT bean_fsm[0] : entering state BASEP_HANDLE
76 2975415994 INFO PORT bean_fsm[0] : entering state NXP_HANDLE
77 2975425994 INFO PORT bean_fsm[0] : entering state EXT_NXP_HANDLE
78 2975435994 INFO PORT consortium_fec[0]: local 0x7, remote 0x3, negotiated 0x800000
79 2975435994 INFO PORT bean_fsm[0] : entering state WAIT_FOR_NULL_PAGE
80 2975465994 INFO PORT bean_fsm[0] : entering state WAIT_COMPLETE
81 2975465995 INFO PORT bean_fsm[0] : tech ability local 0x710, remote 0x715 cr-s 0, local fec_ability 0x1
82 2975465995 INFO PORT bean_fsm[0] : IEEE speed 0x40, FEC remote 0x4, negotiated 0x800000
83 2975465996 INFO PORT bean_fsm[0] : state DONE
84 2975465996 INFO PORT bean_fsm[0] : FEC local 0x1, negotiated 0x800000
85 2975466997 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x40, lanes 0xf, fec 0x800000
86 2975467993 INFO PORT port[0] negotiated speed 0x40, lanes 0xf:0xf, fec 0x800000
87 2975467994 INFO PORT aec_fsm[0] : state START (sigdet 0xf)
88 2975468131 INFO PORT aec_fsm[0] : transitioning to TRAINING
89 2975837289 INFO PORT aec_fsm[0] : TRAINING_COMPLETE
90 2975837289 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
91 2975837290 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
92 2975837290 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
93 2975837291 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
94 2975838184 INFO PORT aec_fsm[0] : Remote fault while waiting for link status 0x29
95 2981015970 INFO PORT hw_mac_link_status[0] int_cause 0x17011b4, link_status 0x22
96 2981015970 INFO PORT aec_fsm[0] : Remote fault cleared while waiting for link status 0x22
97 2981015973 INFO PORT aec_fsm[0] : DONE
98 2981015973 INFO PORT bean/aec complete (retry: 1)
99 2981015974 INFO PORT port_hss_sigdet[0]: hss_sigdet changed to 0xf
100 2981106013 INFO PORT port[0] link up (1) (speed 0x40 acaps 0x20970078 lpcaps 0x10007e)
101 2981106015 INFO PORT port[0] set PAUSE PARAMS: pppen 0 txpe 0 rxpe 0
102 2981106018 INFO PORT port[0] update (flowcid 40236 rc 0)

Best,
Ali