[vnet] [epair] epair interface stops working after some time
Reshad Patuck
reshadpatuck1 at gmail.com
Tue Mar 27 19:00:04 UTC 2018
Hi,
@Kristof:
The current value of 'net.link.epair.netisr_maxqlen' is 2100; I will make it 210.
Will this require a reboot, or can I just change the sysctl and reload the epair module?
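In case it helps, here is how I plan to try it, assuming if_epair is loaded as a module and that the oid is either writable at runtime or honoured as a loader tunable (I have not confirmed either):
```
# Try changing it live first; sysctl(8) will complain if the oid is read-only:
sysctl net.link.epair.netisr_maxqlen=210

# Fallback, assuming if_epair can be reloaded (all epair interfaces must be
# destroyed first) and that the tunable is read again at module load:
kenv net.link.epair.netisr_maxqlen=210
kldunload if_epair && kldload if_epair
```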
@Bjoern:
Here is the output of 'netstat -Q':
```
# netstat -Q
Configuration:
Setting                        Current        Limit
Thread count                         1            1
Default queue limit                256        10240
Dispatch policy                 direct          n/a
Threads bound to CPUs         disabled          n/a

Protocols:
Name   Proto QLimit Policy Dispatch Flags
ip         1    256   flow  default   ---
igmp       2    256 source  default   ---
rtsock     3    256 source  default   ---
arp        4    256 source  default   ---
ether      5    256 source   direct   ---
ip6        6    256   flow  default   ---
epair      8   2100    cpu  default   CD-

Workstreams:
WSID CPU   Name     Len WMark    Disp'd  HDisp'd   QDrops    Queued   Handled
   0   0   ip         0    30  11409267        0        0  13574317  24983409
   0   0   igmp       0     0         0        0        0         0         0
   0   0   rtsock     0     1         0        0        0        42        42
   0   0   arp        0     0  61109751        0        0         0  61109751
   0   0   ether      0     0 115098020        0        0         0 115098020
   0   0   ip6        0    10  36157577        0        0   4273274  40430846
   0   0   epair      0  2100         0        0   210972 303785724 303785724
```
I still have access to a machine in this state, but will need to reset it to a working state soon.
Please let me know if there is any information you would like me to get from this machine before I reset it.
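In the meantime I will snapshot what I can; a rough capture along these lines (file names are arbitrary):
```
# Preserve queue/mbuf/interface state from the wedged machine before resetting:
netstat -Q  > /var/tmp/netstat-Q.txt    # netisr queue state (as above)
netstat -m  > /var/tmp/netstat-m.txt    # mbuf usage and denied requests
vmstat -z   > /var/tmp/vmstat-z.txt     # UMA zone stats, including mbuf zones
ifconfig -a > /var/tmp/ifconfig-a.txt   # interface flags and state
```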
Best,
Reshad
On 27 March 2018 8:18:29 PM IST, "Bjoern A. Zeeb" <bzeeb-lists at lists.zabbadoz.net> wrote:
>On 27 Mar 2018, at 14:40, Kristof Provost wrote:
>
>> (Re-cc freebsd-net, because this is useful information)
>>
>> On 27 Mar 2018, at 13:07, Reshad Patuck wrote:
>>> The epair crash occurred again today running the epair module code
>>> with the added dtrace sdt providers.
>>>
>>> Running the same command as last time, 'dtrace -n ::epair\*:',
>>> returns the following:
>>> ```
>>> CPU ID FUNCTION:NAME
>> …
>>> 0 66499 epair_transmit_locked:enqueued
>>> ```
>>
>>> Looks like it's filled up a queue somewhere and is dropping
>>> connections after that.
>>>
>>> The value of 'error' is 55. I can see both the ifp and m structs,
>>> but I don't know what to look for in them.
>>>
>> That’s useful. Error 55 is ENOBUFS, which in IFQ_ENQUEUE() means
>> we’re hitting _IF_QFULL().
>> There don’t seem to be counters for that drop though, so that makes
>> it hard to diagnose without these extra probe points.
>> It also explains why you don’t really see any drop counters
>> incrementing.
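(A sketch for confirming this on the wedged machine, assuming the fbt provider can attach to this static function, i.e. it was not inlined; 55 is ENOBUFS on FreeBSD:)
```
# Count ENOBUFS returns from the epair transmit path; on fbt return probes
# arg1 carries the function's return value:
dtrace -n 'fbt::epair_transmit_locked:return /arg1 == 55/ { @enobufs = count(); }'
```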
>>
>> The fact that this queue is full presumably means that the other side
>> is not reading packets off it any more.
>> That’s supposed to happen in epair_start_locked() (look for the
>> IFQ_DEQUEUE() calls).
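(A sketch to compare the two sides, again assuming fbt can attach to these static functions; if the dequeue side has stalled, epair_start_locked's count should stay flat while epair_transmit_locked's keeps climbing:)
```
# Count entries into the dequeue (start) and enqueue (transmit) paths:
dtrace -n 'fbt::epair_start_locked:entry,
           fbt::epair_transmit_locked:entry { @[probefunc] = count(); }'
```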
>>
>> It’s not at all clear to me how, but it looks like the receive side
>> is not doing its work.
>>
>> It looks like the IFQ code is already a fallback for when the netisr
>> queue is full.
>> That code might be broken, or there might be a different issue that
>> will just mean you’ll always end up in the same situation,
>> regardless of queue size.
>>
>> It’s probably worth trying to play with
>> ‘net.link.epair.netisr_maxqlen’. I’d recommend *lowering* it, to see
>> if the problem happens more frequently that way. If it does, it’ll be
>> helpful in reproducing and trying to fix this. If it doesn’t, the
>> full queue is probably a consequence rather than a cause/trigger.
>> (Of course, once you’ve confirmed that lowering the netisr_maxqlen
>> makes the problem more frequent, go ahead and increase it again.)
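(A sketch of that experiment, using the knob name from earlier in this thread and assuming it can be changed at runtime; the value and polling interval are arbitrary:)
```
# Shrink the queue to try to provoke the hang sooner, then watch for drops:
sysctl net.link.epair.netisr_maxqlen=50
while true; do date; netstat -Q | grep epair; sleep 10; done
```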
>
>netstat -Q will be useful