mlx4en, timer irq @100%...

Ben RUBSON ben.rubson at gmail.com
Fri Aug 4 17:13:33 UTC 2017


> On 04 Aug 2017, at 19:02, Hans Petter Selasky <hps at selasky.org> wrote:
> 
> On 08/04/17 18:59, Ben RUBSON wrote:
>> Hello,
>> Not sure this is the right list, but it seems related to an mlx4en device...
>> # vmstat -i 1
>> (...)
>> interrupt                          total       rate
>> cpu23:timer                         1198       1127
>> # top -P ALL
>> (...)
>> CPU 23:  0.0% user,  0.0% nice,  0.0% system,  100% interrupt,  0.0% idle
>> # netstat -I mlxen0 -d -w 1
>>             input         mlxen0           output
>>    packets  errs idrops      bytes    packets  errs      bytes colls drops
>> (and no output at all, same for mlxen1!)
>> # uname -sr
>> FreeBSD 11.0-RELEASE-p9
>> So, as you can see, one of my CPUs is 100% busy with timer interrupts.
>> This started suddenly, about 2 hours ago.
>> Initiating network connections to this server is now slow.
>> I also found that I can no longer use netstat on my 2 mlx4en devices
>> (so my monitoring tool no longer receives data).
>> sysctl hw.mlxen0 looks OK: no errors, and the traffic counters grow slowly.
>> What should I do?
>> How can I investigate this?
>> Thank you very much for your help & support,
> 
> Hi,
> 
> Try "procstat -ak". It should give an idea what is going on.
> 
> What version of FreeBSD is this?
> 
> Is this a regression issue?

Hi HPS, and thank you for your answer, much appreciated!
The procstat log is attached.

FreeBSD is 11.0-RELEASE-p9.

I'm not sure whether this is a regression: this server has more than 30 days of uptime,
and 2 other identical servers have months of uptime without any issue.
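
For anyone who wants to catch this kind of condition automatically, here is a
minimal sketch in Python that scans "vmstat -i" output for interrupt sources
above a rate threshold. The function name busy_interrupts and the 1000/s
threshold are assumptions made for illustration only, and the sample text is
the single line quoted above, not a full capture.

```python
def busy_interrupts(vmstat_text, min_rate=1000):
    """Return (name, total, rate) tuples for interrupt lines whose rate
    is at least min_rate. min_rate is an arbitrary, illustrative threshold."""
    rows = []
    for line in vmstat_text.splitlines():
        # Tolerate mail quoting ("> ", ">> ") in front of pasted output.
        parts = line.lstrip("> ").split()
        # A data line ends with two integers: total and rate.
        if len(parts) < 3 or not (parts[-1].isdigit() and parts[-2].isdigit()):
            continue
        name = " ".join(parts[:-2])  # names may contain spaces
        if name == "Total":
            continue  # skip the summary line
        rows.append((name, int(parts[-2]), int(parts[-1])))
    return [row for row in rows if row[2] >= min_rate]


# Sample taken from the report quoted above.
sample = """\
interrupt                          total       rate
cpu23:timer                         1198       1127
"""

for name, total, rate in busy_interrupts(sample):
    print(f"{name}: {rate} interrupts/s")
```

This only flags a suspicious rate; it says nothing about the cause, which is
what the procstat -ak output is for.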

Ben

-------------- next part --------------
A non-text attachment was scrubbed...
Name: procstat.log
Type: application/octet-stream
Size: 128839 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-net/attachments/20170804/6d1a4e38/attachment-0001.obj>
-------------- next part --------------



