svn commit: r341578 - head/sys/dev/mlx5/mlx5_en
Andrew Gallatin
gallatin at cs.duke.edu
Mon Dec 17 14:25:56 UTC 2018
On 12/5/18 9:20 AM, Slava Shwartsman wrote:
> Author: slavash
> Date: Wed Dec 5 14:20:57 2018
> New Revision: 341578
> URL: https://svnweb.freebsd.org/changeset/base/341578
>
> Log:
> mlx5en: Remove the DRBR and associated logic in the transmit path.
>
> The hardware queues are deep enough currently and using the DRBR and associated
> callbacks only leads to more task switching in the TX path. There is also a race
> when setting the queue_state, which can lead to hung TX rings.
>
The point of DRBR in the tx path is not simply to provide a software
ring for queuing excess packets. Rather, it provides a mechanism to
avoid lock contention: a sender shoves a packet into the software ring,
where it will later be found and processed, rather than blocking the
caller on an mtx lock. I'm concerned you may have introduced a
performance regression for use cases with N:1 or N:M lock contention,
where many threads on different cores are contending for the same tx
queue. The state of the art for this is no longer DRBR, but mp_ring,
as used by both cxgbe and iflib.
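To make the pattern concrete, the usual drbr-based if_transmit path
looks roughly like the sketch below. This is a generic illustration,
not the mlx5en code; mlx5e_select_queue(), mlx5e_xmit_locked() and the
sq fields are made-up names for the example. The point is that a
contending sender enqueues and returns instead of sleeping on the TX
mutex:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/mbuf.h>
#include <sys/buf_ring.h>
#include <sys/socket.h>
#include <net/if.h>
#include <net/if_var.h>

/* Illustrative only; assumed to exist elsewhere in the driver. */
struct mlx5e_sq {
	struct mtx	 lock;
	struct buf_ring	*br;
};
static struct mlx5e_sq *mlx5e_select_queue(struct ifnet *, struct mbuf *);
static void mlx5e_xmit_locked(struct ifnet *, struct mlx5e_sq *);

static int
mlx5e_transmit_sketch(struct ifnet *ifp, struct mbuf *m)
{
	struct mlx5e_sq *sq;
	int err;

	sq = mlx5e_select_queue(ifp, m);

	/* Lock-free enqueue onto the software ring (buf_ring). */
	err = drbr_enqueue(ifp, sq->br, m);
	if (err != 0)
		return (err);	/* drbr_enqueue() frees the mbuf on failure */

	/*
	 * Try the lock rather than block on it.  If another CPU
	 * already owns the queue, it will find and transmit the
	 * mbuf we just enqueued, so we can return immediately.
	 */
	if (mtx_trylock(&sq->lock)) {
		mlx5e_xmit_locked(ifp, sq);	/* drbr_peek/advance drain loop */
		mtx_unlock(&sq->lock);
	}
	return (0);
}

mp_ring gets roughly the same "enqueue and let the current owner drain"
behavior, but coordinates the drain with atomics on the ring state
rather than a separate mtx.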
For well-behaved workloads (like Netflix's), I don't anticipate this
being a performance issue. However, I worry that it will impact other
workloads, and I think you should consider running some testing of N:1
contention. E.g., 128 netperfs running in parallel against only a few
NIC tx rings.
Sorry for the late reply; I'm behind on my -committers email. If you
have not already MFC'ed this, you may want to reconsider.
Drew