Re: git: 151abc80cde7 - main - if_vlan: avoid hash table thrashing when adding and removing entries

From: John Baldwin <jhb_at_FreeBSD.org>
Date: Mon, 25 Jul 2022 16:20:52 UTC
On 7/22/22 4:19 PM, Kristof Provost wrote:
> The branch main has been updated by kp:
> 
> URL: https://cgit.FreeBSD.org/src/commit/?id=151abc80cde778bc18b91c334d07fbd52bbb38fb
> 
> commit 151abc80cde778bc18b91c334d07fbd52bbb38fb
> Author:     Kristof Provost <kp@FreeBSD.org>
> AuthorDate: 2022-07-22 17:17:04 +0000
> Commit:     Kristof Provost <kp@FreeBSD.org>
> CommitDate: 2022-07-22 17:18:41 +0000
> 
>      if_vlan: avoid hash table thrashing when adding and removing entries
>      
>      vlan_remhash() uses incorrect value for b.
>      
>      When using the default value for VLAN_DEF_HWIDTH (4), the VLAN hash-list table
>      expands from 16 chains to 32 chains as the 129th entry is added. trunk->hwidth
>      becomes 5. Say a few more entries are added and there are now 135 entries.
>      trunk-hwidth will still be 5. If an entry is removed, vlan_remhash() will
>      calculate a value of 32 for b. refcnt will be decremented to 134. The if
>      comparison at line 473 will return true and vlan_growhash() will be called. The
>      VLAN hash-list table will be compressed from 32 chains wide to 16 chains wide.
>      hwidth will become 4. This is an error, and it can be seen when a new VLAN is
>      added. The table will again be expanded. If an entry is then removed, again
>      the table is contracted.
>      
>      If the number of VLANS stays in the range of 128-512, each time an insert
>      follows a remove, the table will expand. Each time a remove follows an
>      insert, the table will be contracted.
>      
>      The fix is simple. The line 473 should test that the number of entries has
>      decreased such that the table should be contracted using what would be the new
>      value of hwidth. line 467 should be:
>      
>              b = 1 << (trunk->hwidth - 1);
>      
>      PR:             265382
>      Reviewed by:    kp
>      MFC after:      2 weeks
>      Sponsored by:   NetApp, Inc.

This does mean that there are some odd edge cases (e.g. if you remove the 513th entry
only to turn around and re-add it) that can still thrash even with this fixed.  Not
sure if those are worth worrying about though?  If they were, perhaps you could
imagine some sort of "dampener" where you have to remove some number N of entries
to actually rehash to a smaller table.  But that is probably rather complex to
implement and probably not worth it?

-- 
John Baldwin