[Bug 254333] [tcp] sysctl net.inet.tcp.hostcache.list hangs

Tue Mar 30 22:27:32 UTC 2021

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254333

--- Comment #19 from Richard Scheffenegger <rscheff at freebsd.org> ---
(In reply to Michael Tuexen from comment #18)
> Richard: Do you see a way how the counter could be off?

Unfortunately not. I was thinking about adding KASSERTs that fire when a
counter is about to be decremented but is already zero...
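A minimal userspace sketch of such a check, assuming both counters are unsigned ints. The struct names and the KASSERT_SKETCH macro are hypothetical stand-ins: the real kernel KASSERT(9) takes a parenthesized printf-style message and is only compiled in on INVARIANTS kernels, whereas this sketch simply uses assert().

```c
#include <assert.h>

/* Hypothetical stand-in for KASSERT(9); here it just calls assert(). */
#define KASSERT_SKETCH(exp, msg) assert((exp) && (msg))

/* Hypothetical, simplified holders for the two counters named above. */
struct hc_head_sketch {
	unsigned int hch_length;   /* entries in this bucket */
};

struct hc_counters_sketch {
	unsigned int cache_count;  /* entries in the whole cache */
};

/* Decrement both counters, asserting first that neither is already zero. */
static void
hc_count_decrement(struct hc_counters_sketch *hc, struct hc_head_sketch *head)
{
	KASSERT_SKETCH(head->hch_length > 0, "hch_length would underflow");
	KASSERT_SKETCH(hc->cache_count > 0, "cache_count would underflow");
	head->hch_length--;
	hc->cache_count--;
}
```

With the check in place, an unmatched decrement stops at the offending call site instead of silently wrapping the counter.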

Or simply reset all the counters to zero (hch_length, cache_count) when a
purgeall is being performed, to return to a "known good" state.
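A minimal sketch of a purge that forces the counters back to zero, assuming a simplified cache of singly linked buckets. All names here (hc_entry_sketch, hc_bucket_sketch, hostcache_sketch, HC_NBUCKETS_SKETCH and the helpers) are hypothetical; the real host cache uses TAILQ buckets and per-bucket locks, which are omitted. The function returns how many entries it actually freed, so a caller could compare that against the old cache_count to detect drift.

```c
#include <assert.h>
#include <stdlib.h>

#define HC_NBUCKETS_SKETCH 4   /* hypothetical bucket count */

struct hc_entry_sketch {
	struct hc_entry_sketch *next;
	unsigned int key;
};

struct hc_bucket_sketch {
	struct hc_entry_sketch *head;
	unsigned int hch_length;
};

struct hostcache_sketch {
	struct hc_bucket_sketch buckets[HC_NBUCKETS_SKETCH];
	unsigned int cache_count;
};

/* Insert a new entry and bump both counters. */
static void
hc_insert_sketch(struct hostcache_sketch *hc, unsigned int key)
{
	struct hc_bucket_sketch *b = &hc->buckets[key % HC_NBUCKETS_SKETCH];
	struct hc_entry_sketch *e = malloc(sizeof(*e));

	assert(e != NULL);
	e->key = key;
	e->next = b->head;
	b->head = e;
	b->hch_length++;
	hc->cache_count++;
}

/* Free every entry, then reset both counters to a known good state. */
static unsigned int
hc_purgeall_sketch(struct hostcache_sketch *hc)
{
	unsigned int freed = 0;

	for (int i = 0; i < HC_NBUCKETS_SKETCH; i++) {
		struct hc_bucket_sketch *b = &hc->buckets[i];
		struct hc_entry_sketch *e = b->head;

		while (e != NULL) {
			struct hc_entry_sketch *next = e->next;
			free(e);
			freed++;
			e = next;
		}
		b->head = NULL;
		b->hch_length = 0;      /* reset, whatever it held before */
	}
	hc->cache_count = 0;        /* reset, whatever it held before */
	return freed;
}
```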

The only feasible way I can think of right now would be if the hostcache
settings are changed dynamically while the hostcache is already populated with
some entries. That may leave the hc_bucket full, but the counters off...

But all the normal processing of the accounting variables looks good.

> But tcp_hc_purge_internal() decrements the counter when it removes an
> entry and frees it. I double checked the code and I think the counter
> is handled correctly. I did look for an underflow, but I could not
> find it...

> I also looked at the 11.4 code, but I see no issue.

The diff between 11.4 and HEAD is minuscule. None of the logic has changed,
so this very same issue could still affect HEAD.

> If the hash buckets are used highly un-symmetric, I wouldn't suggest
> to use larger buckets. That results in long processing time. In that
> case I would suggest to use a better hash algorithm. But this is not
> the issue right now.

Correct. (A different salt during hashing may also help. Frequently used
entries do percolate to the head of the TAILQ, though.)
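A minimal sketch of why a salt plus a nonlinear mixer can help, assuming IPv4 keys and a power-of-two bucket count. The unsalted XOR-fold below is only an illustration of a simple fold hash, not necessarily the kernel's exact HOSTCACHE_HASH; the salted variant feeds the salted address through the well-known lowbias32 integer mixer. Function names and the salt parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative unsalted hash: fold the upper 16 bits into the lower 16. */
static uint32_t
hash_unsalted(uint32_t addr, uint32_t mask)
{
	return (addr ^ (addr >> 16)) & mask;
}

/* Illustrative salted hash: XOR in a salt, then mix with lowbias32. */
static uint32_t
hash_salted(uint32_t addr, uint32_t salt, uint32_t mask)
{
	uint32_t h = addr ^ salt;

	h ^= h >> 16;
	h *= 0x7feb352dU;
	h ^= h >> 15;
	h *= 0x846ca68bU;
	h ^= h >> 16;
	return h & mask;
}
```

With the plain fold and a mask of 0xff, every address whose upper and lower 16-bit halves are equal lands in bucket 0. XORing a salt into the same linear fold would not break that family (they would still all collide, just in a different bucket), which is why the sketch runs the salted value through a nonlinear mixer.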

> For the counter having such a large value could happen when there is
> an underflow. But I don't see how it can happen. For me it looks like
> the global and the bucket counter are handled correctly. 

Still, an underflow is more likely than an overflow.
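A tiny illustration, assuming the counters are 32-bit unsigned ints: a single unmatched decrement from zero wraps straight to UINT_MAX, whereas reaching a value that large by counting up would take about four billion net increments. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/*
 * Decrement without any check. Unsigned arithmetic in C is modulo
 * 2^N, so 0 - 1 yields UINT_MAX rather than a negative number.
 */
static unsigned int
decrement_unchecked(unsigned int count)
{
	return count - 1;
}
```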

> Since the statement is that this happens every 3 to 4 month, it must
> be a rare event. Or some other code is writing in the memory location
> where the counter is...

KASSERTs for safety? (And possibly a core dump to analyze.)
