[Bug 283903] rtw88: possible skb leak

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 29 Jan 2025 10:10:21 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283903

--- Comment #15 from Guillaume Outters <guillaume-freebsd@outters.eu> ---
(Regarding my earlier question on how to start experimenting with the kernel,
and whether linux_kpi could be built as a separate module: I ended up learning
how to compile and boot a full custom kernel, which is a bit tedious but works.)

By adding traces in linux_80211.c, I could observe that the freeing of skbs is
fully binary (either every skb gets freed, or none does). The traces are at:
- lkpi_80211_txq_tx_one(), after the dev_alloc_skb() of
https://github.com/freebsd/freebsd-src/blob/main/sys/compat/linuxkpi/common/src/linux_80211.c#L3816
- linuxkpi_ieee80211_free_txskb(), before the _lkpi_ieee80211_free_txskb()

Test procedure:
Reboot around 14:40 with the traced kernel, then run two loops of:
while true ; do scp ... ; vmstat -m | grep skb ; sleep 20 ; done
One scp lasts 14 s and the other 10 s, so that the two loops drift and
interfere (from time to time they run simultaneously).
I added two web browsing sessions, one around 15:00, the other around 17:40.

A FEW MINUTES AFTER THE SECOND WEB BROWSING SESSION (interleaved with the
scp's), VMSTAT STARTED REPORTING INCREASES, WITH NOTHING BEING FREED ANYMORE.
The rate of increase matches the consumed bandwidth (1 MB of download -> 1 MB
of leaked skb memory).

Here is a typical pattern BEFORE the tipping point; each skb gets freed
rapidly:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 206 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c501e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c5173000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000

Here is the pattern AT the tipping point:

Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5279000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52fc000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fd3000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4fca000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c525e000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c53ff000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c4d26000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5341000
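
For anyone wanting to check such a trace mechanically rather than by eye, here
is a minimal Python sketch that pairs each alloc_skb with its free_skb and
lists what is still outstanding. It is not part of the kernel instrumentation:
the function name outstanding_skbs() and the EVENT pattern are made up for
illustration, and it assumes the number logged after alloc_skb is the
requested size. The pattern accepts any whitespace, including a line break,
between "->" and the address, so a trace hard-wrapped by a mailer parses too.

```python
import re

# Matches the two trace messages of this comment. "\s+" also spans a line
# break, so an address wrapped onto the next line is still picked up.
EVENT = re.compile(
    r"alloc_skb\s+(?P<size>\d+)\s+->\s+(?P<alloc>0x[0-9a-f]+)"
    r"|free_skb\s+(?P<free>0x[0-9a-f]+)"
)


def outstanding_skbs(trace):
    """Return {address: size} for skbs allocated but not freed in `trace`."""
    live = {}
    for m in EVENT.finditer(trace):
        if m.group("alloc"):
            # Addresses get recycled by the allocator, so key on the address
            # and let a later allocation overwrite an earlier one.
            live[m.group("alloc")] = int(m.group("size"))
        else:
            # A free for an skb allocated before the trace began is ignored.
            live.pop(m.group("free"), None)
    return live


# Five consecutive events copied from the tipping-point trace above.
SAMPLE = """\
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c51ed000
Jan 28 17:43:22 pasdfric kernel: ERROR Guillaume: free_skb 0xfffffe008fdc2000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c52ae000
Jan 28 17:43:22 pasdfric kernel: rtw880: ERROR Guillaume: alloc_skb 170 -> 0xfffffe00c5279000
"""

if __name__ == "__main__":
    leaked = outstanding_skbs(SAMPLE)
    for addr, size in leaked.items():
        print(addr, size)
    print(len(leaked), "skb(s) outstanding,", sum(leaked.values()), "requested")
```

Applied to the two excerpts in this comment, it leaves nothing outstanding for
the "before" trace and 10 skbs outstanding for the tipping-point trace (the
second allocation of 0xfffffe00c51ed000 plus the nine that follow it).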

The next step is to walk up the callers of linuxkpi_ieee80211_free_txskb() to
find which one normally calls it, and stops calling it after the tipping
point...
