From: bugzilla-noreply@freebsd.org
To: wireless@FreeBSD.org
Subject: [Bug 283903] rtw88: possible skb leak
Date: Fri, 31 Jan 2025 22:54:32 +0000
X-Bugzilla-Product: Base System
X-Bugzilla-Component: wireless
X-Bugzilla-Version: CURRENT
X-Bugzilla-Assigned-To: bz@FreeBSD.org

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283903

--- Comment #21 from Guillaume Outters ---
(In reply to Bjoern A. Zeeb from comment #16, comment #17, comment #19)

> This may or may not be the same problem (and I do have four different Realtek cards in this system) but another possible data point:
> [...]
> It started like (but no SKB alloc failures in the log):
> rtw881: failed to dequeue 3967 skb TX queue 5, BD=0xffffffff, rp 127 -> 4095

"skb alloc failed" is not the primary symptom; rather, it's a consequence once the situation has degraded too much.

For me the first symptom is `vmstat -m | grep skb` growing at the pace of network activity, sometimes as soon as 10 minutes after the reboot (but sometimes far later than that; I still don't understand what triggers the change).

At boot I always have a disturbingly flat value of 16781312 (16 MiB + 1 buffer of 4096 B); some activity will make it eat 1 or 2 or more buffers, but it always ends up returning to this value.
Then as soon as I see it going over 18 MB, I know that something has gone rogue, and that _any new activity will make unreleased skbuffs grow proportionally to network traffic_.
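To make the baseline arithmetic and the "over 18 MB" rule of thumb concrete, here is a minimal Python sketch. It assumes you already have byte counts for the skb line of `vmstat -m` (collecting them and parsing the column layout are deliberately left out); the 18 MB threshold is read as 18,000,000 bytes since the text does not say MB vs MiB, and `classify` is a hypothetical helper, not an existing tool.

```python
# Minimal sketch: classify skb memory samples against the idle baseline.
# Assumption: samples are byte counts already extracted from the skb line
# of `vmstat -m`; how they are collected is out of scope here.

PAGE = 4096
BASELINE = 16 * 1024 * 1024 + PAGE   # 16 MiB + one 4 KiB buffer = 16781312
ROGUE_THRESHOLD = 18_000_000         # "over 18 MB", assumed to mean decimal MB


def classify(sample_bytes: int) -> str:
    """Return a coarse state for one skb memory sample (hypothetical helper)."""
    if sample_bytes <= BASELINE:
        return "idle"    # the flat value seen right after boot
    if sample_bytes <= ROGUE_THRESHOLD:
        return "busy"    # a few extra pages, expected to come back to baseline
    return "rogue"       # growth no longer released after traffic stops


if __name__ == "__main__":
    for sample in (16781312, 16781312 + 2 * PAGE, 19_000_000):
        print(sample, classify(sample))
```

The three tiers mirror the description: the exact boot value, a "busy" band where a transfer temporarily eats a few buffers, and a "rogue" state past the threshold where unreleased skbuffs keep accumulating.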
In my last 5 attempts, for a 10 MiB nc transfer (either from or to my laptop), it grew by 19.43 MiB, then 18.60, 18.52, 14.52 and 15.04 MiB, after temporarily peaking at 23.20, 45.36, 42.29, 21.64 and 26.52 MiB. These are measured from a baseline taken with vmstat -m just before each transfer, at around 2.2 GiB: before the first transfer vmstat was at 2.2 GiB, during the transfer it peaked at 2.22320 GiB, and after the transfer it went back to "only" 2.21943 GiB.

During that phase, I noticed some sluggishness and occasional small Wayland freezes (from 3 to dozens of seconds), as if (but that's just a guess) it was scanning all the already allocated memory: to find the oldest buffer to free? To find a free memory block in which to allocate a new buffer? To relocate buffers? To reach the end of a linked-list pool of buffers, looking for one to reuse?

The "skb alloc failed" message only appears ONCE IT CANNOT ALLOCATE ANYMORE IN THE FIRST 4 GB OF RAM (due to compat.linuxkpi.skb.mem_limit=1, as I understand it), after having grown MB by MB, while competing with userspace processes for that memory: this afternoon, after some "skb alloc failed", I quit a long-running Firefox, and as a result got 1 or 2 hours without "skb alloc failed".

> Did you also instrument the RX path?
> Are your SCPs pushing or pulling data? as in do you copy a file off from the rtw88 device or do you copy a file to the rtw88 device?
> But also 170 bytes is really not much each time.

I only instrumented those functions because that file was the one in which I could grep 'skb_free|free_skb', and I didn't look further. My original tests were pushes (via scp). But today, pushing and pulling a 10 MiB .gz file via nc, I measured the same non-freed allocations: ALL of the ~3400 allocations traced for one transfer were between 126 and 196 bytes long (though that is not a hard limit: a later test saw 4 out of 4000 at 243, 249, 368 and 434 bytes).
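Those sizes are small enough that, if each one were backed by a whole 4 KiB page, the memory footprint would be dominated by page rounding rather than by the payload. The Python sketch below compares the two views; the uniform 170-byte request size and the one-page-per-allocation policy are assumptions for illustration, not a statement of how the LinuxKPI skb allocator actually behaves.

```python
# Sketch: raw payload bytes vs. footprint if every allocation took whole pages.
# Assumptions (hypothetical, for illustration only): uniform ~170-byte
# requests, each rounded up to full 4 KiB pages by the allocator.

PAGE = 4096
MIB = 1024 * 1024


def raw_bytes(sizes):
    """Sum of the requested sizes, i.e. what a size-accurate probe reports."""
    return sum(sizes)


def page_rounded_bytes(sizes, page=PAGE):
    """Footprint if each request is rounded up to a whole number of pages."""
    return sum(-(-s // page) * page for s in sizes)  # ceil(s / page) pages each


if __name__ == "__main__":
    for calls in (3400, 3500, 7000):
        sizes = [170] * calls
        print(f"{calls} calls: raw {raw_bytes(sizes) / 1000:.0f} KB, "
              f"page-rounded {page_rounded_bytes(sizes) / MIB:.1f} MiB")
```

Under these assumptions, 3500 to 7000 calls give roughly 13.7 to 27.3 MiB once rounded to pages, against well under 1.2 MB of actual payload, which is the same order of magnitude as the jump seen in vmstat.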
However, the `vmstat -m | grep skb` increase was far larger than the sum of all those traced packets: for a 10 MiB transfer that caused a 15-20 MiB increase in allocated skb space, my home-made probe traced only 650 KB.
On the other hand, I don't know how the allocator works: IF FOR EACH 170 B ALLOCATION REQUESTED, THE ALLOCATOR RETURNS A FULL 4096 B PAGE, THEN THIS WOULD EXPLAIN OUR 15-20 MiB INCREASE (I saw from 3500 to 7000 calls through my trace, which multiplied by 4 KiB give 14.3 to 28.7 MB, i.e. 13.7 to 27.3 MiB: that would be consistent with the vmstat report!).

> BTW. you do not have to patch the kernel for this. Dtrace provides adequate tracing functionality in this case.
> Here's a sample I shared earlier on which you can probably use as a start:
> [...]

Nice! I've been telling myself for a long time that I should look into DTrace, but never did; you're adding to the good reasons to do it in 2025.

> Coming back after a while I see on the 1 minute update differences for vmstat -m for lkpiskb (but not mbuf-tags):
> # It's exactly one page a time!
> % expr 74 \* 4096
> 303104

From my experience (see the first part of this reply), during the "non-problematic phase", continuous network use (a transfer) made skb memory grow in multiples of 4 KiB, but as soon as network pressure dropped, the buffers were released and I got back to 16 MiB + 4 KiB.
So it may be normal to see a small, permanent increase... as long as the dequeue gets opportunities to run and, more importantly, as long as it does not just give up.

Now I'll have to reboot to post this long comment: every HTTP request now triggers at least one "skb alloc failed", and the network isn't usable anymore.

-- 
You are receiving this mail because:
You are on the CC list for the bug.