[Bug 281560] gve(4) UMA deadlock during high TCP throughput

From: <bugzilla-noreply_at_freebsd.org>
Date: Wed, 02 Oct 2024 23:17:37 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=281560

--- Comment #18 from shailend@google.com ---
(In reply to Konstantin Belousov from comment #14)

Although I do not have access to the VMs to run `show pcpu`, I checked my notes
and found this `ps` entry:

```
100438                   Run     CPU 11                      [gve0 txq 4 xmit]
```

The packet-transmitting thread is hogging the CPU and preventing iperf from
ever running to release the UMA zone lock. The "gve0 txq 4 xmit" thread runs
forever because it is waiting on the TX cleanup thread to make room on the
ring, and the cleanup thread is not making progress because it is waiting on
the UMA zone lock.
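
To illustrate what I believe is going on, here is a minimal sketch of a
self-requeueing xmit handler. This is not the actual driver code: gve_xmit_br
and gve_xmit do exist in the driver, but the struct layout and helper wiring
below are simplified stand-ins.

```
/*
 * Sketch only: a taskqueue handler that re-enqueues itself whenever the
 * descriptor ring is full. Types and fields are simplified stand-ins.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/mbuf.h>
#include <sys/taskqueue.h>
#include <net/if.h>
#include <net/if_var.h>

struct gve_tx_ring {                    /* simplified per-queue state */
	struct ifnet     *ifp;
	struct buf_ring  *br;
	struct taskqueue *xmit_tq;
	struct task       xmit_task;
};

/* Ring-enqueue helper; returns ENOBUFS when the descriptor ring is full. */
static int gve_xmit(struct gve_tx_ring *, struct mbuf *);

static void
gve_xmit_br(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;
	struct mbuf *m;

	while ((m = drbr_peek(tx->ifp, tx->br)) != NULL) {
		if (gve_xmit(tx, m) != 0) {
			/*
			 * Ring full: put the mbuf back and reschedule
			 * ourselves immediately. Under sustained load the
			 * taskqueue thread runs back-to-back and starves
			 * other threads (like iperf) on its CPU.
			 */
			drbr_putback(tx->ifp, tx->br, m);
			taskqueue_enqueue(tx->xmit_tq, &tx->xmit_task);
			return;
		}
		drbr_advance(tx->ifp, tx->br);
	}
}
```

The DDB output below matches this shape: the xmit task is perpetually
runnable while the thread holding the zone lock sits on that CPU's run queue.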

I did another repro, and the situation is similar:

```
db> show lockchain 100416
thread 100416 (pid 0, gve0 rxq 0) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100423
thread 100423 (pid 0, gve0 rxq 7) is blocked on lock 0xfffff8010447daa0 (rw) "tcpinp"
thread 100736 (pid 860, iperf) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```

Here 100708 is the offending iperf thread. Let's see its state:

```
db> show thread 100708
Thread 100708 at 0xfffff800a86bd000:
 proc (pid 860): 0xfffffe01a439bac0
 name: iperf
 pcb: 0xfffff800a86bd520
 stack: 0xfffffe01a4dc1000-0xfffffe01a4dc4fff
 flags: 0x5  pflags: 0x100
 state: RUNQ
 priority: 4
 container lock: sched lock 31 (0xfffffe001bee8440)
 last voluntary switch: 11510.470 s ago
 last involuntary switch: 11510.470 s ago
```

And now let's see what's happening on CPU 31:

```
db> show pcpu 31
cpuid        = 31
dynamic pcpu = 0xfffffe009a579d80
curthread    = 0xfffff800a8501740: pid 0 tid 100453 critnest 0 "gve0 txq 10 xmit"
curpcb       = 0xfffff800a8501c60
fpcurthread  = none
idlethread   = 0xfffff80003b04000: tid 100034 "idle: cpu31"
self         = 0xffffffff8242f000
curpmap      = 0xffffffff81b79c50
tssp         = 0xffffffff8242f384
rsp0         = 0xfffffe01a4ca8000
kcr3         = 0xffffffffffffffff
ucr3         = 0xffffffffffffffff
scr3         = 0x0
gs32p        = 0xffffffff8242f404
ldt          = 0xffffffff8242f444
tss          = 0xffffffff8242f434
curvnet      = 0
spin locks held:
```

Sure enough, a driver transmit thread is hogging the CPU. And to close the
loop, let's see what this queue's cleanup thread is doing:

```
db> show lockchain 100452
thread 100452 (pid 0, gve0 txq 10) is blocked on lock 0xfffffe00df57a3d0 (sleep mutex) "mbuf"
thread 100708 (pid 860, iperf) is on a run queue
```

In summary, this is the usual loop:

iperf thread (holding the UMA zone lock) ---sched---> gve tx xmit thread ---needs ring room---> gve tx cleanup thread ---needs UMA zone lock---> iperf thread

There is clearly problematic behavior in the driver transmit thread
(gve_xmit_br): this taskqueue should not re-enqueue itself, and should instead
let the cleanup taskqueue wake it up when room is made in the ring, so I'll
work on that.
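
Concretely, I am thinking of something along these lines: a sketch under the
same simplified types as above, with a hypothetical tx->stopped flag added to
the ring struct and hypothetical gve_tx_has_room() and
gve_reclaim_completed_descs() helpers.

```
/*
 * Sketch of the direction of the fix: the xmit task parks when the ring
 * is full, and the cleanup task wakes it once completions free space.
 * tx->stopped (u_int), gve_tx_has_room(), and
 * gve_reclaim_completed_descs() are hypothetical names.
 */
static void
gve_xmit_br(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;
	struct mbuf *m;

	while ((m = drbr_peek(tx->ifp, tx->br)) != NULL) {
		if (gve_xmit(tx, m) != 0) {
			drbr_putback(tx->ifp, tx->br, m);
			/* Park instead of re-enqueueing ourselves. */
			atomic_store_rel_int(&tx->stopped, 1);
			return;
		}
		drbr_advance(tx->ifp, tx->br);
	}
}

static void
gve_tx_cleanup(void *arg, int pending)
{
	struct gve_tx_ring *tx = arg;

	/* Process completions; this may contend on the UMA zone lock. */
	gve_reclaim_completed_descs(tx);

	/* Wake the parked xmit task now that there is ring room. */
	if (atomic_load_acq_int(&tx->stopped) && gve_tx_has_room(tx)) {
		atomic_store_rel_int(&tx->stopped, 0);
		taskqueue_enqueue(tx->xmit_tq, &tx->xmit_task);
	}
}
```

There is an obvious wake-up race in this sketch (the cleanup task can check
the flag just before the xmit task parks), so the real change will need a
recheck after parking; the sketch only shows the structure.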

But I also want to confirm that it is not problematic for an iperf thread to
be knocked off the CPU while holding the zone lock: is that lock not critical
enough to disallow preemption while it is held? (I am not familiar enough with
schedulers to know whether this is a naive question.)
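
For context on why the preemption is possible at all, as I understand it: the
lockchain output tags the "mbuf" lock as a "sleep mutex", i.e. a plain MTX_DEF
mutex, and only spin mutexes put their holder inside a critical section. An
illustration (not driver code):

```
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx def_mtx;      /* same flavor as the UMA zone "mbuf" lock */
static struct mtx spin_mtx;

static void
lock_example_init(void)
{
	/*
	 * MTX_DEF ("sleep mutex" in the lockchain output): the holder
	 * remains preemptible, which is how iperf was knocked off the CPU
	 * while holding the zone lock.
	 */
	mtx_init(&def_mtx, "example def", NULL, MTX_DEF);

	/*
	 * MTX_SPIN: mtx_lock_spin() enters a critical section, so the
	 * holder cannot be preempted until mtx_unlock_spin().
	 */
	mtx_init(&spin_mtx, "example spin", NULL, MTX_SPIN);
}
```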

-- 
You are receiving this mail because:
You are the assignee for the bug.