[Bug 264549] e1000/bridge: "sleep on wchan ... with sleeping prohibited"
Date: Wed, 08 Jun 2022 16:46:59 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=264549 Bug ID: 264549 Summary: e1000/bridge: "sleep on wchan ... with sleeping prohibited" Product: Base System Version: 13.1-STABLE Hardware: Any OS: Any Status: New Severity: Affects Only Me Priority: --- Component: threads Assignee: threads@FreeBSD.org Reporter: bugaddress@posteo.net When employed as members of a bridge such as this one: bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 ether 58:9c:fc:10:ff:d4 inet 192.168.219.1 netmask 0xffffff00 broadcast 192.168.219.255 id b1:c1:11:30:35:51 priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200 root id b1:c1:11:30:35:51 priority 32768 ifcost 0 port 0 member: igb1 flags=147<LEARNING,DISCOVER,STP,AUTOEDGE,AUTOPTP> ifmaxaddr 0 port 2 priority 128 path cost 2000000 proto rstp role disabled state discarding member: igb0 flags=147<LEARNING,DISCOVER,STP,AUTOEDGE,AUTOPTP> ifmaxaddr 0 port 1 priority 128 path cost 2000000 proto rstp role disabled state discarding nd6 options=9<PERFORMNUD,IFDISABLED> groups: bridge network adapters igb0/igb1 trigger a "sleep on wchan ... with sleeping prohibited"-KASSERT whenever I pull any LAN cables. The stack backtrace reads: --------------------------------------------------- Unread portion of the kernel message buffer: [148] panic: sleepq_add: td 0xfffffe00380fd740 to sleep on wchan 0xffffffff8164ae00 with sleeping prohibited [148] cpuid = 0 [148] time = 1654687988 [148] __HardenedBSD_version = 1300062 __FreeBSD_version = 1301503 [148] version = FreeBSD 13.1-STABLE-HBSD #1 hardened/13-stable/master-n191189-be9f2a97c09: Wed Jun 8 11:14:09 UTC 2022 [148] (Redacted) [148] KDB: stack backtrace: [148] #0 0xffffffff80c7a3fb at kdb_backtrace+0x6b [148] #1 0xffffffff80c2f964 at vpanic+0x184 [148] #2 0xffffffff80c2f733 at panic+0x43 [148] #3 0xffffffff80c8a4b8 at sleepq_add+0x358 [148] #4 0xffffffff80c3bb44 at _sleep+0x194 [148] #5 0xffffffff80c3c160 at pause_sbt+0x110 [148] #6 0xffffffff82560efc at e1000_write_phy_reg_mdic+0xec [148] #7 0xffffffff825653b9 at e1000_read_phy_reg_gs40g+0x39 [148] #8 0xffffffff825626e0 at e1000_phy_has_link_generic+0x70 [148] #9 0xffffffff82569862 at e1000_check_for_copper_link_generic+0x22 [148] #10 0xffffffff8255884c at em_if_update_admin_status+0x18c [148] #11 0xffffffff80d685df at iflib_media_status+0x6f [148] #12 0xffffffff80d52624 at ifmedia_ioctl+0x174 [148] #13 0xffffffff83431189 at bstp_ifupdstatus+0x79 [148] #14 0xffffffff83431052 at bstp_linkstate+0x22 [148] #15 0xffffffff8342d494 at bridge_linkstate+0x84 [148] #16 0xffffffff80d4372c at do_link_state_change+0xcc [148] #17 0xffffffff80c8f990 at taskqueue_run_locked+0xb0 [148] Uptime: 2m28s [148] Dumping 1534 out of 32592 MB:..2%..11%..21%..31%..41%..51%..61%..71%..81%..91% __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu, (kgdb) #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:401 #2 0xffffffff80c2f574 in kern_reboot (howto=16644) at /usr/src/sys/kern/kern_shutdown.c:489 #3 0xffffffff80c2f9d3 in vpanic ( fmt=0xffffffff812bb4ad "%s: td %p to sleep on wchan %p with sleeping prohibited", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:925 #4 0xffffffff80c2f733 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:846 #5 0xffffffff80c8a4b8 in sleepq_add ( wchan=wchan@entry=0xffffffff8164ae00 <pause_wchan>, lock=lock@entry=0x0, wmesg=wmesg@entry=0xffffffff8258ec2c "e1000_delay", flags=2, flags@entry=<unavailable>, queue=queue@entry=0) at /usr/src/sys/kern/subr_sleepqueue.c:328 #6 0xffffffff80c3bb44 in _sleep (ident=0xffffffff8164ae00 <pause_wchan>, lock=lock@entry=0x0, priority=0, wmesg=<unavailable>, sbt=sbt@entry=4294967, pr=0, flags=256) at /usr/src/sys/kern/kern_synch.c:208 #7 0xffffffff80c3c160 in pause_sbt (wmesg=<unavailable>, sbt=4294967, pr=<unavailable>, pr@entry=0, flags=flags@entry=256) at /usr/src/sys/kern/kern_synch.c:340 #8 0xffffffff82560efc in safe_pause_us (x=50) at /usr/src/sys/dev/e1000/e1000_osdep.h:88 #9 e1000_write_phy_reg_mdic (hw=hw@entry=0xfffffe00dfdd2000, offset=offset@entry=22, data=<optimized out>) at /usr/src/sys/dev/e1000/e1000_phy.c:384 #10 0xffffffff825653b9 in e1000_read_phy_reg_gs40g (hw=0xfffffe00dfdd2000, offset=<optimized out>, data=0xfffffe00351e6b66) at /usr/src/sys/dev/e1000/e1000_phy.c:4103 #11 0xffffffff825626e0 in e1000_phy_has_link_generic (hw=0xfffffe00dfdd2000, iterations=1, usec_interval=0, success=0xfffffe00351e6ba7) at /usr/src/sys/dev/e1000/e1000_phy.c:2376 #12 0xffffffff82569862 in e1000_check_for_copper_link_generic ( hw=0xfffffe00dfdd2000) at /usr/src/sys/dev/e1000/e1000_mac.c:737 #13 0xffffffff8255884c in em_if_update_admin_status (ctx=0xfffff800020bd800) at /usr/src/sys/dev/e1000/if_em.c:1803 #14 0xffffffff80d685df in IFDI_UPDATE_ADMIN_STATUS (_ctx=0xfffff800020bd800) at ./ifdi_if.h:492 #15 iflib_media_status (ifp=<optimized out>, ifmr=0xfffffe00351e6c90) at /usr/src/sys/net/iflib.c:2557 #16 0xffffffff80d52624 in ifmedia_ioctl (ifp=<unavailable>, ifr=0xfffffe00351e6c90, ifm=0xfffff800020bdb18, cmd=<optimized out>) at /usr/src/sys/net/if_media.c:294 #17 0xffffffff83431189 in bstp_ifupdstatus (arg=0xfffff80023988018, pending=pending@entry=0) at /usr/src/sys/net/bridgestp.c:1821 #18 0xffffffff83431052 in bstp_linkstate (bp=bp@entry=0xfffff80023988018) at /usr/src/sys/net/bridgestp.c:1802 #19 0xffffffff8342d494 in bridge_linkstate (ifp=0xfffff800020d0000) at /usr/src/sys/net/if_bridge.c:3729 #20 0xffffffff80d4372c in do_link_state_change (arg=0xfffff800020d0000, pending=1) at /usr/src/sys/net/if.c:2317 #21 0xffffffff80c8f990 in taskqueue_run_locked ( queue=queue@entry=0xfffff80001ac8b00) at /usr/src/sys/kern/subr_taskqueue.c:477 #22 0xffffffff80c8f89d in taskqueue_run (queue=0xfffff80001ac8b00) at /usr/src/sys/kern/subr_taskqueue.c:492 #23 0xffffffff80bed9ee in intr_event_execute_handlers (ie=0xfffff80001ac8a00, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1167 #24 ithread_execute_handlers (ie=0xfffff80001ac8a00, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1180 #25 ithread_loop (arg=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1268 #26 0xffffffff80bea955 in fork_exit ( callout=0xffffffff80bed790 <ithread_loop>, arg=0xfffff80001af10e0, frame=0xfffffe00351e6f40) at /usr/src/sys/kern/kern_fork.c:1109 #27 <signal handler called> #28 mi_startup () at /usr/src/sys/kern/init_main.c:324 Backtrace stopped: Cannot access memory at address 0x2fd9008 (kgdb) --------------------------------------------------- I've solved this issue by replacing "pause" with "DELAY" inside the e1000 driver (see attachment). This certainly works for me, but could point to a deeper problem within anything bridge-related. (I couldn't reproduce this panic outside a bridge.) There's possibly a better solution in which we allow "pause" under some circumstances (unlike my blanket replacement), but I haven't found one. -- You are receiving this mail because: You are the assignee for the bug.