kern/118093: firewire bus reset hogs CPU,
causing data to be lost
dieterbsd at engineer.com
dieterbsd at engineer.com
Thu Feb 3 22:00:24 UTC 2011
The following reply was made to PR kern/118093; it has been noted by GNATS.
From: dieterbsd at engineer.com
To: bug-followup at FreeBSD.org
Cc: freebsd-firewire at freebsd.org, freebsd-usb at freebsd.org,
freebsd-drivers at freebsd.org, freebsd-net at freebsd.org,
freebsd at sopwith.solgatos.com
Subject: Re: kern/118093: firewire bus reset hogs CPU, causing data to be lost
Date: Thu, 03 Feb 2011 16:53:32 -0500
New data: USB causes the same problem that firewire does.
FreeBSD 8.0
amd64
nfe0: <NVIDIA nForce4 CK804 MCP9 Networking Adapter> port 0xb400-0xb407=20
mem 0xfebf9000-0xfebf9fff irq 23 at device 10.0 on pci0
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev.=20
0x4101> mem 0xfe4f0000-0xfe4fffff irq 19 at device 0.0 on pci5
fwohci1: <NEC uPD72871/2> mem 0xfdeff000-0xfdefffff irq 19 at device=20
8.0 on pci2
ehci0: <NVIDIA nForce4 USB 2.0 controller> mem 0xfebfe000-0xfebfe0ff=20
irq 22 at device 2.1 on pci0
Plugging in a USB_to_RS-232 bridge generates:
ugen0.2: <Prolific Technology Inc.> at usbus0
uplcom0: <Prolific Technology Inc. USB-Serial Controller D, class 0/0,=20
rev 1.10/
4.00, addr 2> on usbus0
And networking is locked out for too long, and data is lost. (The node=20
sending data
via Ethernet has too small a transmit buffer, but is a closed source=20
closed hardware
black box and cannot be fixed. Once data is lost it is lost forever,=20
it cannot be
recreated, so this is very bad.) The FreeBSD box receiving data has a=20
very large
receive buffer, but if another device driver locks out networking it=20
doesn't help.
Changing the printf(9)s to log(9)s fixes the problem, but this is not a=20
good workaround,
there are LOTS of printfs in the kernel, I keep hitting new ones. =20
Printf(9) alone isn't
the problem, adding printfs to chown(2) does not cause the problem, but=20
printfs from
device drivers do. (usb, firewire, ...)
My theory is there is a locking conflict.
Here is lock profiling data for USB_RS-232 bridge being plugged in=20
printfs locking out networking:
debug.lock.prof.enable: 0
debug.lock.prof.reset: 0
debug.lock.prof.stats:
max wait_max total wait_total count avg wait_avg=20
cnt_hold cnt_lock name
251240 0 499982 0 303 1650 0 0=
=20
0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:0123456789ABCDEF -=20
USB device SX lock)
104255 0 110429 0 2 55214 0 0=
=20
0 /usr/src/sys/dev/usb/usb_device.c:2459 (sx:123456789ABCDEF - USB=
=20
config SX lock)
104217 0 29670949 0 47087 630 0 0=
=20
0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) =20
<---------<<<<
101962 0 103089 0 27 3818 0 0=
=20
0 /usr/src/sys/kern/kern_condvar.c:145 (sleep mutex:Giant)
51071 0 81088 0 3 27029 0 0=
=20
0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx)
23049 0 31351 0 2 15675 0 0=
=20
0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs)
12408 0 978544 0 971 1007 0 0=
=20
0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs)
11466 0 744372 0 11567 64 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait)
11141 0 501155 0 3850 130 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait)
6115 0 61720 0 26 2373 0 0=
=20
0 /usr/src/sys/dev/usb/usb_request.c:322 (sx:123456789ABCDEF - USB=
=20
device SX lock)
5551 0 16749 0 19 881 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer)
3043 0 3749 0 6 624 0 0=
=20
0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1321 (sleep mutex:struct mount=
=20
mtx)
2333 0 9901 0 36 275 0 0=
=20
0 /usr/src/sys/kern/vfs_mount.c:2231 (sleep mutex:struct mount mtx)
2055 0 9462 0 100 94 0 0=
=20
0 /usr/src/sys/kern/subr_hints.c:117 (sleep mutex:kernel=20
environment)
1649 0 79272 0 3846 20 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:2963 (sleep mutex:vm object)
1636 0 36196 0 15515 2 0 0=
=20
0 /usr/src/sys/vm/vm_page.c:1052 (sleep mutex:vm page queue free=20
mutex)
1205 0 55601 0 15411 3 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:2545 (sleep mutex:bufobj interlock)
1076 0 1076 0 1 1076 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:ufs)
1069 0 31117 0 30 1037 0 0=
=20
0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:nfe0)
848 0 3752 0 1359 2 0 0=
=20
0 /usr/src/sys/ufs/ffs/ffs_vfsops.c:1297 (sleep mutex:vnode=20
interlock)
754 0 10016 0 3846 2 0 0=
=20
0 /usr/src/sys/ufs/ffs/ffs_softdep.c:4274 (sleep mutex:Softdep=20
Lock)
739 0 2463 0 4 615 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:devfs)
540 0 3370 0 1365 2 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:3179 (sleep mutex:vnode interlock)
525 0 80852 0 160 505 0 0=
=20
0 /usr/src/sys/dev/uart/uart_cpu.h:92 (spin mutex:uart_hwmtx)
512 0 2065 0 23 89 0 0=
=20
0 /usr/src/sys/kern/vfs_default.c:599 (lockmgr:bufwait)
487 0 97058 0 481 201 0 0=
=20
0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait)
467 0 2069 0 12 172 0 0=
=20
0 /usr/src/sys/dev/usb/usb_busdma.c:547 (sleep mutex:uplcom)
455 0 2978 0 18 165 0 0=
=20
0 /usr/src/sys/dev/usb/usb_transfer.c:286 (sleep mutex:uplcom)
423 0 727 0 2 363 0 0=
=20
0 /usr/src/sys/vm/uma_core.c:1565 (sleep mutex:UMA lock)
367 0 230166 0 47087 4 0 0=
=20
0 /usr/src/sys/netinet/tcp_usrreq.c:708 (rw:tcpinp)
358 0 1387 0 11 126 0 0=
=20
0 /usr/src/sys/kern/kern_synch.c:241 (sleep mutex:Giant)
339 0 72085 0 46119 1 0 0=
=20
0 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:so_rcv)
332 0 225434 0 93206 2 0 0=
=20
0 /usr/src/sys/kern/uipc_socket.c:1441 (sleep mutex:so_rcv)
332 0 19529 0 11551 1 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1927 (sleep mutex:bufobj interlock)
331 0 162523 0 94887 1 0 0=
=20
0 /usr/src/sys/netinet/tcp_output.c:280 (sleep mutex:so_snd)
330 0 73512 0 47087 1 0 0=
=20
0 /usr/src/sys/kern/uipc_socket.c:1843 (sleep mutex:so_rcv)
329 0 88069 0 47771 1 0 0=
=20
0 /usr/src/sys/kern/uipc_socket.c:1687 (sleep mutex:so_rcv)
329 0 348 0 12 29 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1955 (sleep mutex:Syncer mtx)
320 0 66185 0 3353 19 0 0=
=20
0 /usr/src/sys/kern/vfs_cluster.c:868 (lockmgr:bufwait)
303 0 7410 0 4360 1 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:3944 (sleep mutex:bufobj interlock)
debug.lock.prof.rejected: 0
debug.lock.prof.skipcount: 0
debug.lock.prof.skipspin: 0
Here is lock profiling data for firewire bus reset printfs locking out=20
Ethernet:
max wait_max total wait_total count avg wait_avg=20
cnt_hold cnt_lock name
373345 0 397495 0 643 618 0 0=
=20
0 /usr/src/sys/kern/vfs_lookup.c:497 (lockmgr:ufs)
373325 0 390815 0 3140 124 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:2218 (sleep mutex:vnode interlock)
359514 0 32199808 0 45582 706 0 0=
=20
0 /usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx) =20
<---------<<<<
63827 320 746196 740 70067 10 0 0=
=20
4 /usr/src/sys/dev/firewire/fwohci.c:2107 (sleep mutex:firewire)
49927 0 165929 0 8 20741 0 0=
=20
0 /usr/src/sys/kern/vfs_syscalls.c:3508 (lockmgr:ufs)
33358 0 1707461 0 1106 1543 0 0=
=20
0 /usr/src/sys/kern/vfs_vnops.c:604 (lockmgr:ufs)
30294 0 401224 0 2641 151 0 0=
=20
0 /usr/src/sys/vm/vm_map.c:3526 (sx:user map)
28432 0 934681 0 74 12630 0 0=
=20
0 /usr/src/sys/kern/kern_cons.c:422 (spin mutex:cnputs_mtx)
28296 0 66892 0 2641 25 0 0=
=20
0 /usr/src/sys/vm/vm_fault.c:297 (sleep mutex:vm object)
25020 0 26667 0 45 592 0 0=
=20
0 /usr/src/sys/kern/kern_exit.c:825 (sx:proctree)
19027 0 1225496 0 11937 102 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:2559 (lockmgr:bufwait)
18887 0 906297 0 4140 218 0 0=
=20
0 /usr/src/sys/kern/vfs_bio.c:1835 (lockmgr:bufwait)
12101 0 23459 0 1675 14 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:2342 (sleep mutex:vnode interlock)
7448 0 170841 0 2229 76 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:2083 (lockmgr:ufs)
7342 0 21792 0 2169 10 0 0=
=20
0 /usr/src/sys/kern/vfs_cache.c:390 (rw:Name Cache)
4648 0 10831 0 11 984 0 0=
=20
0 /usr/src/sys/kern/vfs_subr.c:1693 (lockmgr:syncer)
3536 0 35352 0 9218 3 0 0=
=20
0 /usr/src/sys/amd64/amd64/pmap.c:3989 (sleep mutex:pmap)
3039 0 34283 0 2641 12 0 0=
=20
0 /usr/src/sys/vm/vm_fault.c:937 (sleep mutex:vm object)
2825 0 53949 0 3495 15 0 0=
=20
0 /usr/src/sys/vm/vm_fault.c:1010 (sleep mutex:vm object)
2286 0 7169 0 547 13 0 0=
=20
0 /usr/src/sys/kern/kern_sig.c:983 (sleep mutex:process lock)
2083 4010 9037 5380 2143 4 2 0=
=20
2 /usr/src/sys/kern/vfs_subr.c:2118 (sleep mutex:vnode interlock)
2083 57 2089 57 4 522 14 0=
=20
1 /usr/src/sys/kern/kern_mutex.c:147 (sleep mutex:pipe mutex)
2022 0 181449 0 515 352 0 0=
=20
0 /usr/src/sys/vm/vm_pager.c:311 (lockmgr:bufwait)
1997 0 3397 0 2 1698 0 0=
=20
0 /usr/src/sys/kern/kern_sysctl.c:1513 (sx:sysctl mem)
1994 0 4890 0 117 41 0 0=
=20
0 /usr/src/sys/kern/kern_sysctl.c:1521 (sx:sysctl lock)
1971 0 4156 0 13 319 0 0=
=20
0 /usr/src/sys/kern/kern_sysctl.c:1417 (sleep mutex:Giant)
1965 0 31838 0 3471 9 0 0=
=20
0 /usr/src/sys/vm/vm_object.c:482 (sleep mutex:vm object)
1926 0 17695 0 83 213 0 0=
=20
0 /usr/src/sys/vm/vm_object.c:541 (sleep mutex:vm object)
There is at least one line in common that looks networking related,=20
/usr/src/sys/kern/uipc_sockbuf.c:148 (sx:so_rcv_sx)
int
sblock(struct sockbuf *sb, int flags)
{
KASSERT((flags & SBL_VALID) =3D=3D flags,
("sblock: flags invalid (0x%x)", flags));
if (flags & SBL_WAIT) {
if ((sb->sb_flags & SB_NOINTR) ||
(flags & SBL_NOINTR)) {
sx_xlock(&sb->sb_sx);
return (0);
}
return (sx_xlock_sig(&sb->sb_sx));
} else {
if (sx_try_xlock(&sb->sb_sx) =3D=3D 0)
return (EWOULDBLOCK);
return (0);
}
}
It is not clear to me what this code is doing. (comments might help)
It is not clear to me why networking and a USB_to_RS-232 bridge would
have a lock conflict. It is not clear to me why networking and firewire
would have a lock conflict.
How do we fix this?
More information about the freebsd-firewire
mailing list