Lock order reversals with dummynet (Re: FreeBSD 7.0 Beta, RC,
RELEASE (amd64) freezes with dummynet enabled)
Kris Kennaway
kris at FreeBSD.org
Fri Feb 29 10:37:30 UTC 2008
Adding back the mailing list so others can help.
matthew wrote:
> Kris Kennaway wrote:
>
>> matthew wrote:
>>> I have posted before that i have a stability issue with the 7.0 branch
>>> on my servers. Tested on BETA2,BETA4,RC1,RC2,RELEASE
>>>
>>> The original thread and my post with details is at:
>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html
>>>
>>>
>>> I was waiting for the 7.0-RELEASE, updated the whole servers, and
>>> enabled dummynet again, but it always freezes after some minutes, 100%
>>> reproducible.
>>>
>>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely no
>>> problems, but the 7.0 branch keeps freezing.
>>>
>>> Again, if it helps to solve this bug, i can rebuild the kernel with
>>> debug symbols a take some screenshots :)
>>
>> Please follow the steps at
>>
>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>>
>>
>> Kris
>> _______________________________________________
> I added to the kernel these options:
> makeoptions DEBUG=-g # Build kernel with gdb(1) debug
> symbols
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> options KDB
> options DDB
> options GDB
> options SOCKBUF_DEBUG
>
> But still, the server freezes withoutany debug message,
And you can't break to DDB? It is expected you will have to when you
encounter a deadlock.
> panic whatever,
> but at leasts it dumps the crash kernel to /var/crash, but the debug
> looks like this:
Dumping is an activity the occurs after panicking, so if you have a dump
then you must have a panic.
> root at hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug
> /var/crash/vmcore.0
> [GDB will not be able to debug user-mode threads:
> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you
> are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd".
> Cannot access memory at address 0x57
> (kgdb) backtrace
> #0 0x0000000000000000 in ?? ()
> Cannot access memory at address 0x0
> (kgdb)
> I have some debug messages at boot time:
This usually means you are not running the dump against the kernel.debug
corresponding exactly to the kernel that crashed. Probably the dump you
found was old.
> Feb 28 17:01:29 hanka lock order reversal:
> Feb 28 17:01:29 hanka 1st 0xffffffff80b0e1e8 PFil hook read/write mutex
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @
> /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2970
> Feb 28 17:01:29 hanka KDB: stack backtrace:
> Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
> Feb 28 17:01:29 hanka pf_socket_lookup() at pf_socket_lookup+0x241
> Feb 28 17:01:29 hanka pf_test_udp() at pf_test_udp+0x890
> Feb 28 17:01:29 hanka pf_test() at pf_test+0xeb2
> Feb 28 17:01:29 hanka pf_check_in() at pf_check_in+0x2b
> Feb 28 17:01:29 hanka pfil_run_hooks() at pfil_run_hooks+0xbc
> Feb 28 17:01:29 hanka ip_input() at ip_input+0x2c4
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp
> = 0 ---
> Feb 28 17:01:29 hanka lock order reversal:
> Feb 28 17:01:29 hanka 1st 0xffffff0001237430 em0 (EM Core Lock) @
> dev/em/if_em.c:1416
> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @
> netinet/udp_usrreq.c:385
> Feb 28 17:01:29 hanka KDB: stack backtrace:
> Feb 28 17:01:29 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
> Feb 28 17:01:29 hanka udp_input() at udp_input+0x1a4
> Feb 28 17:01:29 hanka ip_input() at ip_input+0xc0
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp
> = 0 ---
> Feb 28 17:01:29 hanka xinetd[1356]: xinetd Version 2.3.14 started with
> libwrap loadavg options compiled in.
> Feb 28 17:01:29 hanka xinetd[1356]: Started working: 1 available service
> Feb 28 17:01:33 hanka lock order reversal:
> Feb 28 17:01:33 hanka 1st 0xffffff00042f5b10 inp (tcpinp) @
> netinet/tcp_usrreq.c:781
> Feb 28 17:01:33 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:01:33 hanka KDB: stack backtrace:
> Feb 28 17:01:33 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:01:33 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:01:33 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:01:33 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:01:33 hanka ip_output() at ip_output+0x38e
> Feb 28 17:01:33 hanka tcp_output() at tcp_output+0xacd
> Feb 28 17:01:33 hanka tcp_usr_send() at tcp_usr_send+0x272
> Feb 28 17:01:33 hanka sosend_generic() at sosend_generic+0x347
> Feb 28 17:01:33 hanka soo_write() at soo_write+0x38
> Feb 28 17:01:33 hanka dofilewrite() at dofilewrite+0x85
> Feb 28 17:01:33 hanka kern_writev() at
> Feb 28 17:01:33 hanka kern_writev+0x4c
> Feb 28 17:01:33 hanka write() at write+0x54
> Feb 28 17:01:33 hanka syscall() at syscall+0x1f6
> Feb 28 17:01:33 hanka Xfast_syscall() at Xfast_syscall+0xab
> Feb 28 17:01:33 hanka --- syscall (4, FreeBSD ELF64, write), rip =
> 0x800e1fd3c, rsp = 0x7fffffffea88, rbp = 0x16 ---
> Feb 28 17:03:54 hanka lock order reversal:
> Feb 28 17:03:54 hanka 1st 0xffffffff80b0f308 tcp (tcp) @
> netinet/tcp_input.c:400
> Feb 28 17:03:54 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:03:54 hanka KDB: stack backtrace:
> Feb 28 17:03:54 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:03:54 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:03:54 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:03:54 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:03:54 hanka ip_output() at ip_output+0x38e
> Feb 28 17:03:54 hanka tcp_respond() at tcp_respond+0x2d5
> Feb 28 17:03:54 hanka tcp_dropwithreset() at tcp_dropwithreset+0x131
> Feb 28 17:03:54 hanka tcp_input() at tcp_input+0x6d0
> Feb 28 17:03:54 hanka ip_input() at ip_input+0xc0
> Feb 28 17:03:54 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:03:54 hanka ether_input() at ether_input+0x19d
> Feb 28 17:03:54 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:03:54 hanka em_poll() at em_poll+0x6f
> Feb 28 17:03:54 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:03:54 hanka swi_net() at swi_net+0xea
> Feb 28 17:03:54 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:03:54 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:03:54 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:03:54 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp
> = 0 ---
> Feb 28 17:23:36 hanka lock order reversal:
> Feb 28 17:23:36 hanka 1st 0xffffffff810f9ef0 tcp_sc_head (tcp_sc_head) @
> netinet/tcp_syncache.c:477
> Feb 28 17:23:36 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write mutex
> (PFil hook read/write mutex) @ net/pfil.c:73
> Feb 28 17:23:36 hanka KDB: stack backtrace:
> Feb 28 17:23:36 hanka db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> Feb 28 17:23:36 hanka witness_checkorder() at witness_checkorder+0x605
> Feb 28 17:23:36 hanka _rw_rlock() at _rw_rlock+0x5b
> Feb 28 17:23:36 hanka pfil_run_hooks() at pfil_run_hooks+0x44
> Feb 28 17:23:36 hanka ip_output() at ip_output+0x38e
> Feb 28 17:23:36 hanka syncache_respond() at syncache_respond+0x32a
> Feb 28 17:23:36 hanka syncache_add() at syncache_add+0x206
> Feb 28 17:23:36 hanka tcp_input() at tcp_input+0xa77
> Feb 28 17:23:36 hanka ip_input() at ip_input+0xc0
> Feb 28 17:23:36 hanka ether_demux() at ether_demux+0x1d9
> Feb 28 17:23:36 hanka ether_input() at ether_input+0x19d
> Feb 28 17:23:36 hanka em_rxeof() at em_rxeof+0x1ca
> Feb 28 17:23:36 hanka em_poll() at em_poll+0x6f
> Feb 28 17:23:36 hanka netisr_poll() at netisr_poll+0x87
> Feb 28 17:23:36 hanka swi_net() at swi_net+0xea
> Feb 28 17:23:36 hanka ithread_loop() at ithread_loop+0xda
> Feb 28 17:23:36 hanka fork_exit() at fork_exit+0x12a
> Feb 28 17:23:36 hanka fork_trampoline() at fork_trampoline+0xe
> Feb 28 17:23:36 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30, rbp
> = 0 ---
These look useful and should point to the problem. It is also probably
related to something in your script:
> I tried it with pf disabled, only running ipfw without dummynet,
> everything is ok.
> I tried it witout pf disabled, running ipfw with dummynet queue for
> small traffic, everything is ok.
> I tried it with/without pf disabled, running ipfw with dummynet queue
> for hight traffic >100Mbit/s, the box freezes, everytime, after enabling
> rules with the dummynet queue, after some second or few minutes, maybe
> depending on the high traffic.
>
> The shaper script:
> # SHAPER
> IPFW_PIPE_DOWNLOAD=1
> IPFW_PIPE_UPLOAD=11
> IPFW_QUEUE_DOWNLOAD=1
> IPFW_QUEUE_UPLOAD=11
>
> DOWNLOAD_ROOT="450Mbit/s"
> UPLOAD_ROOT="450Mbit/s"
> SHAPER_BUCKETS="1024"
>
> IPFW_SHAPER_ROOT=1000
>
> $CMD pipe $IPFW_PIPE_DOWNLOAD config bw $DOWNLOAD_ROOT buckets
> $SHAPER_BUCKETS
> $CMD pipe $IPFW_PIPE_UPLOAD config bw $UPLOAD_ROOT buckets $SHAPER_BUCKETS
>
> $CMD queue $IPFW_QUEUE_DOWNLOAD config pipe $IPFW_PIPE_DOWNLOAD buckets
> $SHAPER_BUCKETS mask dst-ip 0xFFFFFFFF
> $CMD queue $IPFW_QUEUE_UPLOAD config pipe $IPFW_PIPE_UPLOAD buckets
> $SHAPER_BUCKETS mask src-ip 0xFFFFFFFF
>
> $CMD delete $IPFW_SHAPER_ROOT
> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_DOWNLOAD ip from me to any
> out // Share download
> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_UPLOAD ip from any to me in
> // Share upload
Thanks for the information.
Kris
More information about the freebsd-net
mailing list