Lock order reversals with dummynet (Re: FreeBSD 7.0 Beta, RC,
RELEASE (amd64) freezes with dummynet enabled)
matthew
matthew at matthew.sk
Fri Feb 29 11:56:18 UTC 2008
I have some more tests, but always the same.
Here is the top output, that i see last on terminal:
65 processes: 4 running, 144 sleeping, 17 waiting
CPU states: 0.0% user, 0.0% nice, 29.9% system, 7.8% interrupt, 62.3%
idle
Mem: 46M Active, 522M Inact, 350M Wired, 10M Cache, 109M Buf, 35M Free
Swap: 2048M Total, 228K Used, 2048M Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root 1 171 ki31 0K 16K RUN 0 692:10 61.96% idle: cpu0
11 root 1 171 ki31 0K 16K RUN 1 804:37 51.51% idle: cpu1
13 root 1 -44 - 0K 16K WAIT 0 325:41 13.18% swi1: net
42 root 1 -68 - 0K 16K - 0 12:26 12.74% dummynet
72368 ftp 1 4 0 10832K 2080K RUN 0 4:05 10.01% vsftpd
74488 ftp 1 4 0 10832K 2080K zfs:(& 0 0:31 9.62% vsftpd
73690 ftp 1 8 0 10832K 2072K nanslp 0 1:43 8.64% vsftpd
74752 ftp 1 8 0 10832K 2068K nanslp 1 0:09 6.06% vsftpd
74811 ftp 1 8 0 10832K 2072K nanslp 1 0:03 3.79% vsftpd
74869 ftp 1 4 0 10832K 2056K sbwait 1 0:00 3.39% vsftpd
As you can see, there is no high load, enought free memory and CPUs are
relative idle.
This is the output from nload -o 250000:
Device em0 [10.252.2.3] (1/1):
============================================================================================================================================
Incoming:
.. .| .|. . .. . ||...|.... .........|
. . .. Curr: 2519.37 kBit/s
####|||######.###################### ############... .....
..||... .|. ##|#|####. Avg: 2831.76 kBit/s
####################################|# .. |. |
######################|#######||############## Min: 141.74 kBit/s
#########################################.##.#|##############################################
Max: 6586.57 kBit/s
#############################################################################################
Ttl: 542932224.00 MByte
Outgoing:
.
.. # .. . . ##.. .... . .
# . # .#
## ## ### # .##.|| #########||
|###|#####.. |#.#.#|##
####||.######.###################### ############... | .
..# .| |. #########.
#################################### ###############.####|#
####|## ###|##########
####################################.# . .
##############################||##############
###################################### || ## #
############################################## Curr: 127802.60 kBit/s
######################################|## ##
#|############################################## Avg: 144300.05 kBit/s
#########################################|###################################################
Min: 3339.92 kBit/s
#############################################################################################
Max: 292397.75 kBit/s
#############################################################################################
Ttl: 14977425408.00 MByte
There is no rapid change of bw or anything.
Kris Kennaway wrote:
> Adding back the mailing list so others can help.
>
> matthew wrote:
>> Kris Kennaway wrote:
>>
>>> matthew wrote:
>>>> I have posted before that i have a stability issue with the 7.0 branch
>>>> on my servers. Tested on BETA2,BETA4,RC1,RC2,RELEASE
>>>>
>>>> The original thread and my post with details is at:
>>>> http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2007-12/msg00674.html
>>>>
>>>>
>>>> I was waiting for the 7.0-RELEASE, updated the whole servers, and
>>>> enabled dummynet again, but it always freezes after some minutes, 100%
>>>> reproducible.
>>>>
>>>> I tested it also on a HP 140 G3 1U server, where 6.3 has absolutely no
>>>> problems, but the 7.0 branch keeps freezing.
>>>>
>>>> Again, if it helps to solve this bug, i can rebuild the kernel with
>>>> debug symbols a take some screenshots :)
>>>
>>> Please follow the steps at
>>>
>>> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>>>
>>>
>>> Kris
>>> _______________________________________________
>> I added to the kernel these options:
>> makeoptions DEBUG=-g # Build kernel with gdb(1)
>> debug symbols
>> options INVARIANTS
>> options INVARIANT_SUPPORT
>> options WITNESS
>> options DEBUG_LOCKS
>> options DEBUG_VFS_LOCKS
>> options DIAGNOSTIC
>> options KDB
>> options DDB
>> options GDB
>> options SOCKBUF_DEBUG
>>
>> But still, the server freezes withoutany debug message,
>
> And you can't break to DDB? It is expected you will have to when you
> encounter a deadlock.
No, i am unable to do anything, no input from keyboard, the only one
thing thats work, is to change the num lock indicator on the keyboard :)
>
> > panic whatever,
>> but at leasts it dumps the crash kernel to /var/crash, but the debug
>> looks like this:
>
> Dumping is an activity the occurs after panicking, so if you have a
> dump then you must have a panic.
>> root at hanka:/usr/src/sys/amd64/compile/HANKA-debug# kgdb kernel.debug
>> /var/crash/vmcore.0
>> [GDB will not be able to debug user-mode threads:
>> /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
>> GNU gdb 6.1.1 [FreeBSD]
>> Copyright 2004 Free Software Foundation, Inc.
>> GDB is free software, covered by the GNU General Public License, and
>> you are
>> welcome to change it and/or distribute copies of it under certain
>> conditions.
>> Type "show copying" to see the conditions.
>> There is absolutely no warranty for GDB. Type "show warranty" for
>> details.
>> This GDB was configured as "amd64-marcel-freebsd".
>> Cannot access memory at address 0x57
>> (kgdb) backtrace
>> #0 0x0000000000000000 in ?? ()
>> Cannot access memory at address 0x0
>> (kgdb) I have some debug messages
>> at boot time:
>
> This usually means you are not running the dump against the
> kernel.debug corresponding exactly to the kernel that crashed.
> Probably the dump you found was old.
>
Yes, i think it is possible that it is the previous kernel, without the
debug options.
>> Feb 28 17:01:29 hanka lock order reversal:
>> Feb 28 17:01:29 hanka 1st 0xffffffff80b0e1e8 PFil hook read/write
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @
>> /usr/src/sys/modules/pf/../../contrib/pf/net/pf.c:2970
>> Feb 28 17:01:29 hanka KDB: stack backtrace:
>> Feb 28 17:01:29 hanka db_trace_self_wrapper() at
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
>> Feb 28 17:01:29 hanka pf_socket_lookup() at pf_socket_lookup+0x241
>> Feb 28 17:01:29 hanka pf_test_udp() at pf_test_udp+0x890
>> Feb 28 17:01:29 hanka pf_test() at pf_test+0xeb2
>> Feb 28 17:01:29 hanka pf_check_in() at pf_check_in+0x2b
>> Feb 28 17:01:29 hanka pfil_run_hooks() at pfil_run_hooks+0xbc
>> Feb 28 17:01:29 hanka ip_input() at ip_input+0x2c4
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
>> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30,
>> rbp = 0 ---
>
>
>> Feb 28 17:01:29 hanka lock order reversal:
>> Feb 28 17:01:29 hanka 1st 0xffffff0001237430 em0 (EM Core Lock) @
>> dev/em/if_em.c:1416
>> Feb 28 17:01:29 hanka 2nd 0xffffffff80b0f8e8 udp (udp) @
>> netinet/udp_usrreq.c:385
>> Feb 28 17:01:29 hanka KDB: stack backtrace:
>> Feb 28 17:01:29 hanka db_trace_self_wrapper() at
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:29 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:29 hanka _mtx_lock_flags() at _mtx_lock_flags+0x75
>> Feb 28 17:01:29 hanka udp_input() at udp_input+0x1a4
>> Feb 28 17:01:29 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka ether_demux() at ether_demux+0xe9
>> Feb 28 17:01:29 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:01:29 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:01:29 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:01:29 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:01:29 hanka swi_net() at swi_net+0xea
>> Feb 28 17:01:29 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:01:29 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:01:29 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:01:29 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30,
>> rbp = 0 ---
>
>
>> Feb 28 17:01:29 hanka xinetd[1356]: xinetd Version 2.3.14 started
>> with libwrap loadavg options compiled in.
>> Feb 28 17:01:29 hanka xinetd[1356]: Started working: 1 available service
>> Feb 28 17:01:33 hanka lock order reversal:
>> Feb 28 17:01:33 hanka 1st 0xffffff00042f5b10 inp (tcpinp) @
>> netinet/tcp_usrreq.c:781
>> Feb 28 17:01:33 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:01:33 hanka KDB: stack backtrace:
>> Feb 28 17:01:33 hanka db_trace_self_wrapper() at
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:01:33 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:01:33 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:01:33 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:01:33 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:01:33 hanka tcp_output() at tcp_output+0xacd
>> Feb 28 17:01:33 hanka tcp_usr_send() at tcp_usr_send+0x272
>> Feb 28 17:01:33 hanka sosend_generic() at sosend_generic+0x347
>> Feb 28 17:01:33 hanka soo_write() at soo_write+0x38
>> Feb 28 17:01:33 hanka dofilewrite() at dofilewrite+0x85
>> Feb 28 17:01:33 hanka kern_writev() at
>> Feb 28 17:01:33 hanka kern_writev+0x4c
>> Feb 28 17:01:33 hanka write() at write+0x54
>> Feb 28 17:01:33 hanka syscall() at syscall+0x1f6
>> Feb 28 17:01:33 hanka Xfast_syscall() at Xfast_syscall+0xab
>> Feb 28 17:01:33 hanka --- syscall (4, FreeBSD ELF64, write), rip =
>> 0x800e1fd3c, rsp = 0x7fffffffea88, rbp = 0x16 ---
>
>
>> Feb 28 17:03:54 hanka lock order reversal:
>> Feb 28 17:03:54 hanka 1st 0xffffffff80b0f308 tcp (tcp) @
>> netinet/tcp_input.c:400
>> Feb 28 17:03:54 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:03:54 hanka KDB: stack backtrace:
>> Feb 28 17:03:54 hanka db_trace_self_wrapper() at
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:03:54 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:03:54 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:03:54 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:03:54 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:03:54 hanka tcp_respond() at tcp_respond+0x2d5
>> Feb 28 17:03:54 hanka tcp_dropwithreset() at tcp_dropwithreset+0x131
>> Feb 28 17:03:54 hanka tcp_input() at tcp_input+0x6d0
>> Feb 28 17:03:54 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:03:54 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:03:54 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:03:54 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:03:54 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:03:54 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:03:54 hanka swi_net() at swi_net+0xea
>> Feb 28 17:03:54 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:03:54 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:03:54 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:03:54 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30,
>> rbp = 0 ---
>
>
>> Feb 28 17:23:36 hanka lock order reversal:
>> Feb 28 17:23:36 hanka 1st 0xffffffff810f9ef0 tcp_sc_head
>> (tcp_sc_head) @ netinet/tcp_syncache.c:477
>> Feb 28 17:23:36 hanka 2nd 0xffffffff80b0e1e8 PFil hook read/write
>> mutex (PFil hook read/write mutex) @ net/pfil.c:73
>> Feb 28 17:23:36 hanka KDB: stack backtrace:
>> Feb 28 17:23:36 hanka db_trace_self_wrapper() at
>> db_trace_self_wrapper+0x2a
>> Feb 28 17:23:36 hanka witness_checkorder() at witness_checkorder+0x605
>> Feb 28 17:23:36 hanka _rw_rlock() at _rw_rlock+0x5b
>> Feb 28 17:23:36 hanka pfil_run_hooks() at pfil_run_hooks+0x44
>> Feb 28 17:23:36 hanka ip_output() at ip_output+0x38e
>> Feb 28 17:23:36 hanka syncache_respond() at syncache_respond+0x32a
>> Feb 28 17:23:36 hanka syncache_add() at syncache_add+0x206
>> Feb 28 17:23:36 hanka tcp_input() at tcp_input+0xa77
>> Feb 28 17:23:36 hanka ip_input() at ip_input+0xc0
>> Feb 28 17:23:36 hanka ether_demux() at ether_demux+0x1d9
>> Feb 28 17:23:36 hanka ether_input() at ether_input+0x19d
>> Feb 28 17:23:36 hanka em_rxeof() at em_rxeof+0x1ca
>> Feb 28 17:23:36 hanka em_poll() at em_poll+0x6f
>> Feb 28 17:23:36 hanka netisr_poll() at netisr_poll+0x87
>> Feb 28 17:23:36 hanka swi_net() at swi_net+0xea
>> Feb 28 17:23:36 hanka ithread_loop() at ithread_loop+0xda
>> Feb 28 17:23:36 hanka fork_exit() at fork_exit+0x12a
>> Feb 28 17:23:36 hanka fork_trampoline() at fork_trampoline+0xe
>> Feb 28 17:23:36 hanka --- trap 0, rip = 0, rsp = 0xffffffff9feead30,
>> rbp = 0 ---
>
> These look useful and should point to the problem. It is also
> probably related to something in your script:
>
>> I tried it with pf disabled, only running ipfw without dummynet,
>> everything is ok.
>> I tried it witout pf disabled, running ipfw with dummynet queue for
>> small traffic, everything is ok.
>> I tried it with/without pf disabled, running ipfw with dummynet queue
>> for hight traffic >100Mbit/s, the box freezes, everytime, after
>> enabling rules with the dummynet queue, after some second or few
>> minutes, maybe depending on the high traffic.
>>
>> The shaper script:
>> # SHAPER
>> IPFW_PIPE_DOWNLOAD=1
>> IPFW_PIPE_UPLOAD=11
>> IPFW_QUEUE_DOWNLOAD=1
>> IPFW_QUEUE_UPLOAD=11
>>
>> DOWNLOAD_ROOT="450Mbit/s"
>> UPLOAD_ROOT="450Mbit/s"
>> SHAPER_BUCKETS="1024"
>>
>> IPFW_SHAPER_ROOT=1000
>>
>> $CMD pipe $IPFW_PIPE_DOWNLOAD config bw $DOWNLOAD_ROOT buckets
>> $SHAPER_BUCKETS
>> $CMD pipe $IPFW_PIPE_UPLOAD config bw $UPLOAD_ROOT buckets
>> $SHAPER_BUCKETS
>>
>> $CMD queue $IPFW_QUEUE_DOWNLOAD config pipe $IPFW_PIPE_DOWNLOAD
>> buckets $SHAPER_BUCKETS mask dst-ip 0xFFFFFFFF
>> $CMD queue $IPFW_QUEUE_UPLOAD config pipe $IPFW_PIPE_UPLOAD buckets
>> $SHAPER_BUCKETS mask src-ip 0xFFFFFFFF
>>
>> $CMD delete $IPFW_SHAPER_ROOT
>> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_DOWNLOAD ip from me to
>> any out // Share download
>> $CMD $IPFW_SHAPER_ROOT add queue $IPFW_QUEUE_UPLOAD ip from any to me
>> in // Share upload
>
> Thanks for the information.
>
> Kris
I hope i can help to solve this problem.
Matthew
More information about the freebsd-net
mailing list