Re: Stuck processes in unkillable (STOP) state, listen queue overflow
Дмитрий Долбнин
bad_hdd at list.ru
Thu Oct 29 01:09:01 UTC 2015
Good day, everyone!
From my point of view, it looks like degraded hardware performance is causing the problems you are seeing.
At the very least, try switching to a new power supply.
Why do I think so? Because a bad power supply is encountered far more often than bad FreeBSD source code. Of course, I can't say you're completely wrong.
Best regards, Dimitry.
>Wednesday, 28 October 2015, 12:00 UTC from freebsd-stable-request at freebsd.org:
>
>Send freebsd-stable mailing list submissions to
>freebsd-stable at freebsd.org
>
>To subscribe or unsubscribe via the World Wide Web, visit
>https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>or, via email, send a message with subject or body 'help' to
>freebsd-stable-request at freebsd.org
>
>You can reach the person managing the list at
>freebsd-stable-owner at freebsd.org
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of freebsd-stable digest..."
>
>
>Today's Topics:
>
> 1. Re: Stuck processes in unkillable (STOP) state, listen queue
> overflow (Zara Kanaeva)
> 2. Re: Stuck processes in unkillable (STOP) state, listen queue
> overflow (Nagy, Attila)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 27 Oct 2015 14:42:42 +0100
>From: Zara Kanaeva <zara.kanaeva at ggi.uni-tuebingen.de>
>To: freebsd-stable at freebsd.org
>Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>overflow
>Message-ID:
><20151027144242.Horde.3Xc1_RqzaVMAZ12X6OPXfdN at webmail.uni-tuebingen.de>
>
>Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes
>
>Hello,
>
>I have the same experience with apache and mapserver. It happens on a
>physical machine and ends with a spontaneous reboot. This machine was
>updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
>this machine doesn't have enough RAM (only 8 GB), but I don't think
>that should cause a spontaneous reboot.
>
>I had no such behavior with the same machine running FreeBSD
>9.0-RELEASE (I am not 100% sure; I have no way to test it at the moment).
>
>Regards, Z. Kanaeva.
>
>Quoting "Nagy, Attila" <bra at fsn.hu>:
>
>> Hi,
>>
>> Recently I've started to see a lot of cases, where the log is full
>> with "listen queue overflow" messages and the process behind the
>> network socket is unavailable.
>> When I open a TCP connection to it, the connection opens, but nothing
>> happens (for example, I get no SMTP banner from postfix, nor a log
>> entry about the new connection).
>>
>> I've seen this with Java programs, postfix, and redis: basically
>> everything that opens a TCP socket and listens on the machine.
>>
>> For example, I have a redis process listening on port 6381. When I
>> telnet to it, the TCP connection opens, but the program doesn't respond.
>> When I kill it, nothing happens; even kill -9 yields only this state:
>> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAN
>> 776 redis 2 20 0 24112K 2256K STOP 3 16:56
>> 0.00% redis-
>>
>> When I tcpdrop the connections of the process, tcpdrop reports
>> success for the first time and failure for the second (No such
>> process), but the connections remain:
>> # sockstat -4 | grep 776
>> redis redis-serv 776 6 tcp4 *:6381 *:*
>> redis redis-serv 776 9 tcp4 *:16381 *:*
>> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
>> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
>> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
>> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
>> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
>> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
>> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
>> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
>> # sockstat -4 | grep 776 | awk '{print "tcpdrop "$6" "$7}' | /bin/sh
>> tcpdrop: getaddrinfo: * port 6381: hostname nor servname provided,
>> or not known
>> tcpdrop: getaddrinfo: * port 16381: hostname nor servname provided,
>> or not known
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 10460: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 35795: No such process
>> tcpdrop: 127.0.0.1 30027 127.0.0.1 16379: No such process
>> tcpdrop: 127.0.0.1 58802 127.0.0.1 16384: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 24354: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 56999: No such process
>> tcpdrop: 127.0.0.1 16381 127.0.0.1 39488: No such process
>> tcpdrop: 127.0.0.1 6381 127.0.0.1 39491: No such process
>> # sockstat -4 | grep 776
>> redis redis-serv 776 6 tcp4 *:6381 *:*
>> redis redis-serv 776 9 tcp4 *:16381 *:*
>> redis redis-serv 776 10 tcp4 127.0.0.1:16381 127.0.0.1:10460
>> redis redis-serv 776 11 tcp4 127.0.0.1:16381 127.0.0.1:35795
>> redis redis-serv 776 13 tcp4 127.0.0.1:30027 127.0.0.1:16379
>> redis redis-serv 776 14 tcp4 127.0.0.1:58802 127.0.0.1:16384
>> redis redis-serv 776 17 tcp4 127.0.0.1:16381 127.0.0.1:24354
>> redis redis-serv 776 18 tcp4 127.0.0.1:16381 127.0.0.1:56999
>> redis redis-serv 776 19 tcp4 127.0.0.1:16381 127.0.0.1:39488
>> redis redis-serv 776 20 tcp4 127.0.0.1:6381 127.0.0.1:39491
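The failing pipeline above can be sketched a bit more defensively. This is a hypothetical rework (function name `sockstat_to_tcpdrop` is mine): it assumes the sockstat column layout shown, splits the addr:port columns into the four arguments tcpdrop(8) accepts, and skips the wildcard listeners, which are what produced the getaddrinfo errors:

```shell
# Turn `sockstat -4` lines into tcpdrop(8) invocations.
# Assumes local/foreign endpoints are in columns 6 and 7, as above.
# Listening sockets (*:port) are skipped: tcpdrop only drops
# established connections, hence the "hostname nor servname" errors.
sockstat_to_tcpdrop() {
  awk '$6 !~ /^\*/ && $7 !~ /^\*/ {
    split($6, l, ":"); split($7, f, ":")
    print "tcpdrop", l[1], l[2], f[1], f[2]
  }'
}
# Usage: sockstat -4 | grep 776 | sockstat_to_tcpdrop | /bin/sh
```

Not that it would have helped here, since the connections came back anyway, but it avoids feeding tcpdrop addresses it cannot parse.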
>>
>> $ procstat -k 776
>> PID TID COMM TDNAME KSTACK
>> 776 100725 redis-server - mi_switch
>> sleepq_timedwait_sig _sleep kern_kevent sys_kevent amd64_syscall
>> Xfast_syscall
>> 776 100744 redis-server - mi_switch
>> thread_suspend_switch thread_single exit1 sigexit postsig ast
>> doreti_ast
>>
>> Nothing I do gets it out of this state; only a reboot helps.
>>
>> The OS is stable/10 at r289313, but I could observe this behaviour with
>> earlier releases too.
>>
>> The dmesg is full with lines like these:
>> sonewconn: pcb 0xfffff8004dc54498: Listen queue overflow: 193
>> already in queue awaiting acceptance (3142 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
>> already in queue awaiting acceptance (3068 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
>> already in queue awaiting acceptance (3057 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
>> already in queue awaiting acceptance (3037 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
>> already in queue awaiting acceptance (3015 occurrences)
>> sonewconn: pcb 0xfffff8004d9ed188: Listen queue overflow: 193
>> already in queue awaiting acceptance (3035 occurrences)
>>
>> I guess this is the effect of the process freeze, not the cause (the
>> listen queue fills up because the app can't handle the incoming
>> connections).
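As a side note, the "193 already in queue" figure is consistent with FreeBSD's sonewconn() logging an overflow once the queue exceeds 3*backlog/2, i.e. a backlog of 128 (3*128/2 = 192). A sketch for inspecting the queues and the system-wide cap (sysctl name as on stable/10; treat this as diagnosis, not a fix, since the overflow here is a symptom of the frozen acceptor):

```shell
# Per-socket listen queues: current (qlen), incomplete, and maximum.
netstat -Lan | grep 6381

# System-wide cap on the backlog passed to listen(2); default is 128.
sysctl kern.ipc.somaxconn
# Raising it only helps if the application also requests a larger backlog:
# sysctl kern.ipc.somaxconn=512
```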
>>
>> I'm not sure it matters, but some of the machines (including the one
>> above) run on an ESX hypervisor (as far as I can remember I have seen
>> this on physical machines too, but I'm not sure about that).
>> Also, so far I have only seen this where some "exotic" stuff runs,
>> like Java- or Erlang-based servers (OpenDJ, Elasticsearch and
>> RabbitMQ).
>>
>> I'm also not sure what triggers this. I've never seen it after only
>> some hours of uptime; at least several days or a week must pass
>> before things get stuck like the above.
>>
>> Any ideas about this?
>>
>> Thanks,
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>
>
>
>--
>Dipl.-Inf. Zara Kanaeva
>Heidelberger Akademie der Wissenschaften
>Forschungsstelle "The role of culture in early expansions of humans"
>an der Universität Tübingen
>Geographisches Institut
>Universität Tübingen
>Ruemelinstr. 19-23
>72070 Tuebingen
>
>Tel.: +49-(0)7071-2972132
>e-mail: zara.kanaeva at geographie.uni-tuebingen.de
>-------
>- Theory is when you know something but it doesn't work.
>- Practice is when something works but you don't know why.
>- Usually we combine theory and practice:
> Nothing works and we don't know why.
>
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 27 Oct 2015 17:25:01 +0100
>From: "Nagy, Attila" <bra at fsn.hu>
>To: Zara Kanaeva <zara.kanaeva at ggi.uni-tuebingen.de>,
>freebsd-stable at freebsd.org
>Subject: Re: Stuck processes in unkillable (STOP) state, listen queue
>overflow
>Message-ID: <562FA55D.6050503 at fsn.hu>
>Content-Type: text/plain; charset=utf-8; format=flowed
>
>Hi,
>
>(following the top-posting)
>I have seen this with 16 and 32 GiB of RAM, but anyway, it shouldn't
>matter.
>Do you use ZFS? Although it doesn't seem to be stuck on I/O...
>
>On 10/27/15 14:42, Zara Kanaeva wrote:
>> Hello,
>>
>> I have the same experience with apache and mapserver. It happens on a
>> physical machine and ends with a spontaneous reboot. This machine was
>> updated from FreeBSD 9.0-RELEASE to FreeBSD 10.2-PRERELEASE. Perhaps
>> this machine doesn't have enough RAM (only 8 GB), but I don't think
>> that should cause a spontaneous reboot.
>>
>> I had no such behavior with the same machine running FreeBSD
>> 9.0-RELEASE (I am not 100% sure; I have no way to test it at the moment).
>>
>> Regards, Z. Kanaeva.
>