[Bug 212920] Highly loaded web server catches race condition on _close() from /lib/libc.so.7 with accf_http
bugzilla-noreply at freebsd.org
Fri Sep 23 10:10:05 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=212920
Bug ID: 212920
Summary: Highly loaded web server catches race condition on _close()
from /lib/libc.so.7 with accf_http
Product: Base System
Version: 10.3-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-bugs at FreeBSD.org
Reporter: fbsd98816551 at avksrv.org
CC: freebsd-amd64 at FreeBSD.org
Hello!
Recently we upgraded our highly loaded web server to FreeBSD-STABLE 10.3 r305091
and ran into a problem with nginx (nginx-1.10.1_2,2 compiled from the latest ports
with mostly default settings). After some time one worker stopped answering
requests, and top showed it in the "soclos" state:
1072 nobody 1 22 0 1698M 65680K soclos 5 0:13 0.00% nginx
After a short while the next worker stopped in the same state, and so on until all
workers were in "soclos" and the web server stopped serving requests (it still
accepted connections, which then died on timeout after the client sent a request).
Increasing the worker count only postponed the problem by another half hour.
Restarting nginx fixes it, but only for a short time. The server is fairly heavily
loaded, with 1000-2000 requests/sec; it acts as a frontend proxy with proxy_cache
functionality. We tried 2 different physical servers with different NICs and CPUs.
When we rolled the kernel back to r302223 (only the kernel and modules in
/boot/kernel, not world), the problem went away.
We also tried upgrading to yesterday's r306194; the problem is still there.
Something that changed in the kernel code between the end of June and the end of
August is causing the problem.
Backtrace from nginx while it is in "soclos":
#0 0x0000000801a17d28 in _close () from /lib/libc.so.7
#1 0x000000080098a925 in pthread_suspend_all_np () from /lib/libthr.so.3
#2 0x00000000004329b9 in ngx_close_connection (c=0x869c1de70) at
src/core/ngx_connection.c:1169
#3 0x0000000000486370 in ngx_http_close_connection (c=0x869c1de70) at
src/http/ngx_http_request.c:3543
#4 0x0000000000488e86 in ngx_http_close_request (r=0x80244c050, rc=408) at
src/http/ngx_http_request.c:3406
#5 0x000000000048d9ed in ngx_http_process_request_headers (rev=0x807810b70) at
src/http/ngx_http_request.c:1202
#6 0x000000000044fdbd in ngx_event_expire_timers () at
src/event/ngx_event_timer.c:94
#7 0x000000000044e60f in ngx_process_events_and_timers (cycle=0x802488050) at
src/event/ngx_event.c:256
#8 0x000000000045f406 in ngx_worker_process_cycle (cycle=0x802488050,
data=0xa) at src/os/unix/ngx_process_cycle.c:753
#9 0x000000000045ae7c in ngx_spawn_process (cycle=0x802488050, proc=0x45f2f0
<ngx_worker_process_cycle>, data=0xa, name=0x53ecea "worker process",
respawn=-3) at src/os/unix/ngx_process.c:198
#10 0x000000000045cc89 in ngx_start_worker_processes (cycle=0x802488050, n=16,
type=-3) at src/os/unix/ngx_process_cycle.c:358
#11 0x000000000045c486 in ngx_master_process_cycle (cycle=0x802488050) at
src/os/unix/ngx_process_cycle.c:130
#12 0x0000000000413288 in main (argc=1, argv=0x7fffffffead0) at
src/core/nginx.c:367
(gdb) list src/core/ngx_connection.c:1169
1164
1165 if (c->shared) {
1166 return;
1167 }
1168
1169 if (ngx_close_socket(fd) == -1) { <<<<<<<<
1170
1171 err = ngx_socket_errno;
1172
1173 if (err == NGX_ECONNRESET || err == NGX_ENOTCONN) {
and ngx_close_socket() is actually close(fd):
#define ngx_close_socket close
All TCP sessions opened by the worker are frozen in their current state.
If we do not load accf_http and do not use it in the nginx config, the problem is
not reproduced with any of the 3 tested kernels.
The kernel is GENERIC; the only extra modules loaded are accf_http, ipmi, smbus,
mfip, ums, zfs and opensolaris.
Since accf_http does some good for our server, we cannot simply disable the module
in the production environment.
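For context, nginx enables this module via accept_filter=httpready on the listen
directive, which on FreeBSD boils down to a setsockopt(SO_ACCEPTFILTER) call on
the listening socket. Below is a minimal standalone sketch of that attachment
(not taken from the nginx sources; the port, backlog and error handling are
illustrative only):

/*
 * Minimal sketch: attach the accf_http accept filter to a listening
 * socket, the same way nginx's "listen ... accept_filter=httpready"
 * does on FreeBSD.  Requires the accf_http module to be loaded
 * (kldload accf_http).  Port number and backlog are arbitrary.
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <err.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	struct sockaddr_in sin;
	struct accept_filter_arg afa;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		err(1, "socket");

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(8080);
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		err(1, "bind");

	/* The filter can only be attached to a listening socket. */
	if (listen(s, 128) == -1)
		err(1, "listen");

	/* "httpready" defers accept() until a full HTTP request has arrived. */
	memset(&afa, 0, sizeof(afa));
	strcpy(afa.af_name, "httpready");
	if (setsockopt(s, SOL_SOCKET, SO_ACCEPTFILTER, &afa, sizeof(afa)) == -1)
		err(1, "setsockopt(SO_ACCEPTFILTER)");

	/* ... accept()/close() loop would follow here ... */
	close(s);
	return (0);
}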
I'll debug further, but since I'm not a good C programmer it will take some time.
If someone knows what changed in the related functions, maybe it would be faster
to check from that side.
--
You are receiving this mail because:
You are on the CC list for the bug.