sshd with zombie process on FreeBSD 10.0-STABLE
Marcelo Gondim
gondim at bsdinfo.com.br
Fri Mar 28 19:57:44 UTC 2014
Em 28/03/14 15:01, Marcelo Gondim escreveu:
> Hi all,
>
> I have noticed zombie processes on the system after a few lost
> connections on ssh.
>
> # ps afx
> [...]
> 8045 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 8046 - Z 0:00.01 <defunct>
> 8054 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 28146 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 28147 - Z 0:00.01 <defunct>
> 28155 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 43320 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 43321 - Z 0:00.01 <defunct>
> 43322 - IW 0:00.00 sshd: unknown [pam] (sshd)
> 73413 - Is 0:00.01 sshd: unknown [priv] (sshd)
> 73414 - Z 0:00.01 <defunct>
> 73430 - IW 0:00.00 sshd: unknown [pam] (sshd)
> [...]
>
> More information discussed in freebsd-stable at list:
>
>
> On Tue, Mar 25, 2014 at 3:52 AM, Karl Pielorz <kpielorz_lst at tdx.co.uk
> <mailto:kpielorz_lst at tdx.co.uk>> wrote:
>
>
>
> --On 21 March 2014 22:02 -0700 Kevin Oberman <rkoberman at gmail.com
> <mailto:rkoberman at gmail.com>> wrote:
>
> Ideally I'd look to try and capture the packets st the end of
> the session.
> Can you do something to trigger this reliably? if so "standard"
> "tcpdump
> -pw file.bpf host HOST". I seem to recall that these
> connections are
> scheduled. If so, you can put the packet capture in a crontab to
> run at
> the same time. If you feed this to a tool like wireshark, you
> should get
> a good idea of what is happening, if not why. I understand that
> the
> timing of this might be very tricky.
>
>
> Ok, fwiw as I have this issue as well - I've done a packet capture
> (it's below). This box has 59 'CLOSED' sockets from ssh on it, and
> 60 sshd stuck.
>
> Also - initially I thought this was a Xen issue - so there's a
> couple of posts on that list from a couple of weeks ago, in brief -
> the sshd processes I have are stuck in 'urdlck' - one of the Xen
> guys commented "It seems like the process is stuck while trying to
> acquire a rw mutex in read mode."
>
> I did a backtrace of a stuck process - I can post that if you want
> (or check the FreeBSD-Xen list for 'stuck sshd in urdlck'.
>
> Also, if I ssh into this host, 90% of the time (seems to get worse
> the longer the box is up) I get:
>
> "
> ssh_exchange_identification: Connection closed by remote host
> "
>
> That does not leave a lingering CLOSED socket. In fact, a successful
> ssh login - and logout, does not in testing appear to leave a
> lingering CLOSED socket, nor sshd stuck in urdlck - so I'm not
> entirely sure where they're coming from, or how often they are
> created.
>
> tcpdump from the start of a successful ssh connect:
>
> 10:47:04.777765 IP (tos 0x0, ttl 61, id 4058, offset 0, flags [DF],
> proto TCP (6), length 60)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [S], cksum 0x57ce
> (correct), seq 634709832, win 65535, options [mss 1368,nop,wscale
> 3,sackOK,TS val 1019391535 ecr 0], length 0
>
> 10:47:04.777776 IP (tos 0x0, ttl 64, id 10060, offset 0, flags [DF],
> proto TCP (6), length 60)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [S.], cksum 0x7ef8
> (incorrect -> 0x69d8), seq 2316386788, ack 634709833, win 65535,
> options [mss 1368,nop,wscale 6,sackOK,TS val 805368299 ecr
> 1019391535], length 0
>
> 10:47:04.804218 IP (tos 0x0, ttl 61, id 4059, offset 0, flags [DF],
> proto TCP (6), length 52)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [.], cksum 0x77d3
> (correct), ack 1, win 8305, options [nop,nop,TS val 1019391538 ecr
> 805368299], length 0
>
> 10:47:04.809692 IP (tos 0x0, ttl 64, id 10061, offset 0, flags [DF],
> proto TCP (6), length 99)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x7f1f
> (incorrect -> 0x5799), seq 1:48, ack 1, win 1038, options
> [nop,nop,TS val 805368328 ecr 1019391538], length 47
>
> 10:47:04.836110 IP (tos 0x0, ttl 61, id 4060, offset 0, flags [DF],
> proto TCP (6), length 92)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [P.], cksum 0x8afa
> (correct), seq 1:41, ack 48, win 8305, options [nop,nop,TS val
> 1019391541 ecr 805368328], length 40
>
> 10:47:04.836669 IP (tos 0x0, ttl 64, id 10062, offset 0, flags [DF],
> proto TCP (6), length 1596)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x84f8
> (incorrect -> 0x4d42), seq 48:1592, ack 41, win 1038, options
> [nop,nop,TS val 805368358 ecr 1019391541], length 1544
>
>
> And the end of the session:
>
> 10:47:15.243540 IP (tos 0x10, ttl 61, id 4132, offset 0, flags [DF],
> proto TCP (6), length 100)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [P.], cksum 0x6364
> (correct), seq 2321:2369, ack 3520, win 8305, options [nop,nop,TS
> val 1019392582 ecr 805375068], length 48
>
> 10:47:15.243736 IP (tos 0x10, ttl 64, id 10125, offset 0, flags
> [DF], proto TCP (6), length 100)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x7f20
> (incorrect -> 0x3ea8), seq 3520:3568, ack 2369, win 1038, options
> [nop,nop,TS val 805378765 ecr 1019392582], length 48
>
> 10:47:15.243796 IP (tos 0x10, ttl 64, id 10126, offset 0, flags
> [DF], proto TCP (6), length 100)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x7f20
> (incorrect -> 0xdd22), seq 3568:3616, ack 2369, win 1038, options
> [nop,nop,TS val 805378765 ecr 1019392582], length 48
>
> 10:47:15.244627 IP (tos 0x10, ttl 64, id 10127, offset 0, flags
> [DF], proto TCP (6), length 84)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x7f10
> (incorrect -> 0x86ed), seq 3616:3648, ack 2369, win 1038, options
> [nop,nop,TS val 805378765 ecr 1019392582], length 32
>
> 10:47:15.244812 IP (tos 0x10, ttl 64, id 10128, offset 0, flags
> [DF], proto TCP (6), length 212)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [P.], cksum 0x7f90
> (incorrect -> 0x431c), seq 3648:3808, ack 2369, win 1038, options
> [nop,nop,TS val 805378765 ecr 1019392582], length 160
>
> 10:47:15.271439 IP (tos 0x10, ttl 61, id 4134, offset 0, flags [DF],
> proto TCP (6), length 52)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [.], cksum 0x3381
> (correct), ack 3616, win 8299, options [nop,nop,TS val 1019392585
> ecr 805378765], length 0
>
> 10:47:15.272238 IP (tos 0x10, ttl 61, id 4135, offset 0, flags [DF],
> proto TCP (6), length 52)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [.], cksum 0x32d9
> (correct), ack 3808, win 8275, options [nop,nop,TS val 1019392585
> ecr 805378765], length 0
>
> 10:47:15.273515 IP (tos 0x10, ttl 61, id 4137, offset 0, flags [DF],
> proto TCP (6), length 84)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [P.], cksum 0x43cc
> (correct), seq 2369:2401, ack 3808, win 8305, options [nop,nop,TS
> val 1019392585 ecr 805378765], length 32
>
> 10:47:15.276199 IP (tos 0x10, ttl 61, id 4138, offset 0, flags [DF],
> proto TCP (6), length 116)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [P.], cksum 0xb169
> (correct), seq 2401:2465, ack 3808, win 8305, options [nop,nop,TS
> val 1019392585 ecr 805378765], length 64
>
> 10:47:15.276220 IP (tos 0x10, ttl 64, id 10129, offset 0, flags
> [DF], proto TCP (6), length 52)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [.], cksum 0x7ef0
> (incorrect -> 0x4ea3), ack 2465, win 1037, options [nop,nop,TS val
> 805378793 ecr 1019392585], length 0
>
> 10:47:15.276970 IP (tos 0x10, ttl 61, id 4140, offset 0, flags [DF],
> proto TCP (6), length 52)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [F.], cksum 0x325a
> (correct), seq 2465, ack 3808, win 8305, options [nop,nop,TS val
> 1019392585 ecr 805378765], length 0
>
> 10:47:15.276978 IP (tos 0x10, ttl 64, id 10130, offset 0, flags
> [DF], proto TCP (6), length 52)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [.], cksum 0x7ef0
> (incorrect -> 0x4e9c), ack 2466, win 1038, options [nop,nop,TS val
> 805378798 ecr 1019392585], length 0
>
> 10:47:15.277212 IP (tos 0x10, ttl 64, id 10131, offset 0, flags
> [DF], proto TCP (6), length 52)
> 192.168.0.138.22 > 192.168.0.37.31139: Flags [F.], cksum 0x7ef0
> (incorrect -> 0x4e9b), seq 3808, ack 2466, win 1038, options
> [nop,nop,TS val 805378798 ecr 1019392585], length 0
>
> 10:47:15.303993 IP (tos 0x10, ttl 61, id 4142, offset 0, flags [DF],
> proto TCP (6), length 52)
> 192.168.0.37.31139 > 192.168.0.138.22: Flags [.], cksum 0x3235
> (correct), ack 3809, win 8305, options [nop,nop,TS val 1019392588
> ecr 805378798], length 0
>
>
> This box has no services running on it at present, and just sits
> there idle. I'll periodically check on it and see if the CLOSED
> socket count, or hung sshd count goes up.
>
> It's runing as a PVHVM domU under XenServer 6.2
>
>
> Since I am retired, I no longer have access to my beloved "TCP/IP
> Illustrated Vol. 1", but I believe CLOSED status indicates that the
> socket is no longer in use.As the name says, it is CLOSED. I don't
> know why sshd should be waiting on a CLOSED socket, nor do I
> understand why a CLOSED server socket should live on for extended times.
>
> That said, I have been monitoring my FreeBSD 10 system and it seems to
> pick up an occasional case where a socket gets "stuck" in the CLOSED
> state for a very long time. I can't say forever, but at least many
> minutes. I don't see this on 9.2 systems. Right now I have two CLOSED
> sockets, both the the same address on port 80. It appears that the
> only thing that removes the socket is to kill the owning process (in
> this case it's Firefox). The owning firefox process does exist just
> fine when asked and I think sshd should do so as well, but it looks to
> me like the root problem is that CLOSED socket seem to live on in 10.0
> and they don't on older versions.
>
> I think it's time to move this over to net@, as I suspect a change
> made in 10 is triggering this. I'm not sure the changes is wrong or
> that sshd is wrong in not exiting cleanly when sockets are hanging
> around in CLOSED state. (Nor am I sure sshd is wrong.)
>
> If I am confused about this, please let me know. I'm going ot read the
> 10.0 release notes to see if they say anything about this.
Another my server with 12 zombie process related with CLOSED connections:
last pid: 84965; load averages: 2.74, 2.65, 2.07 up 14+23:43:16 16:10:00
133 processes: 1 running, 120 sleeping, 12 zombie
CPU: 8.5% user, 0.0% nice, 39.3% system, 11.8% interrupt, 40.4% idle
Mem: 194M Active, 2575M Inact, 828M Wired, 77M Cache, 391M Buf, 3056K Free
Swap: 16G Total, 377M Used, 16G Free, 2% Inuse, 8K In
# netstat -n |grep CLOSED
tcp4 0 0 192.168.254.1.22 10.254.0.12.26792 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.60591 CLOSED
tcp4 0 0 192.168.254.1.22 10.254.0.12.26974 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.54034 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.52999 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.65345 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.64099 CLOSED
tcp4 0 0 192.168.254.1.22 10.254.0.12.26457 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.54181 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.53717 CLOSED
tcp4 0 0 186.xxx.xx.2.22 186.xxx.xx.8.57035 CLOSED
tcp4 0 0 192.168.254.1.22 10.254.0.12.26832 CLOSED
More information about the freebsd-net
mailing list