CURRENT slow and shaky network stability
O. Hartmann
ohartman at zedat.fu-berlin.de
Sat Apr 2 21:19:36 UTC 2016
On Sat, 2 Apr 2016 11:39:10 +0200,
"O. Hartmann" <ohartman at zedat.fu-berlin.de> wrote:
> On Sat, 2 Apr 2016 10:55:03 +0200,
> "O. Hartmann" <ohartman at zedat.fu-berlin.de> wrote:
>
> > On Sat, 02 Apr 2016 01:07:55 -0700,
> > Cy Schubert <Cy.Schubert at komquats.com> wrote:
> >
> > > In message <56F6C6B0.6010103 at protected-networks.net>, Michael Butler writes:
> > > > -current is not great for interactive use at all. The strategy of
> > > > pre-emptively dropping idle processes to swap is hurting .. big time.
> > >
> > > FreeBSD doesn't "preemptively" or arbitrarily push pages out to disk. LRU
> > > doesn't do this.
> > >
> > > >
> > > > Compare inactive memory to swap in this example ..
> > > >
> > > > 110 processes: 1 running, 108 sleeping, 1 zombie
> > > > CPU: 1.2% user, 0.0% nice, 4.3% system, 0.0% interrupt, 94.5% idle
> > > > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> > > > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> > >
> > > To analyze this you need to capture vmstat output. You'll see the free pool
> > > dip below a threshold and pages go out to disk in response. If you have
> > > daemons with small working sets, pages that are not part of the working
> > > sets for daemons or applications will eventually be paged out. This is not
> > > a bad thing. In your example above, the 281 MB of UFS buffers are more
> > > active than the 917 MB paged out. If it's paged out and never used again,
> > > then it doesn't hurt. However the 281 MB of buffers saves you I/O. The
> > > inactive pages are part of your free pool that were active at one time but
> > > now are not. They may be reclaimed and if they are, you've just saved more
> > > I/O.
> > >
> > > Top is a poor tool to analyze memory use. Vmstat is the better tool to help
> > > understand memory use. Inactive memory isn't a bad thing per se. Monitor
> > > page outs, scan rate and page reclaims.
> > >
> > >
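Cy's advice above — watch page-outs, scan rate and page reclaims rather than top's memory summary — can be applied to the `vmstat 5` lines shown later in this thread. A minimal sketch (Python; the helper names are hypothetical, and the column layout is assumed to match the vmstat header quoted below) that extracts those counters from one data line:

```python
# Hypothetical helper: parse one FreeBSD "vmstat 5" data line and expose
# the paging-related counters Cy mentions: re (page reclaims), pi/po
# (pages in/out), fr (pages freed), sr (scan rate). The column order is
# assumed to match the header quoted in this thread:
#   r b w  avm  fre  flt re pi po  fr  sr ad0 ad1 in sy cs us sy id
# (the second "sy" column, CPU system time, is renamed "sys" here to
# keep the keys unique).
FIELDS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po",
          "fr", "sr", "ad0", "ad1", "in", "sy", "cs", "us", "sys", "id"]

def parse_vmstat_line(line):
    """Map one whitespace-separated vmstat data line to a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("unexpected vmstat line: %r" % line)
    return dict(zip(FIELDS, values))

def paging_pressure(sample):
    """Crude heuristic, not an official threshold: nonzero page-outs or
    a high scan rate suggest the free-page pool dipped below its limit
    and the pagedaemon is working."""
    return int(sample["po"]) > 0 or int(sample["sr"]) > 1000

# Example: the first data line from the vmstat output in this thread.
s = parse_vmstat_line(
    "22 0 22 5.8G 1.0G 46319 0 0 0 55721 1297 0 4 219 23907 5400 95 5 0")
print(s["po"], s["sr"], paging_pressure(s))  # -> 0 1297 True
```

The sr value of 1297 in that sample is what makes the heuristic fire: no pages are being written out (po is 0), but the page scanner is busy.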
> >
> > I give up! I tried to check via ssh/vmstat what is going on. These were the last
> > lines before the broken pipe:
> >
> > [...]
> > procs memory page disks faults cpu
> > r b w avm fre flt re pi po fr sr ad0 ad1 in sy cs us sy id
> > 22 0 22 5.8G 1.0G 46319 0 0 0 55721 1297 0 4 219 23907 5400 95 5 0
> > 22 0 22 5.4G 1.3G 51733 0 0 0 72436 1162 0 0 108 40869 3459 93 7 0
> > 15 0 22 12G 1.2G 54400 0 27 0 52188 1160 0 42 148 52192 4366 91 9 0
> > 14 0 22 12G 1.0G 44954 0 37 0 37550 1179 0 39 141 86209 4368 88 12 0
> > 26 0 22 12G 1.1G 60258 0 81 0 69459 1119 0 27 123 779569 704359 87 13 0
> > 29 3 22 13G 774M 50576 0 68 0 32204 1304 0 2 102 507337 484861 93 7 0
> > 27 0 22 13G 937M 47477 0 48 0 59458 1264 3 2 112 68131 44407 95 5 0
> > 36 0 22 13G 829M 83164 0 2 0 82575 1225 1 0 126 99366 38060 89 11 0
> > 35 0 22 6.2G 1.1G 98803 0 13 0 121375 1217 2 8 112 99371 4999 85 15 0
> > 34 0 22 13G 723M 54436 0 20 0 36952 1276 0 17 153 29142 4431 95 5 0
> > Fssh_packet_write_wait: Connection to 192.168.0.1 port 22: Broken pipe
> >
> >
> > This makes this crap system completely unusable. The server in question (FreeBSD
> > 11.0-CURRENT #20 r297503: Sat Apr 2 09:02:41 CEST 2016 amd64) was running a
> > poudriere bulk job. I cannot even determine which terminal goes down first -
> > another one, idle for much longer than the one showing the "vmstat 5" output, is
> > still alive!
> >
> > I consider this a serious bug, and nothing that has happened since this "fancy"
> > update is a benefit. :-(
>
> By the way - this might be of interest and some hint.
>
> One of my boxes acts as server and gateway. It uses NAT and IPFW; when it is under
> high load, as it was today, passing network traffic from the ISP to the clients is
> sometimes extremely slow. I do not consider this the reason for the collapsing ssh
> sessions, since the incident also happens under no load, but in the overall view of
> the problem it could be a hint - I hope.
I just checked on one box that "broke the pipe" very quickly after I started poudriere,
while it had done well for a couple of hours before the pipe broke. It seems to be
load-dependent when the ssh session gets wrecked. More importantly, after the long-haul
poudriere run I rebooted the box and tried again, and got the mentioned broken pipe a
couple of minutes after poudriere started. Then I left the box alone for several hours,
logged in again and checked the swap. Although there had been no load or other pressure
for hours, 31% of swap was still in use (the box has 16 GB of RAM and is powered by a
XEON E3-1245 V2).
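Not a fix for the underlying paging problem, but the dropped ssh sessions can often be kept alive through long stalls with client-side keepalives. A sketch for `~/.ssh/config` on the client, using standard OpenSSH options (the host address is taken from the error message earlier in this thread; the interval and count are illustrative, not recommendations):

```
Host 192.168.0.1
    # Send an application-level keepalive probe every 15 seconds and
    # give up only after 8 unanswered probes (~2 minutes of silence),
    # instead of aborting on the first stalled write.
    ServerAliveInterval 15
    ServerAliveCountMax 8
```

If the session still dies with these settings, the server side is genuinely unresponsive for minutes at a time, which would point back at the paging behaviour discussed above rather than at the network.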
More information about the freebsd-current mailing list