CURRENT slow and shaky network stability
O. Hartmann
ohartman at zedat.fu-berlin.de
Tue Mar 29 06:08:49 UTC 2016
On Mon, 28 Mar 2016 14:52:09 -0700 (PDT)
Don Lewis <truckman at FreeBSD.org> wrote:
> On 28 Mar, O. Hartmann wrote:
> > On Sat, 26 Mar 2016 14:26:45 -0700 (PDT)
> > Don Lewis <truckman at FreeBSD.org> wrote:
> >
> >> On 26 Mar, Michael Butler wrote:
> >> > -current is not great for interactive use at all. The strategy of
> >> > pre-emptively dropping idle processes to swap is hurting .. big time.
> >> >
> >> > Compare inactive memory to swap in this example ..
> >> >
> >> > 110 processes: 1 running, 108 sleeping, 1 zombie
> >> > CPU: 1.2% user, 0.0% nice, 4.3% system, 0.0% interrupt, 94.5% idle
> >> > Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free
> >> > Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse
> >> >
> >> >   PID USERNAME THR PRI NICE  SIZE    RES STATE  C    TIME  WCPU COMMAND
> >> >  1819 imb        1  28    0  213M 11284K select 1  147:44 5.97% gkrellm
> >> > 59238 imb       43  20    0  980M   424M select 0   10:07 1.92% firefox
> >> >
> >> > .. it shouldn't start randomly swapping out processes because they're
> >> > used infrequently when there's more than enough RAM to spare ..
> >>
> >> I don't know what changed, and probably something can use some tweaking,
> >> but paging out idle processes isn't always the wrong thing to do. For
> >> instance if I'm using poudriere to build a bunch of packages and its
> >> heavy use of tmpfs is pushing the machine into many GB of swap usage, I
> >> don't want interactive use like:
> >> vi foo.c
> >> cc foo.c
> >> vi foo.c
> >> to suffer because vi and cc have to be read in from a busy hard drive
> >> each time while unused console getty and idle sshd processes in a bunch
> >> of jails are still hanging on to memory even though they haven't
> >> executed any instructions since shortly after the machine was booted
> >> weeks ago.
> >>
> >> > It also shows up when trying to reboot .. on all of my gear, 90 seconds
> >> > of "fail-safe" time-out is no longer enough when a good proportion of
> >> > daemons have been dropped onto swap and must be brought back in to flush
> >> > their data segments :-(
> >>
> >> That's a different and known problem. See:
> >> <https://svnweb.freebsd.org/base/releng/10.3/bin/csh/config_p.h?revision=297204&view=markup>
> >
> > CURRENT has become unusable and faulty. Updating the ports tree for
> > poudriere ends in this error/broken pipe on the remote console:
> >
> > [~] poudriere ports -u -p head
> > [00:00:00] ====>> Updating portstree "head"
> > [00:00:00] ====>> Updating the ports tree... done
> > root at gate [~] Fssh_packet_write_wait: Connection to 192.168.250.111 port 22: Broken pipe
> >
> >
> > Although the machine is not under load, several processes get idled/paged
> > out over time and never recover. The connection is then broken and the
> > whole thing is unusable :-(
>
> I'm definitely not seeing that here. This is getting close to the end
> of a big poudriere run:
>
> last pid: 82549; load averages: 20.05, 20.72, 23.51 up 5+12:34:14 12:51:55
> 144 processes: 20 running, 109 sleeping, 15 stopped
> CPU: 85.3% user, 0.0% nice, 14.7% system, 0.0% interrupt, 0.0% idle
> Mem: 1082M Active, 19G Inact, 9718M Wired, 249M Buf, 1095M Free
> ARC: 3841M Total, 2039M MFU, 642M MRU, 3395K Anon, 111M Header, 1044M Other
> Swap: 40G Total, 9691M Used, 31G Free, 23% Inuse, 196K In
>
> At the moment, openoffice-4, openoffice-devel, libreoffice, and chromium
> are all being built and are using tmpfs for "wrkdir data localbase", so
> there are many GB of data in tmpfs, which is the reason for the high
> inact and swap usage. I just hit the return key in an idle (for a
> couple of hours) terminal window containing an ssh login session to the
> same machine. I got a fresh command prompt essentially instantaneously.
> It couldn't have taken more than a couple hundred milliseconds to wake
> up and page in the idle sshd and shell processes on the build server.
>
> [a couple hours later, after poudriere is done and all tmpfs is gone]
>
> last pid: 66089; load averages: 0.13, 1.59, 4.61 up 5+14:14:33 14:32:14
> 71 processes: 1 running, 55 sleeping, 15 stopped
> CPU: 3.1% user, 0.0% nice, 0.0% system, 0.0% interrupt, 96.9% idle
> Mem: 58M Active, 85M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6249M Total, 2792M MFU, 2246M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 81M Used, 40G Free
>
> [after tracking down and exiting all of those stopped processes]
>
> last pid: 66103; load averages: 0.20, 0.99, 3.80 up 5+14:17:18 14:34:59
> 56 processes: 1 running, 55 sleeping
> CPU: 0.0% user, 0.0% nice, 0.1% system, 0.1% interrupt, 99.9% idle
> Mem: 57M Active, 88M Inact, 12G Wired, 249M Buf, 19G Free
> ARC: 6251M Total, 2793M MFU, 2247M MRU, 16K Anon, 133M Header, 1078M Other
> Swap: 40G Total, 63M Used, 40G Free
>
> The biggest chunk of the 63 MB of swap appears to be nginx. Its
> process size is 29 MB, but it has zero resident. It hasn't executed any
> code since it was first started when I booted the system several days
> ago. Other consumers appear to be getty and sshd and syslogd in various
> untouched jails.
>
>
> I've seen reports that r296137 and r297267 show the ssh problem, but
> this machine is in the middle with r297204 and I don't see it.
>
> As mentioned previously, I'm not running Xorg and a bunch of bloated
> X11 clients on this machine. Those make fat targets for having RAM
> taken from them, which would probably make my interactive experience
> less pleasant, but that should still not affect ssh.
>
> On my FreeBSD 10 machine, which has only 8 GB of RAM, my experience is
> that firefox gets pretty bloated after a while. It's currently at 2.6
> GB (with 2.8 GB of swap currently in use - I've got some other RAM hogs
> running as well) and I'm not seeing any problems, but when it gets up in
> the 4-5 GB range, things can start to get pretty laggy, but I don't see
> problems with ssh. The biggest problem with firefox seems to be
> JavaScript, which seems to leak memory like a sieve. Making heavy use
> of the NoScript plugin is the only way to keep Firefox usable.
>
> The only thing I can think of is that this is triggered by something in
> the machine configuration or the specific hardware. I'm running a
> GENERIC kernel and the only non-standard modification to /usr/src is the
> dummynet AQM patchset. The latter should have no effect since I'm not
> using ipfw on this machine.
>
> If I get a chance, I'll try booting my FreeBSD 11 machine with less RAM to
> see if that is a trigger.
Several of my boxes do not run X11 or "... a bunch of bloated X11 clients",
and they have 8 GB, 16 GB or 32 GB of RAM (only the last one runs X11). On all
remote systems with the most recent CURRENT (we are talking about r297237 -
r297369 right now) I definitely do not get a fresh prompt "immediately". It
takes up to 60 seconds (and more) to recover, even when the box is completely
idle. In a rapidly growing number of cases I now get broken pipes. This also
happens with sessions running "poudriere options" on larger installations,
and it is completely unacceptable.
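
For anyone who wants to compare the inactive-memory and swap figures in the
top(1) summaries quoted in this thread, here is a minimal sketch in Python. It
only parses the "Mem:" and "Swap:" summary lines as they appear above. The
helper names (to_mib, parse_summary) are made up for illustration and are not
part of top(1) or FreeBSD, and it assumes the K/M/G suffixes are powers of
1024.

```python
import re

# Assumption: top(1) size suffixes are powers of 1024; values are returned in MiB.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}
SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)([KMG])")


def to_mib(token):
    """Convert a size token such as '1609M' or '40G' to MiB (hypothetical helper)."""
    m = SIZE_RE.fullmatch(token)
    if m is None:
        raise ValueError(f"unrecognized size: {token!r}")
    return float(m.group(1)) * UNITS[m.group(2)]


def parse_summary(line):
    """Parse a top(1) 'Mem:' or 'Swap:' summary line into {label: MiB}."""
    _, _, rest = line.partition(":")
    fields = {}
    for part in rest.split(","):
        words = part.split()
        # Skip entries such as '22% Inuse' that are not sizes.
        if len(words) == 2 and SIZE_RE.fullmatch(words[0]):
            fields[words[1]] = to_mib(words[0])
    return fields


# The two summary lines from Michael Butler's example earlier in the thread.
mem = parse_summary("Mem: 474M Active, 1609M Inact, 764M Wired, 281M Buf, 119M Free")
swap = parse_summary("Swap: 4096M Total, 917M Used, 3178M Free, 22% Inuse")
print(f"Inact: {mem['Inact']:.0f} MiB, swap used: {swap['Used']:.0f} MiB")
```

Fed with those two lines, it reports 1609 MiB inactive against 917 MiB of swap
in use, which is the comparison Michael Butler was pointing at.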