Re: Server became inaccessible because it ran out of swap space
Date: Fri, 05 Jul 2024 08:27:32 UTC
Hi,

On 05/07/2024 07:56, Odhiambo Washington wrote:
> I have a server with 64GB RAM, 2CPUs each with 16 cores. I have also
> configured 13GB of swap space.
>
> ```
> root@gw:/usr/local/bhyve-vms/scripts # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/ada0p3       3163136   703316  2459820    22%
> /dev/md0.eli     10485760   709352  9776408     7%
> Total            13648896  1412668 12236228    10%
> root@gw:/usr/local/bhyve-vms/scripts #
> ```
>
> A number of times it has become inaccessible until I do a hard reboot and
> this has been caused by what I believe is running out of swap.
>
> Below is what I have obtained from /var/log/messages after I rebooted.
>
> How do I identify the culprit? Arrest the situation?
>
>
> ```
> Jul 5 06:50:56 gw kernel: failed
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:11 gw kernel: out of swap space
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:12 gw kernel: failed
> Jul 5 06:52:12 gw kernel: failed
> Jul 5 06:54:06 gw kernel: out of swap space
> Jul 5 06:54:06 gw kernel: failed
> Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap4: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: out of swap space
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap5: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100
> (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in
> queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
> Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed:
> failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap3: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: out of swap space
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:31 gw kernel: failed
> Jul 5 07:16:31 gw kernel: failed
> Jul 5 07:16:32 gw kernel: out of swap space
> Jul 5 07:16:33 gw kernel: out of swap space
> Jul 5 07:16:33 gw kernel: failed
> Jul 5 07:16:33 gw kernel: failed
> Jul 5 07:16:34 gw kernel: out of swap space
> Jul 5 07:16:34 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:38 gw kernel: failed
> ```
>
>

I'm not sure, but looking at the bhyve processes being killed, it reminds me
of an earlier issue that was solved with:

`vm.disable_swapspace_pageouts=1`

Cheers,
David
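
For the "identify the culprit" part of the question, a rough sketch that might help: the snippet below counts the "was killed" entries per process name in a messages-style log. The function name `oom_summary` is made up for illustration, and it assumes each kill is logged on one line in the `pid N (name), ... was killed` form shown in the quoted log. It only shows which processes the kernel chose to kill, which is not necessarily the process that consumed the memory. The trailing comments show the usual way to apply the sysctl mentioned above; whether it helps in this case is untested.

```shell
# Hypothetical helper: count kernel kills per process name.
# Usage: oom_summary /var/log/messages
oom_summary() {
    grep 'was killed' "$1" \
        | sed -n 's/.*pid [0-9]* (\([^)]*\)).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# To try the setting suggested above (as root, on the FreeBSD host):
#   sysctl vm.disable_swapspace_pageouts=1
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.disable_swapspace_pageouts=1
```

Run against the log excerpt quoted above, this should list only bhyve, since every kill line there names a bhyve process.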