From: Odhiambo Washington
Date: Fri, 5 Jul 2024 11:47:39 +0300
Subject: Re: Server became inaccessible because it ran out of swap space
To: David Palma
Cc: questions
List-Id: User questions
List-Archive: https://lists.freebsd.org/archives/freebsd-questions


On Fri, Jul 5, 2024 at 11:27 AM David Palma <david.palma@takinobori.com> wrote:
> Hi,
>
> On 05/07/2024 07:56, Odhiambo Washington wrote:
> > I have a server with 64GB RAM and 2 CPUs, each with 16 cores. I have also
> > configured 13GB of swap space.
> >
> > ```
> > root@gw:/usr/local/bhyve-vms/scripts # swapinfo
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/ada0p3       3163136   703316  2459820    22%
> > /dev/md0.eli     10485760   709352  9776408     7%
> > Total            13648896  1412668 12236228    10%
> > root@gw:/usr/local/bhyve-vms/scripts #
> > ```
> >
> > Several times it has become inaccessible until I did a hard reboot, and
> > I believe the cause is running out of swap.
> >
> > Below is what I obtained from /var/log/messages after I rebooted.
> >
> > How do I identify the culprit and arrest the situation?
> >
> > ```
> > Jul  5 06:50:56 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: out of swap space
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:11 gw kernel: failed
> > Jul  5 06:52:12 gw kernel: failed
> > Jul  5 06:52:12 gw kernel: failed
> > Jul  5 06:54:06 gw kernel: out of swap space
> > Jul  5 06:54:06 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap4: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: out of swap space
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap5: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100 (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
> > Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> > Jul  5 07:16:30 gw kernel: tap3: link state changed to DOWN
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:30 gw kernel: out of swap space
> > Jul  5 07:16:30 gw kernel: failed
> > Jul  5 07:16:31 gw kernel: failed
> > Jul  5 07:16:31 gw kernel: failed
> > Jul  5 07:16:32 gw kernel: out of swap space
> > Jul  5 07:16:33 gw kernel: out of swap space
> > Jul  5 07:16:33 gw kernel: failed
> > Jul  5 07:16:33 gw kernel: failed
> > Jul  5 07:16:34 gw kernel: out of swap space
> > Jul  5 07:16:34 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:36 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:37 gw kernel: failed
> > Jul  5 07:16:38 gw kernel: failed
> > ```
>
> I'm not sure, but looking at the bhyve processes being killed, it reminds
> me of an earlier issue that was solved with:
>
> `vm.disable_swapspace_pageouts=1`
>
> Cheers,
> David

Hello David,

Thank you for this.
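
For anyone following the thread, a minimal sketch of how the suggested tunable might be made persistent. It assumes the sysctl behaves on this release as it did in the earlier issue David mentions, and that /etc/sysctl.conf is where local tunables are kept on this host:

```
# /etc/sysctl.conf
# Tunable suggested on the list; whether it cures this particular hang is untested.
vm.disable_swapspace_pageouts=1
```

To try it without rebooting, the same assignment can be passed to sysctl as root: `sysctl vm.disable_swapspace_pageouts=1`.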

Let me enable this and monitor.
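
On the monitoring side, and for the question of identifying the culprit, one rough approach is to count which processes the kernel killed. This is only a sketch: it assumes the "pid N (name), ... was killed:" message format seen in the log excerpt, and the function name `summarise_kills` is made up for illustration.

```sh
#!/bin/sh
# Summarise kernel kill messages from syslog-style text on stdin.
# Prints "count pid command", most frequently reported first.
summarise_kills() {
    awk '
    /was killed:/ {
        # Find the "pid" token; the PID and "(command)," follow it.
        for (i = 1; i < NF; i++) {
            if ($i == "pid") {
                cmd = $(i + 2)
                gsub(/[(),]/, "", cmd)      # "(bhyve)," becomes "bhyve"
                count[$(i + 1) " " cmd]++
                break
            }
        }
    }
    END {
        for (key in count) print count[key], key
    }' | sort -rn
}
```

It reads from standard input, for example `summarise_kills < /var/log/messages`. This only lists the victims; the kernel typically picks large processes to kill, so watching `top -o res` while the machine is still responsive is probably a better way to see what is actually growing.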


--
Best regards,
Odhiambo WASHINGTON,
Nairobi,KE
+254 7 3200 0004/+254 7 2274 3223
In an Internet failure case, the #1 suspect is a constant: DNS.
"Oh, the cruft.", egrep -v '^$|^.*#' ¯\_(ツ)_/¯ :-)
[How to ask smart questions: http://www.catb.org/~esr/faqs/smart-questions.html]