From: David Palma
To: Odhiambo Washington, questions
Subject: Re: Server became inaccessible because it ran out of swap space
Date: Fri, 5 Jul 2024 08:27:32 +0000
Message-ID: <8d2a864b-a2ad-48b7-9c52-32b2af3ceb79@takinobori.com>
List-Id: User questions
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

Hi,

On 05/07/2024 07:56, Odhiambo Washington wrote:
> I have a server with 64GB RAM, 2CPUs each with 16 cores. I have also
> configured 13GB or swap space.
>
> ```
> root@gw:/usr/local/bhyve-vms/scripts # swapinfo
> Device          1K-blocks     Used    Avail Capacity
> /dev/ada0p3       3163136   703316  2459820    22%
> /dev/md0.eli     10485760   709352  9776408     7%
> Total            13648896  1412668 12236228    10%
> root@gw:/usr/local/bhyve-vms/scripts #
> ```
>
> A number of times it has become inaccessible until I do a hard reboot and
> this has been caused by what I believe is running out of swap.
>
> Below is what I have obtained from /var/log/messages after I rebooted.
>
> How do I identify the culprit? Arrest the situation?
>
> ```
> Jul 5 06:50:56 gw kernel: failed
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:11 gw kernel: out of swap space
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:11 gw kernel: failed
> Jul 5 06:52:12 gw kernel: failed
> Jul 5 06:52:12 gw kernel: failed
> Jul 5 06:54:06 gw kernel: out of swap space
> Jul 5 06:54:06 gw kernel: failed
> Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap4: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: out of swap space
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap5: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: sonewconn: pcb 0xfffff8002866d100 (local:/var/run/wsgi.38620.0.1.sock): Listen queue overflow: 151 already in queue awaiting acceptance (1 occurrences), euid 0, rgid 0, jail 0
> Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
> Jul 5 07:16:30 gw kernel: tap3: link state changed to DOWN
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:30 gw kernel: out of swap space
> Jul 5 07:16:30 gw kernel: failed
> Jul 5 07:16:31 gw kernel: failed
> Jul 5 07:16:31 gw kernel: failed
> Jul 5 07:16:32 gw kernel: out of swap space
> Jul 5 07:16:33 gw kernel: out of swap space
> Jul 5 07:16:33 gw kernel: failed
> Jul 5 07:16:33 gw kernel: failed
> Jul 5 07:16:34 gw kernel: out of swap space
> Jul 5 07:16:34 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:36 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:37 gw kernel: failed
> Jul 5 07:16:38 gw kernel: failed
> ```
>

I'm not sure, but the bhyve processes being killed remind me of an earlier issue that was solved with:

`vm.disable_swapspace_pageouts=1`

Cheers,
David
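
P.S. On the "how do I identify the culprit" part: a minimal sketch, assuming the kernel message format quoted above, that tallies which processes were killed and how often swap exhaustion was reported. The function names and the embedded sample lines are only illustrative. Note that it lists the victims of the kills, which are not necessarily the processes that consumed the memory.

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted above, e.g.
#   ... kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory
KILL_RE = re.compile(r"kernel: pid (\d+) \(([^)]+)\), .*was killed: (.+)$")


def summarize_kills(lines):
    """Return (kills per command, set of (pid, command, reason)) for the given log lines."""
    per_command = Counter()
    seen = set()
    for line in lines:
        match = KILL_RE.search(line.strip())
        if match is None:
            continue
        entry = (int(match.group(1)), match.group(2), match.group(3))
        # Each kill message appears twice in the quoted log, so count a pid once.
        if entry in seen:
            continue
        seen.add(entry)
        per_command[entry[1]] += 1
    return per_command, seen


def count_swap_exhaustion(lines):
    """Count how many lines report 'out of swap space'."""
    return sum(1 for line in lines if "out of swap space" in line)


# A few lines copied from the quoted log, used here as sample input.
SAMPLE = [
    "Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
    "Jul 5 07:16:30 gw kernel: pid 4076 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
    "Jul 5 07:16:30 gw kernel: tap4: link state changed to DOWN",
    "Jul 5 07:16:30 gw kernel: out of swap space",
    "Jul 5 07:16:30 gw kernel: pid 20849 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
    "Jul 5 07:16:30 gw kernel: pid 3591 (bhyve), jid 0, uid 0, was killed: failed to reclaim memory",
]

kills_per_command, killed = summarize_kills(SAMPLE)
swap_reports = count_swap_exhaustion(SAMPLE)

for command, count in kills_per_command.most_common():
    print(f"{command}: {count} process(es) killed")
print(f"'out of swap space' reported {swap_reports} time(s)")
```

To run it against the real log, the lines of /var/log/messages can be passed to both functions in place of `SAMPLE`.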