From: Chris Ross
To: Mark Johnston
Cc: freebsd-fs
Subject: Re: swap_pager: cannot allocate bio
Date: Sun, 14 Nov 2021 22:26:13 -0500
Message-Id: <6DA63618-F0E9-48EC-AB57-3C3C102BC0C0@distal.com>
In-Reply-To: <19A3AAF6-149B-4A3C-8C27-4CFF22382014@distal.com>

> On Nov 12, 2021, at 23:15, Chris Ross wrote:
>
> I’ve built a stable/13 as of today, and updated the system. I’ll see
> if the problem recurs, it usually takes about 24 hours to show. If
> it does, I’ll see if I can run a procstat -kka and get it off of the
> system.

Happy Sunday, all. So, I logged in this evening, 48 hours after starting
the job that uses a lot of CPU and I/O to the ZFS pool. The system seemed
to be working, and I thought stable/13 had just fixed it. But after only
a few minutes of fooling around it started to show problems. My ssh
connection hung, and new ones couldn’t be made. Then they could, but the
shell got stuck in disk wait once, while others worked. Very odd.

I logged into the console and ran a procstat -kka. Then I tried to ls -f
a directory in the large ZFS fs (/tank), which hung. Ctrl-T on that shows:

load: 0.04  cmd: ls 87050 [aw.aew_cv] 41.13r 0.00u 0.00s 0% 2632k
mi_switch+0xc1 _cv_wait+0xf2 arc_wait_for_eviction+0x1df
arc_get_data_impl+0x85 arc_hdr_alloc_abd+0x7b arc_read+0x6f7
dbuf_read+0xc5b dmu_buf_hold+0x46 zap_cursor_retrieve+0x163
zfs_freebsd_readdir+0x393 VOP_READDIR_APV+0x1f kern_getdirentries+0x1d9
sys_getdirentries+0x29 amd64_syscall+0x10c fast_syscall_common+0xf8

The procstat -kka output is available (208kb of text, 1441 lines) at
https://pastebin.com/SvDcvRvb

A top command run over ssh completed and shows:

last pid: 91551;  load averages:  0.00,  0.02,  0.30   up 2+00:19:33  22:23:15
40 processes:  1 running, 38 sleeping, 1 zombie
CPU:  3.9% user,  0.0% nice,  0.9% system,  0.0% interrupt, 95.2% idle
Mem: 58G Active, 210M Inact, 1989M Laundry, 52G Wired, 1427M Buf, 12G Free
ARC: 48G Total, 10G MFU, 38G MRU, 128K Anon, 106M Header, 23M Other
     46G Compressed, 46G Uncompressed, 1.00:1 Ratio
Swap: 425G Total, 3487M Used, 422G Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
90996 root        1  22    0    21M  9368K select  22   0:00   0.10% sshd
89398 cross      23  52    0    97G    60G uwait    4  94.1H   0.00% python3.
55463 cross      18  20    0   301M    54M kqread  31   4:30   0.00% python3.
54338 cross       4  20    0    82M  9632K kqread  33   1:02   0.00% python3.
84083 ntpd        1  20    0    21M  1712K select  33   0:07   0.00% ntpd

I’d love to hear any thoughts. Again, this is running 13-STABLE,
stable/13-n248044-4a36455c417.

Thanks all.

              - Chris