From: Ken Merry <ken@freebsd.org>
To: Konstantin Belousov <kostikbel@gmail.com>
Cc: hackers@freebsd.org
Subject: Re: Getting v_wire_count from a kernel core dump?
Date: Tue, 21 Mar 2023 13:54:12 -0400
List-Id: Technical discussions relating to FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-hackers

> On Mar 21, 2023, at 11:37, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Tue, Mar 21, 2023 at 11:11:30AM -0400, Ken Merry wrote:
>> I have kernel core dumps from several machines out in the field (customer
>> sites) that hit out-of-memory panics, and I'm trying to figure out from
>> the dumps whether we're dealing with a potential page leak.
>>
>> For context, these machines are running stable/13 from April 2021, but
>> they do have the fix for this bug:
>>
>> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256507
>>
>> which corresponds to this commit in stable/13:
>>
>> https://cgit.freebsd.org/src/commit/?id=6094749a1a5dafb8daf98deab23fc968070bc695
>>
>> On a running system I can get a rough idea of whether there is a page
>> leak by looking at the VM system page counters:
>>
>> # sysctl vm.stats | grep count
>> vm.stats.vm.v_cache_count: 0
>> vm.stats.vm.v_user_wire_count: 0
>> vm.stats.vm.v_laundry_count: 991626
>> vm.stats.vm.v_inactive_count: 39733216
>> vm.stats.vm.v_active_count: 11821309
>> vm.stats.vm.v_wire_count: 11154113
>> vm.stats.vm.v_free_count: 1599981
>> vm.stats.vm.v_page_count: 65347213
>>
>> The laundry, inactive, active, wire, and free counts add up to 65300245
>> in this case, 46968 short of v_page_count.
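
For reference, that live-system check amounts to something like the sketch
below. It is only a rough cross-check using the vm.stats names shown above;
whether wiring should even be counted as a separate state is exactly the
question that follows.

#!/usr/bin/env python3
# Rough cross-check on a live system: do the page-state counters account
# for v_page_count?  Uses the vm.stats sysctls listed above.
import subprocess

def sysctl(name):
    return int(subprocess.check_output(["sysctl", "-n", name]))

states = ("laundry", "inactive", "active", "wire", "free")
total = sum(sysctl("vm.stats.vm.v_%s_count" % s) for s in states)
pages = sysctl("vm.stats.vm.v_page_count")
print("sum of states: %d  v_page_count: %d  unaccounted: %d"
      % (total, pages, pages - total))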
>>
>> Am I off base here as far as the various counts adding up to the page
>> count?  (e.g. is the wire count just an additional attribute of a page,
>> and not another separate state like active, inactive, or laundry?)
>>
>> Looking at the kernel core dump from one of the systems, I see:
>>
>> (kgdb) print vm_cnt
>> $1 = {v_swtch = 0xfffffe022158f2f8, v_trap = 0xfffffe022158f2f0,
>>   v_syscall = 0xfffffe022158f2e8, v_intr = 0xfffffe022158f2e0,
>>   v_soft = 0xfffffe022158f2d8, v_vm_faults = 0xfffffe022158f2d0,
>>   v_io_faults = 0xfffffe022158f2c8, v_cow_faults = 0xfffffe022158f2c0,
>>   v_cow_optim = 0xfffffe022158f2b8, v_zfod = 0xfffffe022158f2b0,
>>   v_ozfod = 0xfffffe022158f2a8, v_swapin = 0xfffffe022158f2a0,
>>   v_swapout = 0xfffffe022158f298, v_swappgsin = 0xfffffe022158f290,
>>   v_swappgsout = 0xfffffe022158f288, v_vnodein = 0xfffffe022158f280,
>>   v_vnodeout = 0xfffffe022158f278, v_vnodepgsin = 0xfffffe022158f270,
>>   v_vnodepgsout = 0xfffffe022158f268, v_intrans = 0xfffffe022158f260,
>>   v_reactivated = 0xfffffe022158f258, v_pdwakeups = 0xfffffe022158f250,
>>   v_pdpages = 0xfffffe022158f248, v_pdshortfalls = 0xfffffe022158f240,
>>   v_dfree = 0xfffffe022158f238, v_pfree = 0xfffffe022158f230,
>>   v_tfree = 0xfffffe022158f228, v_forks = 0xfffffe022158f220,
>>   v_vforks = 0xfffffe022158f218, v_rforks = 0xfffffe022158f210,
>>   v_kthreads = 0xfffffe022158f208, v_forkpages = 0xfffffe022158f200,
>>   v_vforkpages = 0xfffffe022158f1f8, v_rforkpages = 0xfffffe022158f1f0,
>>   v_kthreadpages = 0xfffffe022158f1e8, v_wire_count = 0xfffffe022158f1e0,
>>   v_page_size = 4096, v_page_count = 65342843, v_free_reserved = 85343,
>>   v_free_target = 1392195, v_free_min = 412056, v_inactive_target = 2088292,
>>   v_pageout_free_min = 136, v_interrupt_free_min = 8, v_free_severe = 248698}
>> (kgdb) print vm_ndomains
>> $2 = 4
>> (kgdb) print vm_dom[0].vmd_pagequeues[0].pq_cnt
>> $3 = 6298704
>> (kgdb) print vm_dom[0].vmd_pagequeues[1].pq_cnt
>> $4 = 3423939
>> (kgdb) print vm_dom[0].vmd_pagequeues[2].pq_cnt
>> $5 = 629834
>> (kgdb) print vm_dom[0].vmd_pagequeues[3].pq_cnt
>> $6 = 0
>> (kgdb) print vm_dom[1].vmd_pagequeues[0].pq_cnt
>> $7 = 2301793
>> (kgdb) print vm_dom[1].vmd_pagequeues[1].pq_cnt
>> $8 = 7130193
>> (kgdb) print vm_dom[1].vmd_pagequeues[2].pq_cnt
>> $9 = 701495
>> (kgdb) print vm_dom[1].vmd_pagequeues[3].pq_cnt
>> $10 = 0
>> (kgdb) print vm_dom[2].vmd_pagequeues[0].pq_cnt
>> $11 = 464429
>> (kgdb) print vm_dom[2].vmd_pagequeues[1].pq_cnt
>> $12 = 9123532
>> (kgdb) print vm_dom[2].vmd_pagequeues[2].pq_cnt
>> $13 = 1037423
>> (kgdb) print vm_dom[2].vmd_pagequeues[3].pq_cnt
>> $14 = 0
>> (kgdb) print vm_dom[3].vmd_pagequeues[0].pq_cnt
>> $15 = 5444946
>> (kgdb) print vm_dom[3].vmd_pagequeues[1].pq_cnt
>> $16 = 4466782
>> (kgdb) print vm_dom[3].vmd_pagequeues[2].pq_cnt
>> $17 = 785195
>> (kgdb) print vm_dom[3].vmd_pagequeues[3].pq_cnt
>> $18 = 0
>> (kgdb)
>>
>> Adding up the page queue counts, per domain and in total:
>>
>> domain 0: 6298704 + 3423939 +  629834 = 10352477
>> domain 1: 2301793 + 7130193 +  701495 = 10133481
>> domain 2:  464429 + 9123532 + 1037423 = 10625384
>> domain 3: 5444946 + 4466782 +  785195 = 10696923
>> total:                                  41808265
>>
>> So the queue totals are about 23.5M pages short of v_page_count.
>>
>> v_wire_count is a per-CPU counter, and on a running system the per-CPU
>> values get added up when it is read.
>> But trying to access it in the kernel core dump yields:
>>
>> (kgdb) print vm_cnt.v_wire_count
>> $2 = (counter_u64_t) 0xfffffe022158f1e0
>> (kgdb) print *$2
>> Cannot access memory at address 0xfffffe022158f1e0
>>
>> Does anyone have ideas on how I can figure out, from the core dump,
>> whether there is a page leak?
>>
>
> Did you look at UMA/malloc stats?  It could be genuine VM leaking pages,
> but more often it is some kernel subsystem leaking its own allocations.
> For the latter, try both vmstat -z and vmstat -m on the kernel.debug + vmcore.
> Often the leakage is immediately obvious.

So, vmstat -m doesn't work on a kernel core dump, at least on this vintage
of stable/13:

# vmstat -m -N /usr/lib/debug/boot/kernel/kernel.debug -M 2023-03-15_21-23-27.vmcore.0
vmstat: memstat_kvm_malloc:
         Type InUse MemUse Requests  Size(s)

As for vmstat -z, we've got a script that adds it all up, and it comes to
roughly 80GB on a system with 256GB of RAM.

The largest processes in the system add up to approximately 24GB of RAM,
although that is difficult to measure precisely because we're running
Postgres, and the Postgres processes share varying amounts of memory.

Either way, it doesn't look like we're anywhere near 256GB.  If I can rule
out a page leak, then the problem that motivated this change could be what
we're hitting:

https://cgit.freebsd.org/src/commit/sys/vm?h=stable/13&id=555baef969a17a7cbcd6af3ee5bcf854ecd4de7c

The ARC in our case is at about 65GB.

Thanks,

Ken

--
Ken Merry
ken@FreeBSD.ORG
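
P.S. For anyone else poking at a vmcore with a Python-capable kgdb, the
bookkeeping above can be scripted rather than typed by hand.  The sketch
below is untested against this exact dump and makes assumptions worth
checking: it takes the simple zpcpu layout where CPU n's copy of a
counter(9) counter sits n * PAGE_SIZE bytes past the counter's pointer
value, and it assumes dense CPU IDs from 0 to mp_ncpus - 1.  Some
platforms/branches store counters as offsets that have to be rebased first
(see zpcpu_get_cpu() and counter_u64_fetch() in the tree you're running),
and on this particular dump the per-CPU slots are exactly the memory that
isn't there, so mostly it just reports how many slots are missing.

# vmsum.py -- run inside kgdb with "source vmsum.py" (requires a gdb/kgdb
# built with Python support).
import gdb

PAGE_SIZE = 4096   # matches v_page_size above
PQ_COUNT = 4       # PQ_INACTIVE, PQ_ACTIVE, PQ_LAUNDRY, PQ_UNSWAPPABLE

# Total the per-domain paging queues instead of printing 16 expressions.
ndom = int(gdb.parse_and_eval("vm_ndomains"))
queued = 0
for dom in range(ndom):
    for pq in range(PQ_COUNT):
        queued += int(gdb.parse_and_eval(
            "vm_dom[%d].vmd_pagequeues[%d].pq_cnt" % (dom, pq)))
print("pages on paging queues: %d" % queued)

# Reassemble v_wire_count the way counter_u64_fetch() does: one uint64_t
# per CPU.  ASSUMPTION: CPU n's slot is n * PAGE_SIZE bytes past the
# counter's pointer value; adjust if your zpcpu_get_cpu() differs.
ncpus = int(gdb.parse_and_eval("mp_ncpus"))
wired = missing = 0
for cpu in range(ncpus):
    expr = ("*(uint64_t *)((char *)vm_cnt.v_wire_count + %d)"
            % (cpu * PAGE_SIZE))
    try:
        wired += int(gdb.parse_and_eval(expr))
    except gdb.MemoryError:
        missing += 1   # per-CPU page not captured in this minidump
print("v_wire_count: %d (%d of %d CPU slots unreadable)"
      % (wired, missing, ncpus))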

On = Mar 21, 2023, at 11:37, Konstantin Belousov <kostikbel@gmail.com> = wrote:

On Tue, Mar 21, 2023 at = 11:11:30AM -0400, Ken Merry wrote:
I have kernel = core dumps from several machines out in the field (customer sites) that = were out of memory panics, and I=E2=80=99m trying to figure out, from = the kernel core dumps, whether we=E2=80=99re dealing with a potential = page leak.

For context, these machines are running stable/13 from = April 2021, but they do have the fix for this = bug:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D256507
=
Which is this commit in = stable/13:

https://cgit.freebsd.org/src/commit/?id=3D6094749a1a5daf= b8daf98deab23fc968070bc695

On a running system, I can get a rough = idea whether there is a page leak by looking at the VM system page = counters:

# sysctl vm.stats |grep = count
vm.stats.vm.v_cache_count: 0
vm.stats.vm.v_user_wire_count: = 0
vm.stats.vm.v_laundry_count: = 991626
vm.stats.vm.v_inactive_count: = 39733216
vm.stats.vm.v_active_count: = 11821309
vm.stats.vm.v_wire_count: = 11154113
vm.stats.vm.v_free_count: = 1599981
vm.stats.vm.v_page_count: 65347213

So the first 5 = numbers add up to 65300245 in this case, for a difference of 46968. =  

Am I off base here as far as the various counts adding up = to the page count?  (e.g. is the wire count just an additional = attribute of a page and not another separate state like active, inactive = or laundry?)

Looking at the kernel core dump for one of the = systems I see:

kgdb) print vm_cnt
$1 =3D {v_swtch =3D = 0xfffffe022158f2f8, v_trap =3D 0xfffffe022158f2f0,
 v_syscall =3D = 0xfffffe022158f2e8, v_intr =3D 0xfffffe022158f2e0,
 v_soft =3D = 0xfffffe022158f2d8, v_vm_faults =3D = 0xfffffe022158f2d0,
 v_io_faults =3D 0xfffffe022158f2c8, = v_cow_faults =3D 0xfffffe022158f2c0,
 v_cow_optim =3D = 0xfffffe022158f2b8, v_zfod =3D 0xfffffe022158f2b0,
 v_ozfod =3D = 0xfffffe022158f2a8, v_swapin =3D 0xfffffe022158f2a0,
 v_swapout = =3D 0xfffffe022158f298, v_swappgsin =3D = 0xfffffe022158f290,
 v_swappgsout =3D 0xfffffe022158f288, = v_vnodein =3D 0xfffffe022158f280,
 v_vnodeout =3D = 0xfffffe022158f278, v_vnodepgsin =3D = 0xfffffe022158f270,
 v_vnodepgsout =3D 0xfffffe022158f268, = v_intrans =3D 0xfffffe022158f260,
 v_reactivated =3D = 0xfffffe022158f258, v_pdwakeups =3D = 0xfffffe022158f250,
 v_pdpages =3D 0xfffffe022158f248, = v_pdshortfalls =3D 0xfffffe022158f240,
 v_dfree =3D = 0xfffffe022158f238, v_pfree =3D 0xfffffe022158f230,
 v_tfree =3D = 0xfffffe022158f228, v_forks =3D 0xfffffe022158f220,
 v_vforks =3D = 0xfffffe022158f218, v_rforks =3D 0xfffffe022158f210,
 v_kthreads = =3D 0xfffffe022158f208, v_forkpages =3D = 0xfffffe022158f200,
 v_vforkpages =3D 0xfffffe022158f1f8, = v_rforkpages =3D 0xfffffe022158f1f0,
 v_kthreadpages =3D = 0xfffffe022158f1e8, v_wire_count =3D = 0xfffffe022158f1e0,
 v_page_size =3D 4096, v_page_count =3D = 65342843, v_free_reserved =3D 85343,
 v_free_target =3D 1392195, = v_free_min =3D 412056, v_inactive_target =3D = 2088292,
 v_pageout_free_min =3D 136, v_interrupt_free_min =3D = 8, v_free_severe =3D 248698}
(kgdb) print vm_ndomains
$2 =3D = 4
(kgdb) print vm_dom[0].vmd_pagequeues[0].pq_cnt
$3 =3D = 6298704
(kgdb) print vm_dom[0].vmd_pagequeues[1].pq_cnt
$4 =3D = 3423939
(kgdb) print vm_dom[0].vmd_pagequeues[2].pq_cnt
$5 =3D = 629834
(kgdb) print vm_dom[0].vmd_pagequeues[3].pq_cnt
$6 =3D = 0
(kgdb) print vm_dom[1].vmd_pagequeues[0].pq_cnt
$7 =3D = 2301793
(kgdb) print vm_dom[1].vmd_pagequeues[1].pq_cnt
$8 =3D = 7130193
(kgdb) print vm_dom[1].vmd_pagequeues[2].pq_cnt
$9 =3D = 701495
(kgdb) print vm_dom[1].vmd_pagequeues[3].pq_cnt
$10 =3D = 0
(kgdb) print vm_dom[2].vmd_pagequeues[0].pq_cnt
$11 =3D = 464429
(kgdb) print vm_dom[2].vmd_pagequeues[1].pq_cnt
$12 =3D = 9123532
(kgdb) print vm_dom[2].vmd_pagequeues[2].pq_cnt
$13 =3D = 1037423
(kgdb) print vm_dom[2].vmd_pagequeues[3].pq_cnt
$14 =3D = 0
(kgdb) print vm_dom[3].vmd_pagequeues[0].pq_cnt
$15 =3D = 5444946
(kgdb) print vm_dom[3].vmd_pagequeues[1].pq_cnt
$16 =3D = 4466782
(kgdb) print vm_dom[3].vmd_pagequeues[2].pq_cnt
$17 =3D = 785195
(kgdb) print vm_dom[3].vmd_pagequeues[3].pq_cnt
$18 =3D = 0
(kgdb) 


Adding up the = page queue = counts:

6298704
3423939
629834
++p
10352477
2301793<= br>7130193
701495
++p
10133481
+p
20485958
464429
912= 3532
1037423
++p
10625384
+p
31111342
5444946
4466782=
785195
++p
10696923
+p
41808265

So, about 23M = pages short of v_page_count.  

v_wire_count is a per-CPU = counter, and on a running system it gets added up.  But trying to = access it in the kernel core dump yields:

(kgdb) print = vm_cnt.v_wire_count
$2 =3D (counter_u64_t) = 0xfffffe022158f1e0
(kgdb) print *$2
Cannot access memory at = address 0xfffffe022158f1e0

Anyone have any ideas whether I can = figure out whether there is a page leak from the core = dump?


Did you = looked at UMA/malloc stats?  It could be genuine VM leaking = pages,
but more often it is = some kernel subsystem leaking its own allocations.
For later, try both = vmstat -z and vmstat -m on the kernel.debug + vmcore.
Often the leakage is = immediately obvious.

So, vmstat = -m doesn=E2=80=99t work on a kernel core dump, at least on this vintage = of stable/13:

# vmstat -m -N = /usr/lib/debug/boot/kernel/kernel.debug -M = 2023-03-15_21-23-27.vmcore.0
vmstat: = memstat_kvm_malloc: 
        =  Type InUse MemUse Requests =  Size(s)

As for vmstat -z, we=E2=80=99ve = got a script that adds it all up, and it=E2=80=99s ~80GB on a system = with 256GB RAM.

The largest processes in the = system add up to approximately 24GB of RAM used, although that is very = difficult to measure precisely because we=E2=80=99re using Postgres, and = the processes that Postgres uses share varying amounts of RAM. =  

But it doesn=E2=80=99t seem like we=E2=80=99= re approaching 256GB.  If I can rule out a page leak, then the = reason behind this change could be the = issue:


The ARC in our case is at about = 65GB.

Thanks,

Ken
<= /div>
=E2=80=94 
Ken = Merry
ken@FreeBSD.ORG

= --Apple-Mail=_2710E057-8BAD-4F01-9DD6-48E4ECCB364F--