`sysctl vm.pmap.kernel_maps' spins on 12.2-RELEASE-p3 w/ nvdimm.ko

Pokala, Ravi rpokala at panasas.com
Thu Feb 25 02:06:09 UTC 2021



-----Original Message-----
From: Alan Somers <asomers at freebsd.org>
Date: 2021-02-24, Wednesday at 16:27
To: Konstantin Belousov <kostikbel at gmail.com>
Cc: Ravi Pokala <rpokala at freebsd.org>, "freebsd-hackers at freebsd.org" <freebsd-hackers at freebsd.org>
Subject: Re: `sysctl vm.pmap.kernel_maps' spins on 12.2-RELEASE-p3 w/ nvdimm.ko

    On Wed, Feb 24, 2021 at 5:26 PM Konstantin Belousov <kostikbel at gmail.com> wrote:


    On Wed, Feb 24, 2021 at 04:55:46PM -0700, Alan Somers wrote:
    > On Wed, Feb 24, 2021 at 4:49 PM Konstantin Belousov <kostikbel at gmail.com>
    > wrote:
    > 
    > > On Wed, Feb 24, 2021 at 03:37:12PM -0800, Ravi Pokala wrote:
    > > > Hi folks,
    > > >
    > > > A colleague and I both independently observed `sysctl -a' appear to hang
    > > on nodes running FreeBSD 12.2-RELEASE-p3; it didn't emit any output, and ^C
    > > didn't kill it. We could still establish a new terminal session to the
    > > node, via SSH or serial console, and we were able to see that it was
    > > actually spinning, not hung, and was consuming an entire CPU.
    > > >
    > > > We eventually determined that it was specifically `sysctl
    > > vm.pmap.kernel_maps' which was spinning, and subsequently that it only
    > > spun if nvdimm.ko was loaded. It was not necessary to access the device
    > > node associated with the NVDIMM; merely having the module loaded was
    > > sufficient.
    > > >
    > > > I know nvdimm(4) isn't terribly widely used, but hopefully someone who
    > > uses it can at least confirm my findings on this. Help in debugging would
    > > be even more appreciated.
    > > >
    > >
    > > How large are your nvdimms?  Their SPAs are mapped fully into KVA, and this
    > > could be quite large.  It could be busy dumping page tables.

On these nodes, 16GB.

    > > Try to skip the large map in pmap.c:sysctl_kmaps() (just increment i over it).

Thanks! This worked for me:

|  		case LMSPML4I:
| -			sbuf_printf(sb, "\nLarge map:\n");
| -			break;
| +			sbuf_printf(sb, "\nLarge map: SKIPPING\n");
| +			continue;
|  		}
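
As a rough illustration of why that two-line change is enough, here is a toy model of the loop shape, not the kernel code. NSLOTS, LARGE_MAP_SLOT, walk_slot() and dump_slots() are made-up names, and the "cost" numbers are arbitrary; the only point is that `continue' inside the switch jumps to the next iteration of the enclosing for loop, so the expensive walk for that slot never runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real values live in the amd64 pmap code. */
#define NSLOTS          8   /* top-level page-table slots in this toy model */
#define LARGE_MAP_SLOT  5   /* slot assumed to hold the large map */

/* Pretend cost of walking one slot; the large-map slot is made expensive. */
static unsigned long
walk_slot(int slot)
{
	return (slot == LARGE_MAP_SLOT ? 1000000UL : 10UL);
}

/*
 * Walk every slot and return the total work done.  When skip_large is
 * true, the large-map slot prints a header and is passed over with
 * `continue`, mirroring the shape of the patch above.
 */
static unsigned long
dump_slots(bool skip_large, FILE *out)
{
	unsigned long work = 0;

	assert(LARGE_MAP_SLOT < NSLOTS);
	for (int i = 0; i < NSLOTS; i++) {
		switch (i) {
		case LARGE_MAP_SLOT:
			if (skip_large) {
				fprintf(out, "\nLarge map: SKIPPING\n");
				continue;	/* next loop iteration; i++ still runs */
			}
			fprintf(out, "\nLarge map:\n");
			break;	/* leaves the switch only; the walk below runs */
		}
		work += walk_slot(i);
	}
	return (work);
}
```

Calling dump_slots(false, stdout) walks all eight slots, while dump_slots(true, stdout) does only the cheap seven.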

    > Speaking of vm.pmap.kernel_maps, that thing is huge.  It easily dwarfs all
    > other sysctls combined, and tends to grow with time.  Would it be possible
    > to hide it from sysctl -a's output?  I think there are other sysctls like
    > that, that are treated as opaque binary values.  I once had to fix a bug in
    > py37-salt that caused a common operation to take _6_hours_ as opposed to <
    > 1 minute because of a huge vm.pmap.kernel_maps value, coupled with some
    > O(n^2) string processing.

With nvdimm.ko unloaded on these nodes, it's around 3000 lines! O_O

It occurred to me that it might eventually finish, so I started it last night, and it hadn't completed overnight.

    jhb already marked this sysctl as CTLFLAG_SKIP, several months ago.
    The change was not merged back to 12.

Nice!

Found it: r368768 (1dce7d9e7eef)

That's probably a cleaner fix than the patch above; I'll see about applying that in my tree.
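
For comparison, a minimal sketch of the flag-based idea, again as a userland toy and not the real sysctl implementation. NODE_FLAG_SKIP, struct node, list_all() and lookup() are hypothetical names; the assumption being modeled is that a skip flag hides a node from a list-everything walk while a request for that node by name still finds it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag that models the idea behind CTLFLAG_SKIP. */
#define NODE_FLAG_SKIP	0x1u

struct node {
	const char	*name;
	unsigned	 flags;
};

/* Example node table; the names are only illustrative entries. */
static const struct node nodes[] = {
	{ "kern.ostype", 0 },
	{ "vm.pmap.kernel_maps", NODE_FLAG_SKIP },
	{ "vm.stats.vm.v_page_count", 0 },
};
#define NNODES	(sizeof(nodes) / sizeof(nodes[0]))

/* "List everything" path: flagged nodes are passed over. */
static size_t
list_all(FILE *out)
{
	size_t shown = 0;

	for (size_t i = 0; i < NNODES; i++) {
		if (nodes[i].flags & NODE_FLAG_SKIP)
			continue;
		fprintf(out, "%s\n", nodes[i].name);
		shown++;
	}
	return (shown);
}

/* Explicit lookup by name: the skip flag is not consulted. */
static const struct node *
lookup(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < NNODES; i++) {
		if (strcmp(nodes[i].name, name) == 0)
			return (&nodes[i]);
	}
	return (NULL);
}
```

In this model list_all() never touches the expensive node, so nothing has to be patched out of the handler itself, and lookup("vm.pmap.kernel_maps") still returns it for anyone who asks explicitly.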

Thanks, all!

-Ravi (rpokala@)

    Ahh, good. 




