From nobody Wed Dec 15 13:38:05 2021
Date: Wed, 15 Dec 2021 13:38:05 GMT
Message-Id: <202112151338.1BFDc5SR070566@gitrepo.freebsd.org>
From: Mark Johnston <markj@FreeBSD.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org,
    dev-commits-src-branches@FreeBSD.org
Subject: git: 55e020a6f9fc - stable/13 - amd64: Reduce the amount of cpuset
    copying done for TLB shootdowns
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: markj
X-Git-Repository: src
X-Git-Refname: refs/heads/stable/13
X-Git-Reftype: branch
X-Git-Commit: 55e020a6f9fcbe80bfd8f532cae9c22cf4ebd74b
Auto-Submitted: auto-generated
The branch stable/13 has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=55e020a6f9fcbe80bfd8f532cae9c22cf4ebd74b

commit 55e020a6f9fcbe80bfd8f532cae9c22cf4ebd74b
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2021-11-15 17:52:03 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-12-15 13:31:48 +0000

    amd64: Reduce the amount of cpuset copying done for TLB shootdowns

    We use pmap_invalidate_cpu_mask() to get the set of active CPUs.  This
    (32-byte) set is copied by value through multiple frames until we get
    to smp_targeted_tlb_shootdown(), where it is copied yet again.

    Avoid this copying by having smp_targeted_tlb_shootdown() make a local
    copy of the active CPUs for the pmap, and drop the cpuset parameter,
    simplifying callers.  Also use the non-destructive CPU_FOREACH_ISSET
    to avoid unneeded copying within smp_targeted_tlb_shootdown().

    Reviewed by:	alc, kib
    Tested by:	pho
    Sponsored by:	The FreeBSD Foundation

    (cherry picked from commit ab12e8db292c386a33445dcd95fa629413954192)
---
 sys/amd64/amd64/mp_machdep.c | 39 ++++++++++++++++++---------------------
 sys/amd64/amd64/pmap.c       |  8 +++-----
 sys/amd64/include/pmap.h     |  9 +++++++--
 sys/x86/include/x86_smp.h    | 15 ++++++++++++---
 4 files changed, 40 insertions(+), 31 deletions(-)

diff --git a/sys/amd64/amd64/mp_machdep.c b/sys/amd64/amd64/mp_machdep.c
index cbc4b33841ba..58f4a539c481 100644
--- a/sys/amd64/amd64/mp_machdep.c
+++ b/sys/amd64/amd64/mp_machdep.c
@@ -613,10 +613,10 @@ invl_scoreboard_slot(u_int cpu)
  * completion.
  */
 static void
-smp_targeted_tlb_shootdown(cpuset_t mask, pmap_t pmap, vm_offset_t addr1,
-    vm_offset_t addr2, smp_invl_cb_t curcpu_cb, enum invl_op_codes op)
+smp_targeted_tlb_shootdown(pmap_t pmap, vm_offset_t addr1, vm_offset_t addr2,
+    smp_invl_cb_t curcpu_cb, enum invl_op_codes op)
 {
-	cpuset_t other_cpus;
+	cpuset_t mask;
 	uint32_t generation, *p_cpudone;
 	int cpu;
 	bool is_all;
@@ -631,10 +631,12 @@ smp_targeted_tlb_shootdown(cpuset_t mask, pmap_t pmap, vm_offset_t addr1,
 	KASSERT(curthread->td_pinned > 0, ("curthread not pinned"));
 
 	/*
-	 * Check for other cpus.  Return if none.
+	 * Make a stable copy of the set of CPUs on which the pmap is active.
+	 * See if we have to interrupt other CPUs.
 	 */
-	is_all = !CPU_CMP(&mask, &all_cpus);
-	CPU_CLR(PCPU_GET(cpuid), &mask);
+	CPU_COPY(pmap_invalidate_cpu_mask(pmap), &mask);
+	is_all = CPU_CMP(&mask, &all_cpus) == 0;
+	CPU_CLR(curcpu, &mask);
 	if (CPU_EMPTY(&mask))
 		goto local_cb;
 
@@ -663,7 +665,7 @@ smp_targeted_tlb_shootdown(cpuset_t mask, pmap_t pmap, vm_offset_t addr1,
 	CPU_FOREACH_ISSET(cpu, &mask) {
 		KASSERT(*invl_scoreboard_slot(cpu) != 0,
 		    ("IPI scoreboard is zero, initiator %d target %d",
-		    PCPU_GET(cpuid), cpu));
+		    curcpu, cpu));
 		*invl_scoreboard_slot(cpu) = 0;
 	}
 
@@ -674,14 +676,11 @@ smp_targeted_tlb_shootdown(cpuset_t mask, pmap_t pmap, vm_offset_t addr1,
 	 */
 	if (is_all) {
 		ipi_all_but_self(IPI_INVLOP);
-		other_cpus = all_cpus;
-		CPU_CLR(PCPU_GET(cpuid), &other_cpus);
 	} else {
-		other_cpus = mask;
 		ipi_selected(mask, IPI_INVLOP);
 	}
 	curcpu_cb(pmap, addr1, addr2);
-	CPU_FOREACH_ISSET(cpu, &other_cpus) {
+	CPU_FOREACH_ISSET(cpu, &mask) {
 		p_cpudone = invl_scoreboard_slot(cpu);
 		while (atomic_load_int(p_cpudone) != generation)
 			ia32_pause();
@@ -705,29 +704,28 @@ local_cb:
 }
 
 void
-smp_masked_invltlb(cpuset_t mask, pmap_t pmap, smp_invl_cb_t curcpu_cb)
+smp_masked_invltlb(pmap_t pmap, smp_invl_cb_t curcpu_cb)
 {
-	smp_targeted_tlb_shootdown(mask, pmap, 0, 0, curcpu_cb, invl_op_tlb);
+	smp_targeted_tlb_shootdown(pmap, 0, 0, curcpu_cb, invl_op_tlb);
 #ifdef COUNT_XINVLTLB_HITS
 	ipi_global++;
 #endif
 }
 
 void
-smp_masked_invlpg(cpuset_t mask, vm_offset_t addr, pmap_t pmap,
-    smp_invl_cb_t curcpu_cb)
+smp_masked_invlpg(vm_offset_t addr, pmap_t pmap, smp_invl_cb_t curcpu_cb)
 {
-	smp_targeted_tlb_shootdown(mask, pmap, addr, 0, curcpu_cb, invl_op_pg);
+	smp_targeted_tlb_shootdown(pmap, addr, 0, curcpu_cb, invl_op_pg);
 #ifdef COUNT_XINVLTLB_HITS
 	ipi_page++;
 #endif
 }
 
 void
-smp_masked_invlpg_range(cpuset_t mask, vm_offset_t addr1, vm_offset_t addr2,
-    pmap_t pmap, smp_invl_cb_t curcpu_cb)
+smp_masked_invlpg_range(vm_offset_t addr1, vm_offset_t addr2, pmap_t pmap,
+    smp_invl_cb_t curcpu_cb)
 {
-	smp_targeted_tlb_shootdown(mask, pmap, addr1, addr2, curcpu_cb,
+	smp_targeted_tlb_shootdown(pmap, addr1, addr2, curcpu_cb,
 	    invl_op_pgrng);
 #ifdef COUNT_XINVLTLB_HITS
 	ipi_range++;
@@ -738,8 +736,7 @@ smp_masked_invlpg_range(cpuset_t mask, vm_offset_t addr1, vm_offset_t addr2,
 void
 smp_cache_flush(smp_invl_cb_t curcpu_cb)
 {
-	smp_targeted_tlb_shootdown(all_cpus, NULL, 0, 0, curcpu_cb,
-	    INVL_OP_CACHE);
+	smp_targeted_tlb_shootdown(kernel_pmap, 0, 0, curcpu_cb, INVL_OP_CACHE);
 }
 
 /*
diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 986a1b670d60..4325acd1255a 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -3104,8 +3104,7 @@ pmap_invalidate_page(pmap_t pmap, vm_offset_t va)
 	    ("pmap_invalidate_page: invalid type %d", pmap->pm_type));
 
 	pmap_invalidate_preipi(pmap);
-	smp_masked_invlpg(pmap_invalidate_cpu_mask(pmap), va, pmap,
-	    pmap_invalidate_page_curcpu_cb);
+	smp_masked_invlpg(va, pmap, pmap_invalidate_page_curcpu_cb);
 }
 
 /* 4k PTEs -- Chosen to exceed the total size of Broadwell L2 TLB */
@@ -3203,7 +3202,7 @@ pmap_invalidate_range(pmap_t pmap, vm_offset_t sva, vm_offset_t eva)
 	    ("pmap_invalidate_range: invalid type %d", pmap->pm_type));
 
 	pmap_invalidate_preipi(pmap);
-	smp_masked_invlpg_range(pmap_invalidate_cpu_mask(pmap), sva, eva, pmap,
+	smp_masked_invlpg_range(sva, eva, pmap,
 	    pmap_invalidate_range_curcpu_cb);
 }
 
@@ -3289,8 +3288,7 @@ pmap_invalidate_all(pmap_t pmap)
 	    ("pmap_invalidate_all: invalid type %d", pmap->pm_type));
 
 	pmap_invalidate_preipi(pmap);
-	smp_masked_invltlb(pmap_invalidate_cpu_mask(pmap), pmap,
-	    pmap_invalidate_all_curcpu_cb);
+	smp_masked_invltlb(pmap, pmap_invalidate_all_curcpu_cb);
 }
 
 static void
diff --git a/sys/amd64/include/pmap.h b/sys/amd64/include/pmap.h
index 14ff4cf3cde9..8f1e77806a25 100644
--- a/sys/amd64/include/pmap.h
+++ b/sys/amd64/include/pmap.h
@@ -535,10 +535,15 @@ void	pmap_kasan_enter(vm_offset_t);
 void	pmap_kmsan_enter(vm_offset_t);
 #endif
 
-static __inline cpuset_t
+/*
+ * Returns a pointer to a set of CPUs on which the pmap is currently active.
+ * Note that the set can be modified without any mutual exclusion, so a copy
+ * must be made if a stable value is required.
+ */
+static __inline volatile cpuset_t *
 pmap_invalidate_cpu_mask(pmap_t pmap)
 {
-	return (pmap->pm_active);
+	return (&pmap->pm_active);
 }
 
 #endif /* _KERNEL */
diff --git a/sys/x86/include/x86_smp.h b/sys/x86/include/x86_smp.h
index b9a1febb70f2..2cf0ff97eae0 100644
--- a/sys/x86/include/x86_smp.h
+++ b/sys/x86/include/x86_smp.h
@@ -107,14 +107,23 @@ void	ipi_swi_handler(struct trapframe frame);
 void	ipi_selected(cpuset_t cpus, u_int ipi);
 void	ipi_self_from_nmi(u_int vector);
 void	set_interrupt_apic_ids(void);
+void	mem_range_AP_init(void);
+void	topo_probe(void);
+
+/* functions in mp_machdep.c */
 void	smp_cache_flush(smp_invl_cb_t curcpu_cb);
+#ifdef __i386__
 void	smp_masked_invlpg(cpuset_t mask, vm_offset_t addr, struct pmap *pmap,
 	    smp_invl_cb_t curcpu_cb);
 void	smp_masked_invlpg_range(cpuset_t mask, vm_offset_t startva,
 	    vm_offset_t endva, struct pmap *pmap, smp_invl_cb_t curcpu_cb);
 void	smp_masked_invltlb(cpuset_t mask, struct pmap *pmap,
 	    smp_invl_cb_t curcpu_cb);
-void	mem_range_AP_init(void);
-void	topo_probe(void);
-
+#else
+void	smp_masked_invlpg(vm_offset_t addr, struct pmap *pmap,
+	    smp_invl_cb_t curcpu_cb);
+void	smp_masked_invlpg_range(vm_offset_t startva, vm_offset_t endva,
+	    struct pmap *pmap, smp_invl_cb_t curcpu_cb);
+void	smp_masked_invltlb(struct pmap *pmap, smp_invl_cb_t curcpu_cb);
+#endif
 #endif
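For readers without the cpuset(9) macros fresh in mind, the sketch below is
a minimal, hypothetical userland illustration of the two idioms this change
relies on. It is not kernel code, and it assumes a FreeBSD system whose
<sys/cpuset.h> provides CPU_FOREACH_ISSET (true on stable/13 once this merge
landed); the "active" and "self" names are stand-ins invented for the demo,
with "active" playing the role of pmap->pm_active. The idea: snapshot the
shared set once with CPU_COPY, then walk the snapshot non-destructively with
CPU_FOREACH_ISSET so the same mask serves both the IPI send and the
completion wait, with no other_cpus shadow copy.

/*
 * Hypothetical demo, not kernel code.  "active" stands in for
 * pmap->pm_active, which in the kernel is volatile and may be updated
 * concurrently -- hence the single stable snapshot taken below.
 */
#include <sys/param.h>
#include <sys/cpuset.h>

#include <stdio.h>

int
main(void)
{
	cpuset_t active, mask;
	int cpu, self;

	self = 0;
	CPU_ZERO(&active);
	CPU_SET(0, &active);
	CPU_SET(2, &active);

	CPU_COPY(&active, &mask);	/* one copy, made exactly once */
	CPU_CLR(self, &mask);		/* never IPI the initiating CPU */
	if (CPU_EMPTY(&mask))
		return (0);		/* only local work to do */

	/*
	 * CPU_FOREACH_ISSET visits each set bit without clearing it,
	 * so the same mask can be iterated twice: once to send the
	 * IPIs and once to wait for the acknowledgements.
	 */
	CPU_FOREACH_ISSET(cpu, &mask)
		printf("send IPI to CPU %d\n", cpu);
	CPU_FOREACH_ISSET(cpu, &mask)
		printf("wait for ack from CPU %d\n", cpu);
	return (0);
}

The pmap.h change completes the picture: pmap_invalidate_cpu_mask() now
returns a pointer to the volatile set rather than a 32-byte value, so the
one unavoidable copy happens inside smp_targeted_tlb_shootdown() itself
instead of being repeated in every call frame on the way down.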