git: d31a21b939f7 - stable/13 - x86/mp: don't create empty cpu groups

From: Ed Maste <emaste_at_FreeBSD.org>
Date: Thu, 29 Sep 2022 20:20:01 UTC
The branch stable/13 has been updated by emaste:

URL: https://cgit.FreeBSD.org/src/commit/?id=d31a21b939f7b1380f3055374317b71a034976a4

commit d31a21b939f7b1380f3055374317b71a034976a4
Author:     Corvin Köhne <CorvinK@beckhoff.com>
AuthorDate: 2022-05-30 09:19:14 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2022-09-29 19:58:50 +0000

    x86/mp: don't create empty cpu groups
    
    When some APICs are disabled by tunables, some cpu groups could end up
    empty. An empty cpu group causes the system to panic because not all
    functions handle them correctly. Additionally, it's wasted time to
    handle and inspect empty cpu groups. Therefore, just don't create them.
    
    Reviewed by:    kib, avg, cem
    Sponsored by:   Beckhoff Automation GmbH & Co. KG
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D24927
    
    (cherry picked from commit 7468332f5518c1a725cd0067c35490f82ef781bd)
---
 sys/x86/x86/mp_x86.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/sys/x86/x86/mp_x86.c b/sys/x86/x86/mp_x86.c
index 289885fa6213..a7aa508f7699 100644
--- a/sys/x86/x86/mp_x86.c
+++ b/sys/x86/x86/mp_x86.c
@@ -871,6 +871,25 @@ x86topo_add_sched_group(struct topo_node *root, struct cpu_group *cg_root)
 	nchildren = 0;
 	node = root;
 	while (node != NULL) {
+		/*
+		 * When some APICs are disabled by tunables, nodes can end up
+		 * with an empty cpuset. Nodes with an empty cpuset will be
+		 * translated into cpu groups with empty cpusets. smp_topo_fill
+		 * will then set cg_first and cg_last to -1. This isn't
+		 * correctly handled in all functions. E.g. when
+		 * cpu_search_lowest and cpu_search_highest loop through all
+		 * cpus, they call CPU_ISSET on cpu -1 which ends up in a
+		 * general protection fault.
+		 *
+		 * We could fix the scheduler to handle empty cpu groups
+		 * correctly. Nevertheless, empty cpu groups are causing
+		 * overhead for no value. So, it makes more sense to just don't
+		 * create them.
+		 */
+		if (CPU_EMPTY(&node->cpuset)) {
+			node = topo_next_node(root, node);
+			continue;
+		}
 		if (CPU_CMP(&node->cpuset, &root->cpuset) == 0) {
 			if (node->type == TOPO_TYPE_CACHE &&
 			    cg_root->cg_level < node->subtype)
@@ -896,8 +915,14 @@ x86topo_add_sched_group(struct topo_node *root, struct cpu_group *cg_root)
 	if (nchildren == root->cpu_count)
 		return;
 
-	cg_root->cg_child = smp_topo_alloc(nchildren);
+	/*
+	 * We are not interested in nodes without children.
+	 */
 	cg_root->cg_children = nchildren;
+	if (nchildren == 0)
+		return;
+
+	cg_root->cg_child = smp_topo_alloc(nchildren);
 
 	/*
 	 * Now find again the same cache nodes as above and recursively
@@ -909,7 +934,8 @@ x86topo_add_sched_group(struct topo_node *root, struct cpu_group *cg_root)
 		if ((node->type != TOPO_TYPE_GROUP &&
 		    node->type != TOPO_TYPE_NODE &&
 		    node->type != TOPO_TYPE_CACHE) ||
-		    CPU_CMP(&node->cpuset, &root->cpuset) == 0) {
+		    CPU_CMP(&node->cpuset, &root->cpuset) == 0 ||
+		    CPU_EMPTY(&node->cpuset)) {
 			node = topo_next_node(root, node);
 			continue;
 		}