Message-ID: <25808.19827.558916.702094@hergotha.csail.mit.edu>
Date: Sun, 6 Aug 2023 21:48:35 -0400
From: Garrett Wollman
To: freebsd-stable@freebsd.org
Subject: EARLY_AP_STARTUP now (effectively) mandatory?

Hi, all,

It's been a really long time since I've had much to say hereabouts, but as I'm in the middle of an upgrade cycle (12.4 to 13.2) I wanted to post about an issue I ran into.

On both of my workstations, my custom kernel would hang at boot. I didn't see this on either of the servers that I had already upgraded.
As I was bored at home today, I tried booting a GENERIC kernel, built from the same source tree (13.2-RELEASE-p1) as my custom kernel, and it booted just fine. I don't have the ability to do serial console on either of my workstations, nor any sort of network debugging, but when I did a verbose boot on the office workstation, it didn't show anything interesting. However, at home, I noticed that the hang occurred immediately after attach of:

    hwpstate_intel0: on cpu0
    hwpstate_intel1: on cpu1

The first time I pressed a key on this machine's PS/2 keyboard, it got one step further:

    hwpstate_intel2: on cpu2

This is a 6-core, 12-thread system, and the working kernel gets all the way to

    hwpstate_intel11: on cpu11

nearly instantly.

I took the working GENERIC configuration and pared it down to make a new custom kernel, and it worked (I'm using it right now). So I compared the working and broken configurations, and noticed the following options were present in the working configuration and not in the broken one:

    options EARLY_AP_STARTUP
    options GZIO
    options IICHID_SAMPLING
    options KDB
    options KDB_TRACE
    options NUMA
    options SCSI_DELAY=5000
    options SC_PIXEL_MODE
    options VESA
    options ZSTDIO

The first one, EARLY_AP_STARTUP, stood out to me as likely related to the problem: most of the other options involve hardware or features that this machine doesn't use, but I could easily imagine that configuring power state controls on CPUs that haven't been started yet might fail. This option isn't mentioned anywhere in UPDATING, and the comment in GENERIC isn't especially helpful, but I have a suspicion that this option is now effectively mandatory, at least if `cpufreq` is compiled into the kernel (as it is on all of my kernels and in GENERIC as well).

To be 100% certain I should build the old config with just that option enabled, and maybe I'll try that on my work desktop since I still need to finish the upgrade there.
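
For anyone who wants to repeat the comparison of a working and a broken kernel configuration described above, here is a minimal sketch in Python. It is not the method used for this report, only an illustration. The function names `parse_options` and `missing_options` are made up for the example, and it assumes both configurations are flat files: it does not follow `include` directives, so a config that pulls in GENERIC that way would need to be expanded first.

```python
import sys


def parse_options(config_text):
    """Return {option_name: value_or_None} for the options set in a config.

    Simplified on purpose: handles 'options' and 'nooptions' lines,
    '#' comments, and comma-separated specs. It does not follow
    'include' directives (an assumption of this sketch).
    """
    opts = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue
        keyword, rest = parts
        if keyword not in ("options", "nooptions"):
            continue
        for spec in rest.split(","):
            name, _, value = spec.strip().partition("=")
            name = name.strip()
            if not name:
                continue
            if keyword == "options":
                opts[name] = value.strip() or None
            else:
                opts.pop(name, None)  # 'nooptions' cancels an earlier option
    return opts


def missing_options(working_text, broken_text):
    """Names of options set in the working config but not in the broken one."""
    working = parse_options(working_text)
    broken = parse_options(broken_text)
    return sorted(name for name in working if name not in broken)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 thisfile.py WORKING_CONFIG BROKEN_CONFIG
    with open(sys.argv[1]) as w, open(sys.argv[2]) as b:
        for name in missing_options(w.read(), b.read()):
            print("options", name)
```

Run with the two config file paths as arguments, it prints one `options` line per option that only the working configuration sets, which is the kind of list shown above.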
This option was apparently added in 2016 by jhb@, and in his Phabricator description, he wrote:

    As a transition aid, the new behavior is moved under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I hope to enable this on x86 by default in a followup commit and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code.

Apparently we got all the way to 13.2 and this never happened. It should probably get at least a mention in UPDATING for anyone else who hasn't tripped over this.

-GAWollman