EARLY_AP_STARTUP now (effectively) mandatory?
Date: Mon, 07 Aug 2023 01:48:35 UTC
Hi, all,

It's been a really long time since I've had much to say hereabouts, but as I'm in the middle of an upgrade cycle (12.4 to 13.2) I wanted to post about an issue I ran into.

On both of my workstations, my custom kernel would hang at boot. I didn't see this on either of the servers that I had already upgraded. As I was bored at home today, I tried booting a GENERIC kernel, built from the same source tree (13.2-RELEASE-p1) as my custom kernel, and it booted just fine.

I don't have the ability to do serial console on either of my workstations, nor any sort of network debugging, but when I did a verbose boot on the office workstation, it didn't show anything interesting. However, at home, I noticed that the hang occurred immediately after attach of:

    hwpstate_intel0: <Intel Speed Shift> on cpu0
    hwpstate_intel1: <Intel Speed Shift> on cpu1

The first time I pressed a key on this machine's PS/2 keyboard, it got one step further:

    hwpstate_intel2: <Intel Speed Shift> on cpu2

This is a 6-core, 12-thread system, and the working kernel gets all the way to

    hwpstate_intel11: <Intel Speed Shift> on cpu11

nearly instantly.

I took the working GENERIC configuration and pared it down to make a new custom kernel, and it worked (I'm using it right now). So I compared the working and broken configurations, and noticed the following options were present in the working configuration and not in the broken one:

    options EARLY_AP_STARTUP
    options GZIO
    options IICHID_SAMPLING
    options KDB
    options KDB_TRACE
    options NUMA
    options SCSI_DELAY=5000
    options SC_PIXEL_MODE
    options VESA
    options ZSTDIO

The first one, EARLY_AP_STARTUP, stood out to me as likely related to the problem -- most of the other options involve hardware or features that this machine doesn't use, but I could easily imagine that configuring power state controls on CPUs that haven't been started yet might fail.
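For anyone who wants to repeat that comparison on their own configurations, here is a minimal sketch in Python of listing the options present in one kernel config and absent from another. The function names (`kernel_options`, `missing_options`) and the two sample configs are made up for illustration; they are not the actual files from my machines. The parser is deliberately naive: it assumes `#` always starts a comment and does not follow `include` or honor `nooptions` lines.

```python
def kernel_options(config_text):
    """Return the set of option names declared on 'options' lines."""
    names = set()
    for raw in config_text.splitlines():
        # Assumption: '#' always starts a comment (no quoted '#' in values).
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "options":
            # One 'options' line may list several comma-separated entries.
            for spec in fields[1].split(","):
                name = spec.split("=", 1)[0].strip()  # drop any "=value" part
                if name:
                    names.add(name)
    return names


def missing_options(working_text, broken_text):
    """Options declared in the working config but not in the broken one."""
    return sorted(kernel_options(working_text) - kernel_options(broken_text))


# Hypothetical sample configs, much shorter than any real kernel config.
WORKING = """\
cpu HAMMER
ident PARED_DOWN
options SCHED_ULE   # ULE scheduler
options EARLY_AP_STARTUP
options SCSI_DELAY=5000  # Delay (in ms) before probing SCSI
device cpufreq
"""

BROKEN = """\
cpu HAMMER
ident OLD_CUSTOM
options SCHED_ULE
device cpufreq
"""

print(missing_options(WORKING, BROKEN))
# prints ['EARLY_AP_STARTUP', 'SCSI_DELAY']
```

With real files, the two strings would be read from the config files in question. A config that pulls in GENERIC through `include` would need that file merged in by hand first, since this sketch only looks at the text it is given.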
This option isn't mentioned anywhere in UPDATING, and the comment in GENERIC isn't especially helpful, but I have a suspicion that this option is now effectively mandatory, at least if `cpufreq` is compiled into the kernel (as it is on all of my kernels and in GENERIC as well). To be 100% certain I should build the old config with just that option enabled, and maybe I'll try that on my work desktop since I still need to finish the upgrade there.

This option was apparently added in 2016 by jhb@, and in his Phabricator description, he wrote:

    As a transition aid, the new behavior is moved under a new kernel option (EARLY_AP_STARTUP). This will allow the option to be turned off if need be during initial testing. I hope to enable this on x86 by default in a followup commit and to have all platforms moved over before 11.0. Once the transition is complete, the option will be removed along with the !EARLY_AP_STARTUP code.

Apparently we got all the way to 13.2 and this never happened. It should probably get at least a mention in UPDATING for anyone else who hasn't tripped over this.

-GAWollman