Starting APs earlier during boot
Konstantin Belousov
kostikbel at gmail.com
Wed Feb 17 09:42:51 UTC 2016
On Tue, Feb 16, 2016 at 12:50:22PM -0800, John Baldwin wrote:
> Currently the kernel bootstraps the non-boot processors fairly early in the
> SI_SUB_CPU SYSINIT. The APs then spin waiting to be "released". We currently
> release the APs as one of the last steps at SI_SUB_SMP. On the one hand this
> removes much of the need for synchronization while SYSINITs are running since
> SYSINITs basically assume they are single-threaded. However, it also enforces
> some odd quirks. Several places that deal with per-CPU resources have to
> split initialization up so that the BSP init happens in one SYSINIT and the
> initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
>
> Another issue that is becoming more prominent on x86 (and probably will also
> affect other platforms if it isn't already) is that to support working
> interrupts for interrupt config hooks we bind all interrupts to the BSP during
> boot and only distribute them among other CPUs near the end at SI_SUB_SMP.
> This is especially problematic with drivers for modern hardware allocating
> num(CPUs) interrupts (hoping to use one per CPU). On x86 we have about 190
> IDT vectors available for device interrupts, so in theory we should be able to
> tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
> interrupts for every CPU and we should still be fine). However, if you have,
> say, 32 cores in a system, then you can only handle about 5 drivers doing
> this before you run out of vectors on CPU 0.
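
The vector arithmetic in the paragraph above can be checked with a small sketch. The figure of 190 device vectors is the approximate number quoted there; the function names are hypothetical and exist only for illustration. With every interrupt bound to the BSP, a driver that allocates one interrupt per CPU consumes ncpus vectors on CPU 0, whereas with interrupts distributed it consumes only its per-CPU share on each CPU.

```c
/* Approximate number of x86 IDT vectors usable for device interrupts,
 * taken from the figure quoted in the message (an estimate, not exact). */
#define DEVICE_VECTORS 190

/*
 * Hypothetical helper: how many drivers fit if each driver allocates
 * intrs_per_cpu interrupts for every CPU and all of them are bound to
 * the boot CPU, as happens during boot today.
 */
static int
drivers_all_on_bsp(int ncpus, int intrs_per_cpu)
{
	return (DEVICE_VECTORS / (ncpus * intrs_per_cpu));
}

/*
 * Hypothetical helper: how many drivers fit once interrupts are spread
 * out, so each CPU only hosts its own intrs_per_cpu vectors per driver.
 */
static int
drivers_distributed(int intrs_per_cpu)
{
	return (DEVICE_VECTORS / intrs_per_cpu);
}
```

Under these assumptions drivers_all_on_bsp(32, 1) gives 5, matching the "about 5 drivers" estimate, while drivers_distributed(3) gives 63, which covers the 60-driver case.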
>
> Longer term we would also like to eventually have most drivers attach in the
> same environment during boot as during post-boot. Right now post-boot is
> quite different as all CPUs are running, interrupts work, etc. One of the
> goals of multipass support for new-bus is to help us get there by probing
> enough hardware to get timers working and starting the scheduler before
> probing the rest of the devices. That goal isn't quite realized yet.
>
> However, we can run a slightly simpler version of our scheduler before
> timers are working. In fact, sleep/wakeup work just fine fairly early (we
> allocate the necessary structures at SI_SUB_KMEM which is before the APs
> are even started). Once idle threads are created and ready we could in
> theory let the APs start up and run other threads. You just don't have working
> timeouts. OTOH, you can sort of simulate timeouts if you modify the scheduler
> to yield the CPU instead of blocking the thread for a sleep with a timeout.
> The effect would be for threads that do sleeps with a timeout to fall back to
> polling before timers are working. In practice, all of the early kernel
> threads use sleeps without timeouts when idle so this doesn't really matter.
I understand that timeouts can be somewhat simulated this way.
But I do not quite understand how generic scheduling can work without
(timer) interrupts. Suppose that we have two threads 1 and 2 of the same
priority, both runnable, and due to some event thread 2 preempted thread
1. If thread 2 just runs without calling a function that gives up the CPU,
such as msleep, what would guarantee that thread 1 eventually gets its CPU
slice? E.g. there might be no interrupts set up yet, and if the idle thread
on UP gets on CPU, then the whole boot process could deadlock.
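
To make the concern concrete, here is a toy userspace model of the scenario. It is not FreeBSD scheduler code; the structure and function names are hypothetical. Two runnable threads of equal priority share one CPU, and the running thread is switched out only when a timer tick fires or when it yields on its own.

```c
#include <stdbool.h>

/* Hypothetical toy model of a thread; not a kernel structure. */
struct toy_thread {
	bool	yields;		/* gives up the CPU voluntarily each slice */
	int	slices_run;	/* number of slices this thread has received */
};

/*
 * Run two equal-priority runnable threads for 'steps' slices, starting
 * with thread index 'first' on the CPU.  If has_tick is true the running
 * thread is switched out after every slice (round-robin).  Otherwise a
 * switch happens only when the running thread yields.
 */
static void
toy_run(struct toy_thread t[2], int first, int steps, bool has_tick)
{
	int cur = first;

	for (int i = 0; i < steps; i++) {
		t[cur].slices_run++;
		if (has_tick || t[cur].yields)
			cur = 1 - cur;
	}
}
```

In this model, starting a non-yielding thread with has_tick set to false leaves the other thread with zero slices no matter how many steps are run, while has_tick set to true splits the slices evenly.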
>
> I've implemented these changes and tested them for x86. For x86 at least
> AP startup needed some bits of the interrupt infrastructure in place, so
> I moved SI_SUB_SMP up to after SI_SUB_INTR but before SI_SUB_SOFTINTR. I
> modified the *sleep() and cv_*wait*() routines to not always bail if cold
> is true. Instead, sleeps without a timeout are permitted to sleep
> "normally". Sleeps with a timeout drop their interlock and yield the
> CPU (but remain runnable). Since APs are now fully running this means
> interrupts are now routed to all CPUs from the get-go, removing the need for
> the post-boot shuffle. This also resolves the issue of running out of IDT
> vectors on the boot CPU.
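
The sleep behaviour described above can be summarised as a small decision function. This is only a sketch of the policy as stated in the message, not the code from the ap_startup branch; the enum, the function name, and the timers_working parameter (standing in for the kernel's cold flag being clear) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcome of a sleep request; illustration only. */
enum early_sleep_action {
	EARLY_SLEEP_BLOCK,	/* block normally until a wakeup arrives */
	EARLY_SLEEP_YIELD,	/* drop the interlock, yield, stay runnable */
	EARLY_SLEEP_TIMED	/* ordinary timed sleep, timers are available */
};

/*
 * Decide how a sleep request is handled.  Sleeps without a timeout may
 * always block.  Sleeps with a timeout fall back to yielding (in effect,
 * polling) until timers work, and become ordinary timed sleeps afterwards.
 */
static enum early_sleep_action
early_sleep_policy(bool timers_working, bool has_timeout)
{
	if (!has_timeout)
		return (EARLY_SLEEP_BLOCK);
	if (!timers_working)
		return (EARLY_SLEEP_YIELD);
	return (EARLY_SLEEP_TIMED);
}
```

A caller that sleeps with a timeout before timers are running would therefore loop, rechecking its condition each time it is rescheduled, rather than being woken by a timer.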
>
> I believe that adapting other platforms to this change should be relatively
> simple, but we should do that before committing the full patch. I do think
> that some parts of the patch (such as the changes to the sleep routines, and
> using SI_SUB_LAST instead of SI_SUB_SMP as a catch-all SYSINIT) can be
> committed now without breaking anything.
>
> However, I'd like feedback on the general idea and if it is acceptable I'd
> like to coordinate testing with other platforms so this can go into the
> tree.
>
> The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
> You can view them here:
>
> https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup
>
> --
> John Baldwin