Starting APs earlier during boot
K. Macy
kmacy at freebsd.org
Sat Mar 19 02:02:43 UTC 2016
On Fri, Mar 18, 2016 at 12:37 PM, K. Macy <kmacy at freebsd.org> wrote:
> So none of these changes have been committed yet?
>
> I'm hitting hangs in USB on boot with recent HEAD and, without having
> investigated, had thought this might be what exposed the problem.
Never mind. It's yet another ZFS namespace deadlock.
-M
>
>
> On Friday, March 18, 2016, John Baldwin <jhb at freebsd.org> wrote:
>>
>> On Tuesday, February 16, 2016 12:50:22 PM John Baldwin wrote:
>> > Currently the kernel bootstraps the non-boot processors fairly early in
>> > the
>> > SI_SUB_CPU SYSINIT. The APs then spin waiting to be "released". We
>> > currently
>> > release the APs as one of the last steps at SI_SUB_SMP. On the one hand
>> > this
>> > removes much of the need for synchronization while SYSINITs are running
>> > since
>> > SYSINITs basically assume they are single-threaded. However, it also
>> > enforces
>> > some odd quirks. Several places that deal with per-CPU resources have
>> > to
>> > split initialization up so that the BSP init happens in one SYSINIT and
>> > the
>> > initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.
>> >
>> > Another issue that is becoming more prominent on x86 (and probably will
>> > also
>> > affect other platforms if it isn't already) is that to support working
>> > interrupts for interrupt config hooks we bind all interrupts to the BSP
>> > during
>> > boot and only distribute them among other CPUs near the end at
>> > SI_SUB_SMP.
>> > This is especially problematic with drivers for modern hardware
>> > allocating
>> > num(CPUs) interrupts (hoping to use one per CPU). On x86 we have about
>> > 190
>> > IDT vectors available for device interrupts, so in theory we should be
>> > able to
>> > tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
>> > interrupts for every CPU and we should still be fine). However, if you
>> > have,
>> > say, 32 cores in a system, then you can only handle about 5 drivers
>> > doing
>> > this before you run out of vectors on CPU 0.
>> >
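A rough sketch of the vector arithmetic in the quoted paragraph above. The helper names are hypothetical and 190 is only the approximate figure quoted in the mail, not an exact hardware constant:

```c
#include <assert.h>

/* Approximate number of device vectors quoted in the mail (assumption). */
#define X86_DEVICE_VECTORS 190

/*
 * Hypothetical helper: number of drivers that fit when each driver
 * allocates irqs_per_cpu * ncpus interrupts and all of them are bound
 * to the BSP, so they all compete for one CPU's vectors.
 */
static int
drivers_fit_bsp_only(int vectors, int ncpus, int irqs_per_cpu)
{
	assert(vectors >= 0 && ncpus > 0 && irqs_per_cpu > 0);
	return (vectors / (ncpus * irqs_per_cpu));
}

/*
 * Hypothetical helper: the same drivers once interrupts are spread
 * out, so each CPU hosts only irqs_per_cpu vectors per driver.
 */
static int
drivers_fit_distributed(int vectors, int irqs_per_cpu)
{
	assert(vectors >= 0 && irqs_per_cpu > 0);
	return (vectors / irqs_per_cpu);
}
```

With 190 vectors, 32 CPUs and one interrupt per CPU, drivers_fit_bsp_only() gives 5, which matches the "about 5 drivers" estimate. drivers_fit_distributed() with 3 interrupts per CPU gives 63, which is consistent with the "60 drivers" example.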
>> > Longer term we would also like to eventually have most drivers attach in
>> > the
>> > same environment during boot as during post-boot. Right now post-boot
>> > is
>> > quite different as all CPUs are running, interrupts work, etc. One of
>> > the
>> > goals of multipass support for new-bus is to help us get there by
>> > probing
>> > enough hardware to get timers working and starting the scheduler before
>> > probing the rest of the devices. That goal isn't quite realized yet.
>> >
>> > However, we can run a slightly simpler version of our scheduler before
>> > timers are working. In fact, sleep/wakeup work just fine fairly early
>> > (we
>> > allocate the necessary structures at SI_SUB_KMEM which is before the APs
>> > are even started). Once idle threads are created and ready we could in
>> > theory let the APs start up and run other threads. You just don't have
>> > working
>> > timeouts. OTOH, you can sort of simulate timeouts if you modify the
>> > scheduler
>> > to yield the CPU instead of blocking the thread for a sleep with a
>> > timeout.
>> > The effect would be for threads that do sleeps with a timeout to fall
>> > back to
>> > polling before timers are working. In practice, all of the early kernel
>> > threads use sleeps without timeouts when idle so this doesn't really
>> > matter.
>>
>> After some more testing, I've simplified the early scheduler a bit. It no
>> longer tries to simulate timeouts by just keeping the thread runnable.
>> Instead,
>> a sleep with a timeout just panics. However, it does still permit sleeps
>> with
>> no timeout. Some code that uses a timeout really wants a timeout
>> (note
>> that pause() has a hack to fall back to DELAY() internally if cold is true
>> for
>> this reason). Instead, my feeling is that any kthreads that need timeouts
>> to
>> work need to defer their startup until SI_SUB_KICK_SCHEDULER.
>>
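The simplified policy described in the quoted paragraph above can be modelled in userland roughly as follows. This is only a sketch and not code from the ap_startup branch: the enum, the function name, and the convention that a timo of 0 stands for "no timeout" are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a sleep request in this model. */
enum early_sleep_action {
	EARLY_SLEEP_BLOCK,	/* sleep until woken up; no timer needed */
	EARLY_SLEEP_TIMED,	/* ordinary timed sleep; timers are running */
	EARLY_SLEEP_PANIC	/* timed sleep requested before timers work */
};

/*
 * Hypothetical model of the policy: sleeps without a timeout are
 * always allowed, sleeps with a timeout are only allowed once timers
 * are working, and an early timed sleep is treated as a fatal error.
 * In this model timo == 0 means "no timeout" (assumption).
 */
static enum early_sleep_action
early_sleep_policy(bool timers_working, int timo)
{
	assert(timo >= 0);
	if (timo == 0)
		return (EARLY_SLEEP_BLOCK);
	if (timers_working)
		return (EARLY_SLEEP_TIMED);
	return (EARLY_SLEEP_PANIC);
}
```

In this model a kthread that needs timeouts would only be started once timers_working is true, which corresponds to deferring its startup as suggested in the mail.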
>> > However, I'd like feedback on the general idea and if it is acceptable
>> > I'd
>> > like to coordinate testing with other platforms so this can go into the
>> > tree.
>>
>> I don't think I've seen any objections? This does need more testing. I
>> will
>> update the patch to add a new EARLY_AP_STARTUP kernel option so this can
>> be
>> committed (but not yet enabled) allowing for easier testing (and allowing
>> other platforms to catch up to x86).
>>
>> > The current changes are in the 'ap_startup' branch at
>> > github/bsdjhb/freebsd.
>> > You can view them here:
>> >
>> > https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup
>>
>> --
>> John Baldwin
>> _______________________________________________
>> freebsd-arch at freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-arch
>> To unsubscribe, send any mail to "freebsd-arch-unsubscribe at freebsd.org"