RE: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP
Date: Mon, 15 May 2023 17:06:48 UTC
>-----Original Message----- >From: Kyle Evans <kevans@freebsd.org> >Sent: Monday, May 15, 2023 7:24 PM >To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >Subject: Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP > >On Mon, May 15, 2023 at 2:41 AM Souradeep Chakrabarti ><schakrabarti@microsoft.com> wrote: >> >> >> >> >> >-----Original Message----- >> >From: Kyle Evans <kevans@freebsd.org> >> >Sent: Monday, May 15, 2023 10:36 AM >> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >> >Subject: Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in >> >ARM64 SMP >> > >> >On Sun, May 14, 2023 at 9:59 AM Souradeep Chakrabarti >> ><schakrabarti@microsoft.com> wrote: >> >> >> >> >> >> >> >> >> >> >-----Original Message----- >> >> >From: Kyle Evans <kevans@freebsd.org> >> >> >Sent: Friday, May 12, 2023 10:40 PM >> >> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >> >> >Subject: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in >> >> >ARM64 SMP >> >> > >> >> >On Fri, May 12, 2023 at 9:51 AM Souradeep Chakrabarti >> >> ><schakrabarti@microsoft.com> wrote: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >-----Original Message----- >> >> >> >From: Souradeep Chakrabarti >> >> >> >Sent: Monday, May 8, 2023 6:39 PM >> >> >> >To: Kyle Evans <kevans@freebsd.org> >> >> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >> >> >> >Subject: enabling same PPI interrupt to all CPU in ARM64 SMP >> >> >> > >> >> >> >Hi , >> >> >> > >> >> >> >While using SMP in ARM64 Hyper-V we are getting stuck in boot >> >> >> >if there is a interrupt for VMBus coming to CPU1 and VMBus >> >> >> >interrupt handler is not getting that interrupt. >> >> >> > >> >> >> >In ARM64 Hyper-V we are using IRQ18 for VMBus and it is a PPI interrupt. >> >> >> > >> >> >> >But Hypev-V host sends interrupt to this IRQ 18 for both CPU0 >> >> >> >and >> >> >> >CPU1 in 2CPU system. >> >> >> >This is based on the corresponding VMBus channel which assigned >> >> >> >with the >> >CPU. >> >> >> > >> >> >> >Now VMBus ISR is getting the interrupt in CPU0 but not getting from >CPU1. >> >> >> >Any idea, how we can use the same PPI 18 for all the CPU cores? >> >> >> > >> >> >> >Any help will be appreciated, as this is blocking the >> >> >> >enablement of FreeBSD in Azure ARM64. >> >> >> [Souradeep] >> >> >> Can someone please help me it. >> >> >> >> >> > >> >> >Looking at least at the GIC implementation, it looks like this is a known >limitation: >> >> > >> >> > 875 /* >> >> > 876 * XXX - In case that per CPU interrupt is going to be >> >> >enabled in time >> >> > 877 * when SMP is already started, we need some IPI >> >> >call which >> >> > 878 * enables it on others CPUs. Further, it's more >> >> >complicated as >> >> > 879 * pic_enable_source() and pic_disable_source() >> >> >should act on >> >> > 880 * per CPU basis only. Thus, it should be solved >> >> >here somehow. >> >> > 881 */ >> >> > 882 if (isrc->isrc_flags & INTR_ISRCF_PPI) >> >> > 883 CPU_SET(PCPU_GET(cpuid), &isrc->isrc_cpu); >> >> > >> >> >I think we need something /like/ this: >> >> >https://peo/ >> >> >> >>>ple.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849b >> >64 >> >> >> >>>eaf70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0 >% >> >7C6381 >> >> >> >>>97239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL >C >> >JQIjoiV2 >> >> >> >>>luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dXVv >nI >> >HhN9 >> >> >mdwiPkSJKwMyEKYi5SyGOuta5zCZ1ysCQ%3D&reserved=0 >> >> >> >>>eebsd.org%2F~kevans%2Fppi.diff&data=05%7C01%7Cschakrabarti%40microso >> >>f >> >> >t.c >> >>om%7Cc5c3d254b9d841e9ae9b08db530bb3d2%7C72f988bf86f141af91ab2d7 >c >> >> >> >>>d011db47%7C1%7C0%7C638195082027744706%7CUnknown%7CTWFpbGZs >b >> >3 >> >> >> >>>d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3 >D >> >> >> >>>%7C3000%7C%7C%7C&sdata=SwoR2vHxh6QQhwpOgFcQ9378nDVovhdEKEXo >E >> >o >> >> >gPKsc%3D&reserved=0, though it still has the caveat that PPIs >> >> >effectively cannot be fully setup before SI_SUB_SMP. >> >> >So, it's likely almost a NOP for existing platforms (will emit a >> >> >warning with bootverbose for armv8 timers) but might do the trick for you. >> >> [Souradeep] >> >> Thanks for the change but it did not solve the problem. Still the >> >> interrupt handler vmbus_handle_intr(struct trapframe *trap_frame), >> >> is not >> >getting called for the CPU 1. >> >> It is only getting called for CPU 0 all the time in ARM64 but in >> >> x86 it is getting called for both CPU1 and CPU0. >> > >> >Interesting! I do see one problem with the patch (and some cosmetic >> >issues): we really need to take the gic_mtx in gic_v3_setup_periph() >> >right up front because CPU_SET() won't necessarily be atomic. That's >> >not the problem, though for other reasons, but also because... >> > >> >> From DDB I have collected this data in arm64. irq18 is for vmbus. >> >> db> show irqs >> >> ... >> >> irq18 <gic0,p2>: cpu 03 cnt 411 >> >> .... >> >> >> > >> >That would seem to indicate that both CPUs have set it up, but it >> >occurs to me that enable_intr also needs the same treatment. Let's >> >wipe gic_v3.c back to a clean slate and try a v2 of the patch: >> >https://peo/ >> >>ple.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cbeb78558bef0 >4 >> >>c72821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0% >7C6381 >> >>97556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLC >JQIjoiV2 >> >>luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=IXzMbv >qLqY >> >u6gqLsN0pNbGq3gvDRUoX9bYn9UB7NXRU%3D&reserved=0 >> >eebsd.org%2F~kevans%2Fppi- >> >>v2.diff&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849b64e >a >> >>f70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C >6 >> >>38197239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD >Ai >> >>LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sda >t >> >>a=QLDm8X1yuuGOy2OeeYxsnF%2FNeuceAogw5161U7ZH6%2Bs%3D&reserved= >0 >> > >> >For now we just pretend that we won't be disabling any PPIs as a proof-of- >concept. >> > >> [Souradeep] >> This is causing panic during boot: >> pmu0: MADT: cpu 0 (mpidr 0) irq 0 level-triggered >> pmu0: MADT: cpu 1 (mpidr 1) irq 0 level-triggered >> panic: gic_v3_enable_intr_impl: Unsupported IRQ 0 cpuid = 0 time = 1 >> KDB: stack backtrace: >> db_trace_self() at db_trace_self >> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >> vpanic() at vpanic+0x13c >> panic() at panic+0x44 >> gic_v3_enable_intr_impl() at gic_v3_enable_intr_impl+0xec >> intr_setup_irq() at intr_setup_irq+0x368 >> bus_setup_intr() at bus_setup_intr+0x94 >> pmu_attach() at pmu_attach+0x64 >> pmu_acpi_attach() at pmu_acpi_attach+0x94 >> device_attach() at device_attach+0x3f8 >> device_probe_and_attach() at device_probe_and_attach+0x7c >> bus_generic_new_pass() at bus_generic_new_pass+0xfc >> bus_generic_new_pass() at bus_generic_new_pass+0xac >> bus_generic_new_pass() at bus_generic_new_pass+0xac >> bus_set_pass() at bus_set_pass+0x4c >> mi_startup() at mi_startup+0x1fc >> virtdone() at virtdone+0x70 >> KDB: enter: panic >> [ thread pid 0 tid 100000 ] >> Stopped at kdb_enter+0x44: str xzr, [x19, #3328] >> Details of the log I have pasted here: >> https://past/ >> >ebin.com%2FeQN5RntD&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cb >eb7 >> >8558bef04c72821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47%7 >C1%7C >> >0%7C638197556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjA >wMDAiLCJ >> >QIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata= >KYfw >> d8nokwqfVzvSMfelgv1TrgMPYtzAtG9N%2FPCRUMQ%3D&reserved=0 > >I think you fetched a broken version of the patch -- there was maybe an hour where >I kept replacing it because I found a problem and fixed it, but created another one to >fix that I discovered on some hardware I tested on. I've re-uploaded the correct diff >at >https://people.fr/ >eebsd.org%2F~kevans%2Fppi- >v3.diff&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cbeb78558bef04c7 >2821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C >638197556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD >AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sd >ata=LH%2FV7tXExGsMDRVLlZAEq72XGQF6vR6Z%2Bn0TdoU4ueg%3D&reserved=0 >, which can't hit the panic because irq <= GIC_LAST_PPI all return early (the version >you have, the first branch in gic_v3_enable_intr_impl was probably an incorrect `if >(isrc->isrc_flags & INTR_ISRCF_PPI)` > [Souradeep] It has worked! Thanks a lot. We should get it committed in src. Before I commit the SMP related changes for Hyper-V driver. >Thanks, > >Kyle Evans