RE: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP
Date: Mon, 15 May 2023 17:27:02 UTC
>-----Original Message----- >From: owner-freebsd-hackers@freebsd.org <owner-freebsd-hackers@freebsd.org> >On Behalf Of Souradeep Chakrabarti >Sent: Monday, May 15, 2023 10:37 PM >To: Kyle Evans <kevans@freebsd.org> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >Subject: RE: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in ARM64 SMP > > > > >>-----Original Message----- >>From: Kyle Evans <kevans@freebsd.org> >>Sent: Monday, May 15, 2023 7:24 PM >>To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >>Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >>Subject: Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in >>ARM64 SMP >> >>On Mon, May 15, 2023 at 2:41 AM Souradeep Chakrabarti >><schakrabarti@microsoft.com> wrote: >>> >>> >>> >>> >>> >-----Original Message----- >>> >From: Kyle Evans <kevans@freebsd.org> >>> >Sent: Monday, May 15, 2023 10:36 AM >>> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >>> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >>> >Subject: Re: [EXTERNAL] Re: enabling same PPI interrupt to all CPU >>> >in >>> >ARM64 SMP >>> > >>> >On Sun, May 14, 2023 at 9:59 AM Souradeep Chakrabarti >>> ><schakrabarti@microsoft.com> wrote: >>> >> >>> >> >>> >> >>> >> >>> >> >-----Original Message----- >>> >> >From: Kyle Evans <kevans@freebsd.org> >>> >> >Sent: Friday, May 12, 2023 10:40 PM >>> >> >To: Souradeep Chakrabarti <schakrabarti@microsoft.com> >>> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >>> >> >Subject: [EXTERNAL] Re: enabling same PPI interrupt to all CPU in >>> >> >ARM64 SMP >>> >> > >>> >> >On Fri, May 12, 2023 at 9:51 AM Souradeep Chakrabarti >>> >> ><schakrabarti@microsoft.com> wrote: >>> >> >> >>> >> >> >>> >> >> >>> >> >> >>> >> >> >-----Original Message----- >>> >> >> >From: Souradeep Chakrabarti >>> >> >> >Sent: Monday, May 8, 2023 6:39 PM >>> >> >> >To: Kyle Evans <kevans@freebsd.org> >>> >> >> >Cc: Wei Hu <weh@microsoft.com>; freebsd-hackers@FreeBSD.org >>> >> >> >Subject: enabling same PPI interrupt to all CPU in ARM64 SMP >>> >> >> > >>> >> >> >Hi , >>> >> >> > >>> >> >> >While using SMP in ARM64 Hyper-V we are getting stuck in boot >>> >> >> >if there is a interrupt for VMBus coming to CPU1 and VMBus >>> >> >> >interrupt handler is not getting that interrupt. >>> >> >> > >>> >> >> >In ARM64 Hyper-V we are using IRQ18 for VMBus and it is a PPI interrupt. >>> >> >> > >>> >> >> >But Hypev-V host sends interrupt to this IRQ 18 for both CPU0 >>> >> >> >and >>> >> >> >CPU1 in 2CPU system. >>> >> >> >This is based on the corresponding VMBus channel which >>> >> >> >assigned with the >>> >CPU. >>> >> >> > >>> >> >> >Now VMBus ISR is getting the interrupt in CPU0 but not getting >>> >> >> >from >>CPU1. >>> >> >> >Any idea, how we can use the same PPI 18 for all the CPU cores? >>> >> >> > >>> >> >> >Any help will be appreciated, as this is blocking the >>> >> >> >enablement of FreeBSD in Azure ARM64. >>> >> >> [Souradeep] >>> >> >> Can someone please help me it. >>> >> >> >>> >> > >>> >> >Looking at least at the GIC implementation, it looks like this is >>> >> >a known >>limitation: >>> >> > >>> >> > 875 /* >>> >> > 876 * XXX - In case that per CPU interrupt is going to be >>> >> >enabled in time >>> >> > 877 * when SMP is already started, we need some IPI >>> >> >call which >>> >> > 878 * enables it on others CPUs. Further, it's more >>> >> >complicated as >>> >> > 879 * pic_enable_source() and pic_disable_source() >>> >> >should act on >>> >> > 880 * per CPU basis only. Thus, it should be solved >>> >> >here somehow. >>> >> > 881 */ >>> >> > 882 if (isrc->isrc_flags & INTR_ISRCF_PPI) >>> >> > 883 CPU_SET(PCPU_GET(cpuid), &isrc->isrc_cpu); >>> >> > >>> >> >I think we need something /like/ this: >>> >> >https://peo/ >>> >> >>> >>>>ple.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849 >b >>> >64 >>> >> >>> >>>>eaf70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0 >>% >>> >7C6381 >>> >> >>> >>>>97239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi >L >>C >>> >JQIjoiV2 >>> >> >>> >>>>luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=dXVv >>nI >>> >HhN9 >>> >> >mdwiPkSJKwMyEKYi5SyGOuta5zCZ1ysCQ%3D&reserved=0 >>> >> >>> >>>>eebsd.org%2F~kevans%2Fppi.diff&data=05%7C01%7Cschakrabarti%40micros >o >>> >>f >>> >> >t.c >>> >>>om%7Cc5c3d254b9d841e9ae9b08db530bb3d2%7C72f988bf86f141af91ab2d >7 >>c >>> >> >>> >>>>d011db47%7C1%7C0%7C638195082027744706%7CUnknown%7CTWFpbGZ >s >>b >>> >3 >>> >> >>> >>>>d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0% >3 >>D >>> >> >>> >>>>%7C3000%7C%7C%7C&sdata=SwoR2vHxh6QQhwpOgFcQ9378nDVovhdEKEX >o >>E >>> >o >>> >> >gPKsc%3D&reserved=0, though it still has the caveat that PPIs >>> >> >effectively cannot be fully setup before SI_SUB_SMP. >>> >> >So, it's likely almost a NOP for existing platforms (will emit a >>> >> >warning with bootverbose for armv8 timers) but might do the trick for you. >>> >> [Souradeep] >>> >> Thanks for the change but it did not solve the problem. Still the >>> >> interrupt handler vmbus_handle_intr(struct trapframe *trap_frame), >>> >> is not >>> >getting called for the CPU 1. >>> >> It is only getting called for CPU 0 all the time in ARM64 but in >>> >> x86 it is getting called for both CPU1 and CPU0. >>> > >>> >Interesting! I do see one problem with the patch (and some cosmetic >>> >issues): we really need to take the gic_mtx in gic_v3_setup_periph() >>> >right up front because CPU_SET() won't necessarily be atomic. That's >>> >not the problem, though for other reasons, but also because... >>> > >>> >> From DDB I have collected this data in arm64. irq18 is for vmbus. >>> >> db> show irqs >>> >> ... >>> >> irq18 <gic0,p2>: cpu 03 cnt 411 >>> >> .... >>> >> >>> > >>> >That would seem to indicate that both CPUs have set it up, but it >>> >occurs to me that enable_intr also needs the same treatment. Let's >>> >wipe gic_v3.c back to a clean slate and try a v2 of the patch: >>> >https://peo/ >>> >>>ple.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cbeb78558bef >0 >>4 >>> >>>c72821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0 >% >>7C6381 >>> >>>97556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiL >C >>JQIjoiV2 >>> >>>luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=IXzMb >v >>qLqY >>> >u6gqLsN0pNbGq3gvDRUoX9bYn9UB7NXRU%3D&reserved=0 >>> >eebsd.org%2F~kevans%2Fppi- >>> >>>v2.diff&data=05%7C01%7Cschakrabarti%40microsoft.com%7C39d7c50849b64 >e >>a >>> >>>f70b608db55020fae%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7 >C >>6 >>> >>>38197239615544113%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM >D >>Ai >>> >>>LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sd >a >>t >>> >>>a=QLDm8X1yuuGOy2OeeYxsnF%2FNeuceAogw5161U7ZH6%2Bs%3D&reserved >= >>0 >>> > >>> >For now we just pretend that we won't be disabling any PPIs as a >>> >proof-of- >>concept. >>> > >>> [Souradeep] >>> This is causing panic during boot: >>> pmu0: MADT: cpu 0 (mpidr 0) irq 0 level-triggered >>> pmu0: MADT: cpu 1 (mpidr 1) irq 0 level-triggered >>> panic: gic_v3_enable_intr_impl: Unsupported IRQ 0 cpuid = 0 time = 1 >>> KDB: stack backtrace: >>> db_trace_self() at db_trace_self >>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30 >>> vpanic() at vpanic+0x13c >>> panic() at panic+0x44 >>> gic_v3_enable_intr_impl() at gic_v3_enable_intr_impl+0xec >>> intr_setup_irq() at intr_setup_irq+0x368 >>> bus_setup_intr() at bus_setup_intr+0x94 >>> pmu_attach() at pmu_attach+0x64 >>> pmu_acpi_attach() at pmu_acpi_attach+0x94 >>> device_attach() at device_attach+0x3f8 >>> device_probe_and_attach() at device_probe_and_attach+0x7c >>> bus_generic_new_pass() at bus_generic_new_pass+0xfc >>> bus_generic_new_pass() at bus_generic_new_pass+0xac >>> bus_generic_new_pass() at bus_generic_new_pass+0xac >>> bus_set_pass() at bus_set_pass+0x4c >>> mi_startup() at mi_startup+0x1fc >>> virtdone() at virtdone+0x70 >>> KDB: enter: panic >>> [ thread pid 0 tid 100000 ] >>> Stopped at kdb_enter+0x44: str xzr, [x19, #3328] >>> Details of the log I have pasted here: >>> https://past/ >>> >>ebin.com%2FeQN5RntD&data=05%7C01%7Cschakrabarti%40microsoft.com%7C >b >>eb7 >>> >>8558bef04c72821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47% >7 >>C1%7C >>> >>0%7C638197556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj >A >>wMDAiLCJ >>> >>QIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata= >>KYfw >>> d8nokwqfVzvSMfelgv1TrgMPYtzAtG9N%2FPCRUMQ%3D&reserved=0 >> >>I think you fetched a broken version of the patch -- there was maybe an >>hour where I kept replacing it because I found a problem and fixed it, >>but created another one to fix that I discovered on some hardware I >>tested on. I've re-uploaded the correct diff at >>https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpeopl >>e.fr%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cc7e312d5f11b4 >3eb2 >>44208db5566d6ec%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C6 >381976724 >>63846025%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2 >luMzIiLC >>JBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=XfgCvO7qhnsp >VtajLu5D >>44ZdsCQD1qGSu80elEO03tg%3D&reserved=0 >>eebsd.org%2F~kevans%2Fppi- >>v3.diff&data=05%7C01%7Cschakrabarti%40microsoft.com%7Cbeb78558bef04c >7 >>2821608db554be0f8%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7 >C >>638197556740270971%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwM >D >>AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&s >d >>ata=LH%2FV7tXExGsMDRVLlZAEq72XGQF6vR6Z%2Bn0TdoU4ueg%3D&reserved= >0 >>, which can't hit the panic because irq <= GIC_LAST_PPI all return >>early (the version you have, the first branch in >>gic_v3_enable_intr_impl was probably an incorrect `if (isrc->isrc_flags >>& INTR_ISRCF_PPI)` >> >[Souradeep] It has worked! Thanks a lot. We should get it committed in src. >Before I commit the SMP related changes for Hyper-V driver. >>Thanks, [Souradeep] But after install it is keep getting rebooted after hitting a panic mbus0: vmbus_handle_intr1 for cpu 0 vmbus0: the irq 18 vmbus0: smp_started = 0 KDvmbus0: vmbus_handle_intr1 for cpu 0 B: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x30 mi_switch() at mi_switch+0x27c sleepq_switch() at sleepq_switch+0xfc _sleep() at _sleep+0x294 vmbus_xact_wait1() at vmbus_xact_wait1+0x120 vmbus_sysinit() at vmbus_sysinit+0x6a4 mi_startup() at mi_startup+0x1fc virtdone() at virtdone+0x70 KDB: reentering KDB: stack backtrace: db_trace_self() at db_trace_self db_trace_self_wrapper() at db_trace_self_wrapper+0x30 kdb_reenter() at kdb_reenter+0x44 mi_switch() at mi_switch+0x280 sleepq_switch() at sleepq_switch+0xfc _sleep() at _sleep+0x294 vmbus_xact_wait1() at vmbus_xact_wait1+0x120 vmbus_sysinit() at vmbus_sysinit+0x6a4 mi_startup() at mi_startup+0x1fc virtdone() at virtdone+0x70 Looks like vmbus_sysinit() is getting called even before SMP has started after installation. Wondering what could be the reason. Also any idea how to disable this reboot cycle post post panic, so that I can get the backtrace properly. >> >>Kyle Evans