RE: [EXTERNAL] Re: git: 9729f076e4d9 - main - arm64: Hyper-V: enablement for ARM64 in Hyper-V (Part 3, final)

From: Souradeep Chakrabarti <schakrabarti_at_microsoft.com>
Date: Wed, 26 Apr 2023 20:30:45 UTC


>-----Original Message-----
>From: Souradeep Chakrabarti
>Sent: Wednesday, April 26, 2023 7:26 PM
>To: Kyle Evans <kevans@freebsd.org>
>Cc: Wei Hu <whu@freebsd.org>; src-committers@freebsd.org; dev-commits-src-
>all@freebsd.org; dev-commits-src-main@freebsd.org
>Subject: RE: [EXTERNAL] Re: git: 9729f076e4d9 - main - arm64: Hyper-V:
>enablement for ARM64 in Hyper-V (Part 3, final)
>
>
>
>
>>-----Original Message-----
>>From: Kyle Evans <kevans@freebsd.org>
>>Sent: Wednesday, April 26, 2023 3:39 AM
>>To: Souradeep Chakrabarti <schakrabarti@microsoft.com>
>>Cc: Kyle Evans <kevans@freebsd.org>; Wei Hu <whu@freebsd.org>; src-
>>committers@freebsd.org; dev-commits-src-all@freebsd.org;
>>dev-commits-src- main@freebsd.org
>>Subject: Re: [EXTERNAL] Re: git: 9729f076e4d9 - main - arm64: Hyper-V:
>>enablement for ARM64 in Hyper-V (Part 3, final)
>>
>>On Tue, Apr 25, 2023 at 2:42 AM Souradeep Chakrabarti
>><schakrabarti@microsoft.com> wrote:
>>>
>>>
>>>
>>>
>>> >-----Original Message-----
>>> >From: Souradeep Chakrabarti
>>> >Sent: Monday, April 24, 2023 6:24 PM
>>> >To: Kyle Evans <kevans@freebsd.org>; Wei Hu <whu@freebsd.org>
>>> >Cc: src-committers@freebsd.org; dev-commits-src-all@freebsd.org;
>>> >dev-commits- src-main@freebsd.org
>>> >Subject: RE: [EXTERNAL] Re: git: 9729f076e4d9 - main - arm64: Hyper-V:
>>> >enablement for ARM64 in Hyper-V (Part 3, final)
>>> >
>>> >
>>> >
>>> >
>>> >>-----Original Message-----
>>> >>From: Kyle Evans <kevans@freebsd.org>
>>> >>Sent: Wednesday, April 19, 2023 11:00 AM
>>> >>To: Wei Hu <whu@freebsd.org>; Souradeep Chakrabarti
>>> >><schakrabarti@microsoft.com>
>>> >>Cc: src-committers@freebsd.org; dev-commits-src-all@freebsd.org;
>>> >>dev-commits- src-main@freebsd.org
>>> >>Subject: [EXTERNAL] Re: git: 9729f076e4d9 - main - arm64: Hyper-V:
>>> >>enablement for ARM64 in Hyper-V (Part 3, final)
>>> >>
>>> >>On Thu, Oct 27, 2022 at 8:54 AM Wei Hu <whu@freebsd.org> wrote:
>>> >>>
>>> >>> The branch main has been updated by whu:
>>> >>>
>>> >>> URL:
>>> >>> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2
>>> >>> F
>>> >>>
>>cgi%2F&data=05%7C01%7Cschakrabarti%40microsoft.com%7Ce1cfc3aff7f94
>>> >>>
>>d5c2ca408db45d9c4c5%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%
>7
>>C6
>>> >>>
>>38180573867118793%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMD
>Ai
>>LCJQ
>>> >>>
>>IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=k
>>> >>>
>>MntMYzegWgF%2B90vq6FERBLjwfM%2FVb1GUNwUP9pv90Q%3D&reserved=0
>>> >>> t
>>> >>>
>>>
>>>>.freebsd.org%2Fsrc%2Fcommit%2F%3Fid%3D9729f076e4d93c5a37e78d427b
>f
>>e
>>> >0
>>> >>f1a
>>> >>>
>>>
>>>>b99bbcc6&data=05%7C01%7Cschakrabarti%40microsoft.com%7C393e855f13
>c
>>6
>>> >>49a
>>> >>>
>>>
>>>>8883708db4097211e%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0
>%
>>7
>>> >C
>>> >>6381747
>>> >>>
>>>
>>>>90106786449%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJ
>QI
>>j
>>> >oi
>>> >>V2luMz
>>> >>>
>>>
>>>>IiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=huiaOBSB
>2
>>Z
>>> >>QDn5Q
>>> >>> cw%2FLWnmI%2FqbbriQd7HkYtpxnGLGE%3D&reserved=0
>>> >>>
>>> >>> commit 9729f076e4d93c5a37e78d427bfe0f1ab99bbcc6
>>> >>> Author:     Souradeep Chakrabarti <schakrabarti@microsoft.com>
>>> >>> AuthorDate: 2022-10-27 13:46:08 +0000
>>> >>> Commit:     Wei Hu <whu@FreeBSD.org>
>>> >>> CommitDate: 2022-10-27 13:53:22 +0000
>>> >>>
>>> >>>     arm64: Hyper-V: enablement for ARM64 in Hyper-V (Part 3,
>>> >>> final)
>>> >>>
>>> >>>     This is the last part for ARM64 Hyper-V enablement. This includes
>>> >>>     commone files and make file changes to enable the ARM64 FreeBSD
>>> >>>     guest on Hyper-V. With this patch, it should be able to build
>>> >>>     the ARM64 image and install it on Hyper-V.
>>> >>>
>>> >>
>>> >>Hi,
>>> >>
>>> >>First off- thanks for doing this work! I can't seem to boot a
>>> >>-CURRENT image under Hyper-V on a Volterra machine, seemingly due
>>> >>to vmbus. It stalls right
>>> >after "vmbus:
>>> >>the irq 18" but before emitting a version number, I have some other
>>> >>comments here...
>>> >>
>>> >>> [... snip ...]
>>> >>> diff --git a/sys/dev/hyperv/vmbus/vmbus.c
>>> >>> b/sys/dev/hyperv/vmbus/vmbus.c index b0cd750b26c8..f370f2a75b99
>>> >>> 100644
>>> >>> --- a/sys/dev/hyperv/vmbus/vmbus.c
>>> >>> +++ b/sys/dev/hyperv/vmbus/vmbus.c
>>> >>> [... snip ...]
>>> >>> @@ -107,7 +113,7 @@ static uint32_t
>>> >>vmbus_get_vcpu_id_method(device_t bus,
>>> >>>                                     device_t dev, int cpu);
>>> >>>  static struct taskqueue                *vmbus_get_eventtq_method(device_t,
>>device_t,
>>> >>>                                     int); -#ifdef
>>> >>> EARLY_AP_STARTUP
>>> >>> +#if defined(EARLY_AP_STARTUP) || defined(__aarch64__)
>>> >>>  static void                    vmbus_intrhook(void *);
>>> >>>  #endif
>>> >>>
>>> >>
>>> >>My gut reaction to this is that this is a red flag.
>>> >>EARLY_AP_STARTUP implies characteristics that aarch64 doesn't
>>> >>exhibit; it's hard to see why this conditional is OK, or whether
>>> >>it's testing EARLY_AP_STARTUP as a bad way to write __i386__ || __amd64__.
>>> >[Souradeep] We had or'd the aarch64 here, as we wanted to defer the
>>> >vmbus attachment after gic and arm generic timer is attached. As we
>>> >need PAUSE and DELAY for vmbus and hyper-v drivers. For that we
>>> >wanted to use vmbus_intrhook().
>>> >>
>>> >>> [... snip ...]
>>> >>> @@ -760,7 +736,7 @@ vmbus_synic_setup(void *xsc)
>>> >>>
>>> >>>         if (hyperv_features & CPUID_HV_MSR_VP_INDEX) {
>>> >>>                 /* Save virtual processor id. */
>>> >>> -               VMBUS_PCPU_GET(sc, vcpuid, cpu) = rdmsr(MSR_HV_VP_INDEX);
>>> >>> +               VMBUS_PCPU_GET(sc, vcpuid, cpu) =
>>> >>> + RDMSR(MSR_HV_VP_INDEX);
>>> >>>         } else {
>>> >>>                 /* Set virtual processor id to 0 for compatibility. */
>>> >>>                 VMBUS_PCPU_GET(sc, vcpuid, cpu) = 0;
>>> >>
>>> >>This one, vmbus_synic_setup(), is invoked via vmbus_intrhook ->
>>> >>vmbus_doattach -
>>> >>> smp_rendezvous. On !EARLY_AP_STARTUP (e.g., aarch64), SMP isn't
>>> >>> functional in
>>> >>intrhooks and smp_rendezvous() will just call vmbus_synic_setup()
>>> >>on the boot
>>> >AP.
>>> >>There's nothing that will initialize the pcpu data on every other AP, AFAICT.
>>> >>
>>> >>That said, the !EARLY_AP_STARTUP path is also wrong. Quoting lines
>>> >>that weren't in this e-mail from vmbus_attach():
>>> >[Souradeep] Thanks! This is a really good catch, and yes, we are
>>> >missing the smp initialization during vmbus synic setup.
>>> >>
>>> >>1527 #else   /* !EARLY_AP_STARTUP */
>>> >>1528         /*
>>> >>1529          * If the system has already booted and thread
>>> >>1530          * scheduling is possible indicated by the global
>>> >>1531          * cold set to zero, we just call the driver
>>> >>1532          * initialization directly.
>>> >>1533          */
>>> >>1534         if (!cold)
>>> >>1535                 vmbus_doattach(vmbus_sc);
>>> >>1536 #endif /* EARLY_AP_STARTUP  and aarch64 */
>>> >>
>>> >>The two immediate issues I see is that in a device_attach, SMP
>>> >>won't be started. It's also going to be before SI_SUB_CONFIGURE, so
>>> >>`cold` will never be 0 here -- this is effectively dead code, and
>>> >>if it weren't dead code it would suffer the same problem as above
>>> >>where other APs aren't started yet so we won't set their pcpu data.
>>> >>This is OK, though, because then we have this sysinit later on that
>>> >>does the same thing for !EARLY_AP_STARTUP && !__aarch64__`, but
>>> >>that should really just be `!EARLY_AP_STARTUP` as aarch64 also
>>> >>needs to invoke
>>> >>vmbus_doattach() at SI_SUB_SMP (much later than SI_SUB_DRIVERS).
>>> >[Souradeep] I have tried to use the suggestion but in multi
>>> >processor system the boot is getting stuck at the end and "vmbus0:
>>> >device scan, probe and
>>attach done"
>>> >is not happening.
>>> >I am trying to figure out what is going wrong here, as the same
>>> >change is working with single cpu system. Any help or pointer will
>>> >be really
>>helpful.
>>> [Souradeep]  I can see the problem the task vmbus_scandone_task is
>>> getting enqueued but it is not getting executed.
>>>
>>> 525 static void
>>>  526 vmbus_scan_done(struct vmbus_softc *sc,
>>>  527     const struct vmbus_message *msg __unused)
>>>  528 {
>>>  529     device_printf(sc->vmbus_dev, " vmbus_scan_done is called\n");
>>>  530
>>>  531     taskqueue_enqueue(sc->vmbus_devtq, &sc->vmbus_scandone_task);
>>>  532 }
>>> vmbus scan done is getting called.
>>>
>>> But
>>> 514 static void
>>>  515 vmbus_scan_done_task(void *xsc, int pending __unused)
>>>  516 {
>>>  517     struct vmbus_softc *sc = xsc;
>>>  518     device_printf(sc->vmbus_dev, " vmbus_scan_done_task is called\n");
>>>  519     bus_topo_lock();
>>>  520     sc->vmbus_scandone = true;
>>>  521     bus_topo_unlock();
>>>  522     wakeup(&sc->vmbus_scandone);
>>>  523 }
>>> vmbus_scan_done_task  is not getting called the task .
>>>
>>> But the same is happening when single cpu is running, for multi cpu
>>> it is getting
>>stuck.
>>
>>Hi,
>>
>>That seems odd. What happens if you bump the SYSINIT up to SI_SUB_SMP
>>+ 1, SI_ORDER_FIRST? We don't know for a fact that all APs are ready
>>for scheduling until after smp_after_idle_runnable(), which is also at
>>SI_ORDER_ANY
>>-- maybe there's just something going horribly wrong.
>>That would perhaps explain why it's fine on a single processor system,
>>which won't do anything useful (at least in later parts of SI_SUB_SMP).
>[Souradeep]
>In ARM64 SMP(VM with two cpu),  storvsc attach is happening two times for single
>scsi controller.
>But for intel similar VM (two cpu), it is happening once.
>For the dummy/fake storvsc in arm64, we are getting stuck at device_attach.
>
>Details:
>
>vmbus_scan_done(), not getting invoked because vmbus_add_child() is not
>complete for a channel 15, because of which vmbus_devtq is having one task
>pending.
>
>Now
>By passing NMI in the hung system, after examining all threads:
>
>sched_switch() at sched_switch+0x4dc
>mi_switch() at mi_switch+0x194
>sleepq_switch() at sleepq_switch+0xfc
>_cv_wait() at _cv_wait+0x160
>_sema_wait() at _sema_wait+0x50
>storvsc_attach() at storvsc_attach+0x610
>device_attach() at device_attach+0x3f8
>device_probe_and_attach() at device_probe_and_attach+0x7c
>vmbus_add_child() at vmbus_add_child+0x64
>
>Now ,
>
>It is stuck at waiting on sema_wait() on request->synch_sema in
>hv_storvsc_channel_init() because
>sema_post() on request->synch_sema is not getting invoked. Which unlocks it.
>This is because we are waiting on sema_wait on synch_sema
>hv_storvsc_channel_init(), for storvsc1 , but there is no storvsc1 device. So not
>getting a callback called for storvsc1.
>
>From ARM64 debug log:
>If you see at line 545 again SCI device got detected.
>
>	Line  370: storvsc0: Enlightened SCSI device detected
>	Line  371: storvsc0: <Hyper-V SCSI> on vmbus0
>	Line  406: (probe0:storvsc0:0:0:0): storvsc scsi_status = 2, srb_status = 6
>	Line  421: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
>	Line  436: da0: <Msft Virtual Disk 1.0> Fixed Direct Access SPC-3 SCSI device
>	Line  443: pass1: <Msft Virtual DVD-ROM 1.0> Removable CD-ROM SPC-3
>SCSI device
>	Line  447: cd0: <Msft Virtual DVD-ROM 1.0> Removable CD-ROM SPC-3
>SCSI device
>	Line  545: storvsc1: Enlightened SCSI device detected
>	Line  547: storvsc1: Enlightened SCSI device detected
>	Line  549: storvsc1: <Hyper-V SCSI>hv_storvsc_on_channel_callback is
>called
>
>From Log:
>
>unknown: device_add_child for chan15
>storvsc1: Enlightened SCSI device detected
>storvsc1: Enlightened SCSI device detected
>storvsc1: <Hyper-V SCSI> on vmbus0
>storvsc ringbuffer size: 262144, max_io: 512
>storvsc1: chan15 assigned to cpu1 [vcpu1]
>hn0: link state changed to UP
>vmbus0: vmbus_chanmsg_handle type 0xa
>storvsc1: gpadl_conn(chan15) succeeded
>vmbus0: vmbus_chanmsg_handle type 0x6
>storvsc1: chan15 opened
>waiting on sema wait synch_sema hv_storvsc_channel_init
[Souradeep] The fix is working, the test bed had an issue, after fixing that the fix is working.
I will share the fix by this week.
>>
>>Thanks,
>>
>>Kyle Evans