Re: Running Mezzano in bhyve

From: Vasily Postnicov <shamaz.mazum_at_gmail.com>
Date: Thu, 10 Oct 2024 19:43:28 UTC
I suspect PCI interrupts are not functioning correctly.

Look at this code:
    ;; Attach interrupt handler.
    (sup:debug-print-line "Handler: " (ahci-irq-handler ahci))
    (sup:irq-attach (sup:platform-irq (pci:pci-intr-line location))
                    (ahci-irq-handler-function ahci)
                    ahci)

and this

(defun pci-intr-line (device)
  (pci-config/8 device +pci-config-intr-line+)) ;; comment by me: the constant is #x3c

I found that PCI config offset 0x3c is the Interrupt Line register (the
Interrupt Pin register is the next byte, 0x3d), so this is the legacy INTx
path. AFAIK, legacy pin-based interrupts are not supported by bhyve, is that
correct? If so, I need either to teach bhyve how to deal with legacy
interrupts or to teach Mezzano to understand MSI. Which would be easier in
your opinion?
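
For reference, a minimal sketch of how the two legacy-interrupt bytes in the
standard PCI header can be told apart: the byte at 0x3c is Interrupt Line and
the byte at 0x3d is Interrupt Pin. It works on a plain octet vector instead of
real config space; DESCRIBE-LEGACY-INTERRUPT and the constant names are made
up for illustration and are not Mezzano's API. Treating line 0xFF as "not
connected" is the usual x86 PC convention.

```lisp
;; Offsets in the standard PCI configuration header.
(defconstant +intr-line-offset+ #x3c) ; Interrupt Line: IRQ routing info set by firmware
(defconstant +intr-pin-offset+ #x3d)  ; Interrupt Pin: 0 = none, 1..4 = INTA#..INTD#

(defun describe-legacy-interrupt (config-space)
  "Decode Interrupt Line/Pin from CONFIG-SPACE, a vector of octets (hypothetical helper)."
  (let ((line (aref config-space +intr-line-offset+))
        (pin (aref config-space +intr-pin-offset+)))
    (list :line (if (= line #xff) :not-connected line)
          :pin (cond ((zerop pin) :none)
                     ((<= 1 pin 4) (char "ABCD" (1- pin)))
                     (t :reserved)))))

;; Made-up header: line #x0B matches "AHCI IRQ is B" in the quoted log below;
;; pin 1 (INTA#) is an assumption, I have not read that byte back.
(let ((cfg (make-array 64 :element-type '(unsigned-byte 8) :initial-element 0)))
  (setf (aref cfg +intr-line-offset+) #x0b
        (aref cfg +intr-pin-offset+) 1)
  (describe-legacy-interrupt cfg))
;; => (:LINE 11 :PIN #\A)
```

Dumping both bytes for the AHCI device would show whether the guest sees a
wired pin at all, which seems worth checking before deciding between the two
options.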

On Thu, 10 Oct 2024 at 17:12, Vasily Postnicov <shamaz.mazum@gmail.com> wrote:

> I was able to fix panics in both virtio and AHCI. This is what I found:
>
> 1) Virtio had a silly bug: Mezzano tried to find an accessor to some IO
> port at run time, doing something like (funcall (intern (format
> nil "~a-~a" bus-name slot-name)) ...). Sure enough, there was a typo in
> the name of one of the accessors, so FUNCALL tried to call a symbol with
> no function definition, hence the page fault.
> 2) AHCI had the following code:
>
> ;; Magic hacks for Intel devices?
> ;; Set port enable bits in Port Control and Status on Intel controllers.
> (when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
>   (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
>                            (ahci-global-register ahci +ahci-register-CAP+))))
>          (pcs (pci:pci-config/16 location #x92)))
>     (setf (pci:pci-config/16 location #x92)
>           (logior pcs (ash #xFF (- (- 8 n-ports)))))))
>
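
A quick way to see the failure mode of the mask in the hack above: the shift
count is N-PORTS minus 8, so anything above 8 ports (e.g. the 20 reported
here) shifts #xFF left and past bit 15. The sketch below can be pasted into
any Common Lisp REPL. PCS-PORT-MASK, CLAMPED-PCS-PORT-MASK and
FITS-IN-16-BITS-P are names I made up, and the clamp to 8 ports is only one
possible guard, not necessarily what the driver should do.

```lisp
(defun pcs-port-mask (n-ports)
  "Same arithmetic as the hack: #xFF shifted by (- (- 8 N-PORTS)) = N-PORTS - 8."
  (ash #xff (- (- 8 n-ports))))

(defun clamped-pcs-port-mask (n-ports)
  "Hypothetical guard: treat more than 8 ports as 8, so the mask stays within #xFF."
  (pcs-port-mask (min n-ports 8)))

(defun fits-in-16-bits-p (value)
  "True when VALUE can be stored by a 16-bit config-space write."
  (typep value '(unsigned-byte 16)))

(pcs-port-mask 6)                              ; => 63 (#x3F), six port bits
(pcs-port-mask 20)                             ; => 1044480 (#xFF000)
(fits-in-16-bits-p (pcs-port-mask 20))         ; => NIL
(fits-in-16-bits-p (clamped-pcs-port-mask 20)) ; => true
```
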
> I checked the value of N-PORTS: it is 20, so (ash #xff (- (- 8 n-ports)))
> is 1044480 (#xFF000), which is bigger than 2^16-1 and does not fit in the
> 16-bit register. I recompiled bhyve with MAX_PORTS = 6 in bhyve/pci_ahci.c
> and the panic disappeared. Now I have this output:
>
> Detected AHCI ABAR at C1002000
> AHCI IRQ is B
> Host Capabilities FF30FF25
> Global Host Control 80000000
> Interrupt Status 0
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> AHCI HBA version 1.300
> Handler: 0
> Config register: 17
> Port 0
> Waiting for CR/FR to stop.
> Allocated port data at 105C33000
> Command List at 105C33000
> Received FIS at 105C33400
> Command Tabl at 105C33500
> Initializing device on port 0
>  Command List Base Address 5C33000
>  Command List Base Address Upper 32-bits 1
>  FIS Base Address 5C33400
>  FIS Base Address Upper 32-bits 1
>  Interrupt Status 0
>  Interrupt Enable 7D80003F
>  Command and Status 1C017
>  Task File Data 50
>  Signature 101
>  SATA Status (SCR0: SStatus) 133
>  SATA Control (SCR2: SControl) 300
>  SATA Error (SCR1: SError) 0
>  SATA Active (SCR3: SActive) 0
>  Command Issue 0
>  SATA Notification (SCR4: SNotification) 0
>  FIS-based Switching Control 0
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Command completed.
> 105C33600: 28A20040 100000 0 3F
> 105C33610: 0 59564248 4644452D 2D413239
> 105C33620: 382D4136 39433646 0 30300000
> 105C33630: 20203120 42482020 45205956 54415341
> 105C33640: 49532044 20204B20 20202020 20202020
> 105C33650: 20202020 20202020 20202020 80802020
> 105C33660: B000000 4000 60000 0
> 105C33670: 0 0 A00000 70000
> 105C33680: 780003 780078 40200078 0
> 105C33690: 0 1F0000 40010E 0
> 105C336A0: 2803F0 74004068 40684000 4000B400
> 105C336B0: 7F 0 0 0
> 105C336C0: 0 0 A00000 0
> 105C336D0: 10000 6008 0 0
> 105C336E0: 0 0 0 40080000
> 105C336F0: 4008 0 0 0
> 105C33700: 0 0 0 0
> 105C33710: 0 0 0 0
> 105C33720: 0 0 0 0
> 105C33730: 0 0 0 0
> 105C33740: 0 0 0 0
> 105C33750: 10000 0 0 0
> 105C33760: 0 0 0 0
> 105C33770: 0 0 0 0
> 105C33780: 0 0 0 0
> 105C33790: 0 0 0 0
> 105C337A0: 40000000 0 0 0
> 105C337B0: 0 0 0 1020
> 105C337C0: 0 0 0 0
> 105C337D0: 0 0 0 0
> 105C337E0: 0 0 0 0
> 105C337F0: 0 0 0 78A50000
> Features (83): 7400
> Sector size: 200
> Sector count: A00000
> Serial: BHYVE-FD29-AA68-6F9C
> Model: BHYVE SATA DISK
> Registered new R/W disk #<149CAC9> sectors:A00000
> Host Capabilities FF30FF25
> Global Host Control 80000002
> Interrupt Status 1
> Ports Implemented 1
> Version 10300
> Command Completion Coalescing Control 0
> Command Completion Coalescing Ports 0
> Enclosure Management Location 0
> Enclosure Management Control 0
> Host Capabilities Extended 4
> BIOS/OS Handoff Control and Status 0
> PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
>     40: Unknown capability 10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Detected MBR style parition table on disk #<149CAC9>
> Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
> Registered new R/W disk #<149CCD9> sectors:800
> Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
> Registered new R/W disk #<149CD89> sectors:800
> Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
> Registered new R/W disk #<149CE39> sectors:9FE000
> Looking for paging disk with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
> *** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
> Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on disk #<149CE39>
> Found boot image on disk #<149CE39>!
> BML4 at -7FFFFFEFD000
> Store freelist block is 2
>
> It seems to be booting, but very slowly, with those "TIMEOUT EXPIRED"
> messages. With virtio-blk it is almost the same, except that it
> hangs completely. I'll try to investigate further. Meanwhile, do you have
> any idea why those magic Intel AHCI controller hacks are required
> and why sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
>
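
Regarding the virtio bug from item 1) in the quoted message: a sketch of how
that kind of run-time accessor lookup could fail loudly instead of ending in
a page fault. This is ordinary Common Lisp, not the Mezzano code;
ACCESSOR-SYMBOL, CALL-ACCESSOR and DEMO-BUS-STATUS are hypothetical names, and
I am assuming the lookup happens in the same package the accessors were
defined in.

```lisp
(defun accessor-symbol (bus-name slot-name &optional (package *package*))
  "Build the accessor name the same way as the quoted FORMAT/INTERN call."
  (intern (format nil "~a-~a" bus-name slot-name) package))

(defun call-accessor (bus-name slot-name &rest args)
  "Call the named accessor, signalling a readable error if it is not defined."
  (let ((name (accessor-symbol bus-name slot-name)))
    (unless (fboundp name)
      (error "No accessor named ~S (typo in the definition?)" name))
    (apply name args)))

;; Stand-in accessor so the example is self-contained.
(defun demo-bus-status (device)
  (list :status-of device))

(call-accessor "DEMO-BUS" "STATUS" :dev0)  ; => (:STATUS-OF :DEV0)
;; (call-accessor "DEMO-BUS" "STAUS" :dev0) signals an error naming DEMO-BUS-STAUS
```

The FBOUNDP check is the whole point: a misspelled accessor then shows up as
an error that names the missing function.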