Re: Running Mezzano in bhyve

From: Vasily Postnicov <shamaz.mazum_at_gmail.com>
Date: Thu, 10 Oct 2024 17:12:59 UTC
I was able to fix panics in both virtio and AHCI. This is what I found:

1) Virtio had a stupid bug: Mezzano looks up the accessor for an IO port at
runtime by doing something like (funcall (intern (format nil "~a-~a"
bus-name slot-name)) ...). Sure enough, the author had misspelled the name
of one of the accessors, so FUNCALL was handed a symbol with no function
definition, hence the page fault.
2) AHCI had the following code:

;; Magic hacks for Intel devices?
;; Set port enable bits in Port Control and Status on Intel controllers.
(when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
  (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
                           (ahci-global-register ahci +ahci-register-CAP+))))
         (pcs (pci:pci-config/16 location #x92)))
    (setf (pci:pci-config/16 location #x92)
          (logior pcs (ash #xFF (- (- 8 n-ports)))))))

I checked the value of N-PORTS: it is 20, so (ash #xff (- (- 8 n-ports)))
is (ash #xff 12) = 1044480 (#xFF000), which is bigger than 2^16-1 and so
does not fit in the 16-bit config register. I recompiled bhyve with
MAX_PORTS = 6 in bhyve/pci_ahci.c and the panic disappeared. Now I have
this output:

Detected AHCI ABAR at C1002000
AHCI IRQ is B
Host Capabilities FF30FF25
Global Host Control 80000000
Interrupt Status 0
Ports Implemented 1
Version 10300
Command Completion Coalescing Control 0
Command Completion Coalescing Ports 0
Enclosure Management Location 0
Enclosure Management Control 0
Host Capabilities Extended 4
BIOS/OS Handoff Control and Status 0
AHCI HBA version 1.300
Handler: 0
Config register: 17
Port 0
Waiting for CR/FR to stop.
Allocated port data at 105C33000
Command List at 105C33000
Received FIS at 105C33400
Command Tabl at 105C33500
Initializing device on port 0
 Command List Base Address 5C33000
 Command List Base Address Upper 32-bits 1
 FIS Base Address 5C33400
 FIS Base Address Upper 32-bits 1
 Interrupt Status 0
 Interrupt Enable 7D80003F
 Command and Status 1C017
 Task File Data 50
 Signature 101
 SATA Status (SCR0: SStatus) 133
 SATA Control (SCR2: SControl) 300
 SATA Error (SCR1: SError) 0
 SATA Active (SCR3: SActive) 0
 Command Issue 0
 SATA Notification (SCR4: SNotification) 0
 FIS-based Switching Control 0
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Command completed.
105C33600: 28A20040 100000 0 3F
105C33610: 0 59564248 4644452D 2D413239
105C33620: 382D4136 39433646 0 30300000
105C33630: 20203120 42482020 45205956 54415341
105C33640: 49532044 20204B20 20202020 20202020
105C33650: 20202020 20202020 20202020 80802020
105C33660: B000000 4000 60000 0
105C33670: 0 0 A00000 70000
105C33680: 780003 780078 40200078 0
105C33690: 0 1F0000 40010E 0
105C336A0: 2803F0 74004068 40684000 4000B400
105C336B0: 7F 0 0 0
105C336C0: 0 0 A00000 0
105C336D0: 10000 6008 0 0
105C336E0: 0 0 0 40080000
105C336F0: 4008 0 0 0
105C33700: 0 0 0 0
105C33710: 0 0 0 0
105C33720: 0 0 0 0
105C33730: 0 0 0 0
105C33740: 0 0 0 0
105C33750: 10000 0 0 0
105C33760: 0 0 0 0
105C33770: 0 0 0 0
105C33780: 0 0 0 0
105C33790: 0 0 0 0
105C337A0: 40000000 0 0 0
105C337B0: 0 0 0 1020
105C337C0: 0 0 0 0
105C337D0: 0 0 0 0
105C337E0: 0 0 0 0
105C337F0: 0 0 0 78A50000
Features (83): 7400
Sector size: 200
Sector count: A00000
Serial: BHYVE-FD29-AA68-6F9C
Model: BHYVE SATA DISK
Registered new R/W disk #<149CAC9> sectors:A00000
Host Capabilities FF30FF25
Global Host Control 80000002
Interrupt Status 1
Ports Implemented 1
Version 10300
Command Completion Coalescing Control 0
Command Completion Coalescing Ports 0
Enclosure Management Location 0
Enclosure Management Control 0
Host Capabilities Extended 4
BIOS/OS Handoff Control and Status 0
PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
    40: Unknown capability 10
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Detected MBR style parition table on disk #<149CAC9>
Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
Registered new R/W disk #<149CCD9> sectors:800
Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
Registered new R/W disk #<149CD89> sectors:800
Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
Registered new R/W disk #<149CE39> sectors:9FE000
Looking for paging disk with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on disk #<149CE39>
Found boot image on disk #<149CE39>!
BML4 at -7FFFFFEFD000
Store freelist block is 2
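
As a quick check of the arithmetic in item 2, the shift can be evaluated on
its own at any Common Lisp REPL. This is only a sketch: PCS-ENABLE-MASK and
CAP-N-PORTS are made-up helper names, and the NP field is assumed to be the
5-bit field at bit 0 of CAP, as in the AHCI specification (the actual values
of +ahci-CAP-NP-size+ and +ahci-CAP-NP-position+ are not shown in this mail).

```lisp
;; Hypothetical helper: the value ORed into the 16-bit register at #x92.
(defun pcs-enable-mask (n-ports)
  (ash #xFF (- (- 8 n-ports))))

;; Hypothetical helper: port count from CAP, assuming NP is bits 4:0
;; and zero-based (hence the 1+), as in the AHCI specification.
(defun cap-n-ports (cap)
  (1+ (ldb (byte 5 0) cap)))

(pcs-enable-mask 20)      ; => 1044480 = #xFF000, too wide for 16 bits
(pcs-enable-mask 6)       ; => 63 = #x3F, one enable bit per port
(cap-n-ports #xFF30FF25)  ; => 6, from the "Host Capabilities" line above
```

The expression is a right shift for N-PORTS below 8 and a left shift above
it, so the result fits in 16 bits only while N-PORTS is at most 16.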

It seems to be booting, but very, very slowly, with those "TIMEOUT EXPIRED"
messages. With virtio-blk it is almost the same, except that it hangs
completely. I'll try to investigate further. Meanwhile, do you have any idea
why those magic Intel AHCI controller hacks are required, and why sc->ports
can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
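
For reference, the virtio failure mode from item 1 can be reproduced in
miniature. The sketch below is not Mezzano's code: DEMO-BUS-STATUS and
CALL-ACCESSOR are hypothetical names, and the device is just a property
list. It builds the accessor name at runtime in the same way, but checks
FBOUNDP before calling, so a misspelled name signals an ordinary error
instead of reaching FUNCALL.

```lisp
;; Hypothetical accessor standing in for a real register accessor.
(defun demo-bus-status (device)
  (getf device :status))

;; Build "<BUS>-<SLOT>" at runtime, intern it in the current package,
;; and call it only if the symbol actually names a function.
(defun call-accessor (bus-name slot-name device)
  (let ((sym (intern (string-upcase (format nil "~a-~a" bus-name slot-name)))))
    (if (fboundp sym)
        (funcall sym device)
        (error "No accessor named ~s" sym))))

(call-accessor "demo-bus" "status" '(:status 7))   ; => 7
;; (call-accessor "demo-bus" "statsu" nil) signals an error, because
;; DEMO-BUS-STATSU was never defined.
```

Resolving the accessor names once at load time rather than on every call
would be another way to catch such a typo early; whether that suits the
Mezzano driver is an open question.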