From: Vasily Postnicov
Date: Thu, 10 Oct 2024 17:12:59 +0000
Subject: Re: Running Mezzano in bhyve
To: Peter Grehan
Cc: freebsd-virtualization@freebsd.org
In-Reply-To: <8b249b64-d041-4f12-b6cb-fdb528837f22@freebsd.org>
List-Archive: https://lists.freebsd.org/archives/freebsd-virtualization

I was able to fix panics in both virtio and AHCI. This is what I found:

1) Virtio had a stupid bug: at run time Mezzano looks up the accessor for an I/O port with something like (funcall (intern (format nil "~a-~a" bus-name slot-name)) ...). Sure enough, the author had misspelled the name of one of the accessors, so FUNCALL was applied to a symbol with no function definition, hence the page fault.
2) AHCI had the following code:

;; Magic hacks for Intel devices?
;; Set port enable bits in Port Control and Status on Intel controllers.
(when (eql (pci:pci-config/16 location pci:+pci-config-vendorid+) #x8086)
  (let* ((n-ports (1+ (ldb (byte +ahci-CAP-NP-size+ +ahci-CAP-NP-position+)
                           (ahci-global-register ahci +ahci-register-CAP+))))
         (pcs (pci:pci-config/16 location #x92)))
    (setf (pci:pci-config/16 location #x92) (logior pcs
                                                    (ash #xFF (- (- 8 n-ports)))))))

I checked the value of N-PORTS: it's 20, so (ash #xff (- (- 8 n-ports))) is 1044480, which is bigger than 2^16-1 and so does not fit in the 16-bit config register. I recompiled bhyve with MAX_PORTS = 6 in bhyve/pci_ahci.c and the panic disappeared. Now I have this output:

Detected AHCI ABAR at C1002000
AHCI IRQ is B
Host Capabilities FF30FF25
Global Host Control 80000000
Interrupt Status 0
Ports Implemented 1
Version 10300
Command Completion Coalescing Control 0
Command Completion Coalescing Ports 0
Enclosure Management Location 0
Enclosure Management Control 0
Host Capabilities Extended 4
BIOS/OS Handoff Control and Status 0
AHCI HBA version 1.300
Handler: 0
Config register: 17
Port 0
Waiting for CR/FR to stop.
Allocated port data at 105C33000
Command List at 105C33000
Received FIS at 105C33400
Command Tabl at 105C33500
Initializing device on port 0
 Command List Base Address 5C33000
 Command List Base Address Upper 32-bits 1
 FIS Base Address 5C33400
 FIS Base Address Upper 32-bits 1
 Interrupt Status 0
 Interrupt Enable 7D80003F
 Command and Status 1C017
 Task File Data 50
 Signature 101
 SATA Status (SCR0: SStatus) 133
 SATA Control (SCR2: SControl) 300
 SATA Error (SCR1: SError) 0
 SATA Active (SCR3: SActive) 0
 Command Issue 0
 SATA Notification (SCR4: SNotification) 0
 FIS-based Switching Control 0
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Command completed.
105C33600: 28A20040 100000 0 3F
105C33610: 0 59564248 4644452D 2D413239
105C33620: 382D4136 39433646 0 30300000
105C33630: 20203120 42482020 45205956 54415341
105C33640: 49532044 20204B20 20202020 20202020
105C33650: 20202020 20202020 20202020 80802020
105C33660: B000000 4000 60000 0
105C33670: 0 0 A00000 70000
105C33680: 780003 780078 40200078 0
105C33690: 0 1F0000 40010E 0
105C336A0: 2803F0 74004068 40684000 4000B400
105C336B0: 7F 0 0 0
105C336C0: 0 0 A00000 0
105C336D0: 10000 6008 0 0
105C336E0: 0 0 0 40080000
105C336F0: 4008 0 0 0
105C33700: 0 0 0 0
105C33710: 0 0 0 0
105C33720: 0 0 0 0
105C33730: 0 0 0 0
105C33740: 0 0 0 0
105C33750: 10000 0 0 0
105C33760: 0 0 0 0
105C33770: 0 0 0 0
105C33780: 0 0 0 0
105C33790: 0 0 0 0
105C337A0: 40000000 0 0 0
105C337B0: 0 0 0 1020
105C337C0: 0 0 0 0
105C337D0: 0 0 0 0
105C337E0: 0 0 0 0
105C337F0: 0 0 0 78A50000
Features (83): 7400
Sector size: 200
Sector count: A00000
Serial: BHYVE-FD29-AA68-6F9C
Model: BHYVE SATA DISK
Registered new R/W disk #<149CAC9> sectors:A00000
Host Capabilities FF30FF25
Global Host Control 80000002
Interrupt Status 1
Ports Implemented 1
Version 10300
Command Completion Coalescing Control 0
Command Completion Coalescing Ports 0
Enclosure Management Location 0
Enclosure Management Control 0
Host Capabilities Extended 4
BIOS/OS Handoff Control and Status 0
PCI:0:0:0 1022:7432 NIL - NIL 6:0:0 rid: 0 hdr: 0 intr: FF
    40: Unknown capability 10
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Detected MBR style parition table on disk #<149CAC9>
Detected partition 0 on disk #<149CAC9>. Start: 800 size: 800
Registered new R/W disk #<149CCD9> sectors:800
Detected partition 1 on disk #<149CAC9>. Start: 1000 size: 800
Registered new R/W disk #<149CD89> sectors:800
Detected partition 2 on disk #<149CAC9>. Start: 2000 size: 9FE000
Registered new R/W disk #<149CE39> sectors:9FE000
Looking for paging disk with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10
*** AHCI-RUN-COMMAND TIMEOUT EXPIRED! ***
Found image with UUID 5C:F6:EE:79:2C:DF:45:E1:BA:2B:63:25:C4:1A:5F:10 on disk #<149CE39>
Found boot image on disk #<149CE39>!
BML4 at -7FFFFFEFD000
Store freelist block is 2
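
As a cross-check on the register dump, here is a minimal sketch of how the Host Capabilities value FF30FF25 decodes. It assumes the CAP bit layout from the AHCI 1.3 specification (NP in bits 4:0, NCS in bits 12:8, ISS in bits 23:20, S64A in bit 31). The function name DECODE-AHCI-CAP and the plist keys are made up for illustration and are not Mezzano's.

```lisp
;; Illustrative only: DECODE-AHCI-CAP is a hypothetical helper, not part of Mezzano.
(defun decode-ahci-cap (cap)
  "Return a plist with a few fields of the AHCI Host Capabilities register."
  (list :ports (1+ (ldb (byte 5 0) cap))          ; NP is zero-based
        :command-slots (1+ (ldb (byte 5 8) cap))  ; NCS is zero-based
        :interface-speed (ldb (byte 4 20) cap)    ; ISS generation code
        :s64a (logbitp 31 cap)))                  ; 64-bit addressing supported

(decode-ahci-cap #xFF30FF25)
;; => (:PORTS 6 :COMMAND-SLOTS 32 :INTERFACE-SPEED 3 :S64A T)
```

The NP field gives 6 ports, which matches the MAX_PORTS = 6 rebuild. "Ports Implemented 1" is the separate PI bitmap, so only port 0 is implemented.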

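To illustrate the virtio failure from 1), here is a small sketch in plain Common Lisp of building an accessor name at run time and checking that it names a function before calling it. PCI-STATUS and CALL-ACCESSOR are hypothetical names, not Mezzano's, and the FBOUNDP check is only one possible guard.

```lisp
;; Hypothetical accessor standing in for a real register accessor.
(defun pci-status (device)
  (getf device :status))

(defun call-accessor (bus-name slot-name device)
  "Build the accessor name from BUS-NAME and SLOT-NAME at run time and call it.
Signals an error instead of calling a symbol that has no function definition."
  (let ((accessor (intern (format nil "~a-~a" bus-name slot-name)
                          ;; Intern in the package this sketch was read in.
                          (symbol-package 'call-accessor))))
    (if (fboundp accessor)
        (funcall accessor device)
        (error "No accessor named ~S" accessor))))

(call-accessor "PCI" "STATUS" (list :status 5)) ; => 5
;; (call-accessor "PCI" "STATSU" nil) signals an error: the misspelled name
;; interns a fresh symbol that has no function definition.
```

A misspelled name is still interned without complaint, so the mistake only shows up when the symbol is called.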
It seems to be booting, but very slowly, with those "TIMEOUT EXPIRED" messages. With virtio-blk it is almost the same, except that it hangs completely. I'll try to investigate further. Meanwhile, do you have any suggestions as to why those magic Intel AHCI controller hacks are required, and why sc->ports can get bigger than DEF_PORTS in pci_ahci_init in bhyve?
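
For reference, the overflow from 2) can be reproduced outside Mezzano. The sketch below evaluates the same shift for 6 and 20 ports. CLAMPED-PCS-PORT-MASK is a hypothetical guard of mine, assuming the intent is one enable bit per port in the low byte of PCS. It is not a fix taken from Mezzano or bhyve.

```lisp
(defun pcs-port-mask (n-ports)
  "Value OR-ed into PCS by the quoted code for N-PORTS ports."
  (ash #xFF (- (- 8 n-ports))))

;; Hypothetical guard: never set more than 8 enable bits,
;; so the result always fits in the 16-bit register.
(defun clamped-pcs-port-mask (n-ports)
  (1- (ash 1 (min n-ports 8))))

(pcs-port-mask 6)           ; => 63 (#x3F), a right shift by 2
(pcs-port-mask 20)          ; => 1044480 (#xFF000), a left shift by 12
(clamped-pcs-port-mask 6)   ; => 63
(clamped-pcs-port-mask 20)  ; => 255
```

For up to 8 ports the shift is to the right and yields one bit per port. Above 8 ports it becomes a left shift, which is where the value outgrows 16 bits.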