[Bug 278233] PHYS_IN_DMAP and VIRT_IN_DMAP macros assume contiguous DMAP memory

From: <bugzilla-noreply_at_freebsd.org>
Date: Sun, 07 Apr 2024 13:00:47 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=278233

            Bug ID: 278233
           Summary: PHYS_IN_DMAP and VIRT_IN_DMAP macros assume contiguous
                    DMAP memory
           Product: Base System
           Version: Unspecified
          Hardware: arm64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: arm
          Assignee: freebsd-arm@FreeBSD.org
          Reporter: zeev@amazon.com

Created attachment 249802
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=249802&action=edit
bug fix: support non-contiguous memory in PHYS_IN_DMAP and VIRT_IN_DMAP

The issue was observed on r8g.metal-48xl, a coherent Graviton4-based system
with 2 SoCs.

The ENA driver panics when trying to remap BAR2 memory with WC attributes:

ena0: Elastic Network Adapter (ENA)ena v2.7.0
panic: Invalid DMAP table level: 0

cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1a4
panic() at panic+0x48
pmap_change_props_locked() at pmap_change_props_locked+0x6dc
pmap_change_props_locked() at pmap_change_props_locked+0x550
pmap_change_attr() at pmap_change_attr+0x5c
ena_attach() at ena_attach+0x264
device_attach() at device_attach+0x3fc
device_probe_and_attach() at device_probe_and_attach+0x80
bus_generic_attach() at bus_generic_attach+0x1c
pci_attach() at pci_attach+0xec
...


The memory map on those systems has 2 large DRAM regions, one for each SoC. The
regions are not contiguous: the IO memory that includes the PCI BARs for the
2nd SoC lies between them. The memory map looks like this:

DRAM region 0:    0x10000000000-0x1c000000000
IO memory (SoC1): 0x41000000000-0x50000000000
DRAM region 1:    0x50000000000-0x5c000000000


The current implementation of the PHYS_IN_DMAP and VIRT_IN_DMAP macros in the
arm64 vmparam.h inherently assumes that the DMAP region is contiguous:

/* True if pa is in the dmap range */
#define PHYS_IN_DMAP(pa)        ((pa) >= DMAP_MIN_PHYSADDR && \
    (pa) < DMAP_MAX_PHYSADDR)

/* True if va is in the dmap range */
#define VIRT_IN_DMAP(va)        ((va) >= DMAP_MIN_ADDRESS && \
    (va) < (dmap_max_addr))

As a result, the following check in pmap_change_props_locked()
(sys/arm64/arm64/pmap.c) wrongly concludes that PCI memory is part of the DMAP:

if (!VIRT_IN_DMAP(tmpva) && PHYS_IN_DMAP(pa)) {
        /*
         * Keep the DMAP memory in sync.
         */
        rv = pmap_change_props_locked(
            PHYS_TO_DMAP(pa), pte_size,
            prot, mode, true);
        if (rv != 0)
                return (rv);
}

The subsequent lookup of this memory in the DMAP page tables then fails,
because PCI memory is not mapped in the DMAP.
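
To make the misclassification concrete, here is a minimal userspace sketch of
the same range check. The two bounds are assumptions chosen for illustration:
they are set to the lowest and highest DRAM addresses from the memory map
above, whereas in the kernel they are set at boot. The helper name
range_check_says_dmap() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed values for illustration only: the bounds span both DRAM
 * regions from the memory map in this report.
 */
#define DMAP_MIN_PHYSADDR       0x10000000000ULL
#define DMAP_MAX_PHYSADDR       0x5c000000000ULL

/* Same shape as the current macro: a single min/max range check. */
#define PHYS_IN_DMAP(pa)        ((pa) >= DMAP_MIN_PHYSADDR && \
    (pa) < DMAP_MAX_PHYSADDR)

/* Hypothetical wrapper so the check can be called as a function. */
static bool
range_check_says_dmap(uint64_t pa)
{
        return (PHYS_IN_DMAP(pa));
}
```

With these bounds, range_check_says_dmap(0x41000000000ULL) returns true even
though that address is the start of the SoC1 IO window and not DRAM.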

The attached proposed patch stores a list of DMAP regions and replaces the
macros above with functions that check the address against that list
(based on https://reviews.freebsd.org/D3538).
The instance boots properly after applying this patch.
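
The idea of checking an address against a list of regions can be sketched as
follows. This is not the code from the attachment: struct dmap_region, the
dmap_regions table and phys_in_dmap() are hypothetical names, and the table is
filled statically with the two DRAM regions from this report, whereas a real
implementation would populate it at boot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record; the attached patch may use another layout. */
struct dmap_region {
        uint64_t start;         /* first address in the region */
        uint64_t end;           /* first address past the region */
};

/* The two DRAM regions from the memory map in this report. */
static const struct dmap_region dmap_regions[] = {
        { 0x10000000000ULL, 0x1c000000000ULL },
        { 0x50000000000ULL, 0x5c000000000ULL },
};

/* True only if pa falls inside one of the listed regions. */
static bool
phys_in_dmap(uint64_t pa)
{
        size_t i;

        for (i = 0; i < sizeof(dmap_regions) / sizeof(dmap_regions[0]); i++) {
                if (pa >= dmap_regions[i].start && pa < dmap_regions[i].end)
                        return (true);
        }
        return (false);
}
```

With this check, an address in the SoC1 IO window such as 0x41000000000 is no
longer reported as DMAP memory, while addresses in either DRAM region still
are. A linear scan seems adequate for a handful of regions.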
