After more than 59 hr 20 min of poudriere-based port building, the Rock64 (4 GiByte) got a data_abort with a panic message that mentioned "vm_fault failed"
Mark Millard
marklmi at yahoo.com
Wed Nov 27 17:31:39 UTC 2019
The failure happened while dwmmc_intr (the DesignWare MMC/SD host
controller's interrupt handler) was active. The address in the
"vm_fault failed" panic message exactly matches the elr value
(0xffff00000078e51c), which is 4 bytes past the lr value and 8 bytes
past the "pc =" value listed for the dwmmc_intr frame. Since the
backtrace puts dwmmc_intr+0x280 at 0xffff00000078e514, the elr is
dwmmc_intr+0x288, the instruction ddb reports being stopped at.
(Back trace shown later.)
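For reference, my reading of the esr and far values in the console report
below (0x96000005 and 0x28), going by the ARMv8 ESR_EL1 layout: EC 0x25 is
a data abort taken from the current exception level (EL1), the WnR bit says
the access was a read, and DFSC 0x05 is a translation fault, level 1, at
virtual address 0x28. The little user-land helper below is only how I
pulled the fields apart; it is illustrative, not kernel code:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
        /* Values copied from the console report below. */
        uint64_t esr = 0x96000005;
        uint64_t far = 0x28;

        unsigned ec   = (esr >> 26) & 0x3f; /* 0x25: data abort, same EL */
        unsigned wnr  = (esr >> 6) & 0x1;   /* 0: the access was a read */
        unsigned dfsc = esr & 0x3f;         /* 0x05: translation fault, level 1 */

        printf("ec=0x%x wnr=%u dfsc=0x%x far=0x%jx\n",
            ec, wnr, dfsc, (uintmax_t)far);
        return (0);
}

If that decode is right, the kernel took a read fault on virtual address
0x28 while in dwmmc_intr.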
This is a head -r355027 based context.
This does not look easy to reproduce.
I had poudriere running 4 jobs, each allowed to use 4 processes,
so the bulk of the time the load average was between 8 and 17.
The last update from top (my extended version of top) showed that it
never saw significant swap usage:
Swap: 4608M Total, 22M Used, 4586M Free, 32M MaxObsUsed
("MaxObs" is short for "Maximum Observed".)
It also showed (line wrapped by me):
Mem: 196M Active, 1078M Inact, 4272K Laundry, 650M Wired, 264M Buf,
2035M Free, 2517M MaxObsActive, 805M MaxObsWired, 3219M MaxObs(Act+Wir)
It showed as running:
/usr/local/sbin/pkg-static create -r /wrkdirs/usr/ports/devel/llvm90/work/stage . . .
(earlier llvm80 had completed fine)
and 3 processes of the form:
cpdup -i0 -x ref0?
Those 3 seem to be for the 3 "Building"s listed below:
[59:20:56] [02] [00:14:53] Finished devel/qt5-linguist | qt5-linguist-5.13.2: Success
[59:20:57] [02] [00:00:00] Building deskutils/lumina-archiver | lumina-archiver-1.5.0
[59:20:57] [03] [00:00:00] Building deskutils/lumina-calculator | lumina-calculator-1.5.0
[59:20:57] [04] [00:00:00] Building x11/lumina-core | lumina-core-1.5.0
The serial console's report was:
Fatal data abort:
x0: fffffd0000b45b00
x1: ffff000040588000
x2: 8c
x3: 100
x4: ffff00004035caa0
x5: ffff00004035c7b0
x6: 0
x7: 1
x8: ffff000000758ebc
x9: ffff000000a33100
x10: fffffd0000a28678
x11: 0
x12: 9633b10b
x13: 2af8
x14: 2777
x15: 2af8
x16: 38
x17: 38
x18: ffff00004035c870
x19: fffffd0000a28600
x20: 8c
x21: fffffd0000b45e58
x22: ffff000000a4b000
x23: 0
x24: fffffd0000b45e10
x25: fffffd0000b89514
x26: fffffd0000b8f180
x27: fffffd0000b45e00
x28: ffff000000a4bd98
x29: ffff00004035c8b0
sp: ffff00004035c870
lr: ffff00000078e518
elr: ffff00000078e51c
spsr: 145
far: 28
esr: 96000005
panic: vm_fault failed: ffff00000078e51c
cpuid = 2
time = 1574872496
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
pc = 0xffff00000075ba9c lr = 0xffff0000001066a8
sp = 0xffff00004035c270 fp = 0xffff00004035c480
db_trace_self_wrapper() at vpanic+0x18c
pc = 0xffff0000001066a8 lr = 0xffff00000041903c
sp = 0xffff00004035c490 fp = 0xffff00004035c530
vpanic() at panic+0x44
pc = 0xffff00000041903c lr = 0xffff000000418eac
sp = 0xffff00004035c540 fp = 0xffff00004035c5c0
panic() at data_abort+0x1e0
pc = 0xffff000000418eac lr = 0xffff000000777d94
sp = 0xffff00004035c5d0 fp = 0xffff00004035c680
data_abort() at do_el1h_sync+0x144
pc = 0xffff000000777d94 lr = 0xffff000000776fb0
sp = 0xffff00004035c690 fp = 0xffff00004035c6c0
do_el1h_sync() at handle_el1h_sync+0x78
pc = 0xffff000000776fb0 lr = 0xffff00000075e078
sp = 0xffff00004035c6d0 fp = 0xffff00004035c7e0
handle_el1h_sync() at dwmmc_intr+0x280
pc = 0xffff00000075e078 lr = 0xffff00000078e514
sp = 0xffff00004035c7f0 fp = 0xffff00004035c8b0
dwmmc_intr() at ithread_loop+0x1f4
pc = 0xffff00000078e514 lr = 0xffff0000003db604
sp = 0xffff00004035c8c0 fp = 0xffff00004035c940
ithread_loop() at fork_exit+0x90
pc = 0xffff0000003db604 lr = 0xffff0000003d7be4
sp = 0xffff00004035c950 fp = 0xffff00004035c980
fork_exit() at fork_trampoline+0x10
pc = 0xffff0000003d7be4 lr = 0xffff000000776cec
sp = 0xffff00004035c990 fp = 0x0000000000000000
KDB: enter: panic
[ thread pid 12 tid 100038 ]
Stopped at dwmmc_intr+0x288: ldr x8, [x23, #40]
db>
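The ddb output above plus the register dump look consistent with loading a
64-bit field at byte offset 40 (0x28) off a NULL pointer: x23 is 0, the
stopped-at instruction is ldr x8, [x23, #40], and far is 0x28. Purely as an
illustration (invented struct and field names, not the actual dwmmc code),
the access pattern would be something like:

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Invented layout: 40 bytes of padding put "member" at byte offset
 * 0x28, which is the far value once the base pointer is NULL.
 */
struct fake_softc {
        uint64_t pad[5];   /* bytes 0x00 .. 0x27 */
        void *member;      /* byte offset 0x28 == 40 */
};

_Static_assert(offsetof(struct fake_softc, member) == 0x28, "offset check");

static void *
load_member(struct fake_softc *sc)
{
        /*
         * With sc == NULL this is a load from [base, #40], i.e. a
         * read of virtual address 0x28: the same class of fault as
         * the one reported above.
         */
        return (sc->member);
}

int
main(void)
{
        printf("offsetof(member) = %zu\n",
            offsetof(struct fake_softc, member));
        /* load_member(NULL) would fault the same way, so it is not called. */
        (void)load_member;
        return (0);
}

In other words, whatever structure x23 was supposed to point at appears to
have been NULL when dwmmc_intr dereferenced it.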
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)