FYI: example "panic: ARM64TODO: reclaim_pv_chunk" on a Pine64+ 2GB with head -r317015 [mdconfig -d not effectively releasing memory?]
Mark Millard
markmi at dsl-only.net
Fri Apr 28 05:26:24 UTC 2017
[As this text does not really follow from the earlier text
I sent, I'm top-posting a hypothesis about where so much
active memory came from that available memory got low enough
to trigger a reclaim_pv_chunk attempt.]
My hypothesis is that the "mdconfig -d"s in vm_copy_base
(from /usr/src/release/tools/vmimage.subr) did not actually
release the memory resources involved in the vnode-backed
mdconfig use:
vm_copy_base() {
        # Creates a new UFS root filesystem and copies the contents of the
        # current root filesystem into it.  This produces a "clean" disk
        # image without any remnants of files which were created temporarily
        # during image-creation and have since been deleted (e.g., downloaded
        # package archives).
        mkdir -p ${DESTDIR}/old
        mdold=$(mdconfig -f ${VMBASE})
        mount /dev/${mdold} ${DESTDIR}/old

        truncate -s ${VMSIZE} ${VMBASE}.tmp
        mkdir -p ${DESTDIR}/new
        mdnew=$(mdconfig -f ${VMBASE}.tmp)
        newfs -L rootfs /dev/${mdnew}
        mount /dev/${mdnew} ${DESTDIR}/new

        tar -cf- -C ${DESTDIR}/old . | tar -xUf- -C ${DESTDIR}/new

        umount_loop /dev/${mdold}
        rmdir ${DESTDIR}/old
        mdconfig -d -u ${mdold}

        umount_loop /dev/${mdnew}
        rmdir ${DESTDIR}/new
        tunefs -n enable /dev/${mdnew}
        mdconfig -d -u ${mdnew}

        mv ${VMBASE}.tmp ${VMBASE}
}
Without such prior mdconfig activity the
"cp -p" and the following "xz -T 0" do not produce the
large Mem:Active figure in top: "xz -T 0" shows more
like 781M Mem:Active, with an xz:SIZE of 791M and
xz:RES < 800M (varying), and zero xz:SWAP. xz also keeps
all four cores busy, so well over 300% CPU in top
throughout, instead of < 100%. In that context both
the cp and the xz finish just fine.
In other words: there is no low-memory problem without
the prior vnode-backed mdconfig use.
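If that is what is happening, it should be reproducible
outside the release scripts. A minimal sketch of such a
check (the file path and sizes are made up for
illustration; the commands are the same standard ones
vm_copy_base already uses):

# sysctl vm.stats.vm.v_active_count       # note the baseline
# truncate -s 4g /root/mdtest.img
# md=$(mdconfig -f /root/mdtest.img)      # vnode-backed md device
# newfs /dev/${md}
# mount /dev/${md} /mnt
# dd if=/dev/zero of=/mnt/fill bs=1m count=2048
# umount /mnt
# mdconfig -d -u ${md}
# sysctl vm.stats.vm.v_active_count       # does Active drop back near the baseline?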
From the prior top report for the failure,
partially repeated here:
. . .
Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
48988 root 4 31 0 651M 27048K 0K RUN 0 0:03 87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-
. . .
The combination of 1618M Mem:Active with only 34M Swap:Used,
and 651M xz:SIZE with 27M xz:RES and 0K xz:SWAP, seems
very odd; that combination should not normally happen.
The 17M Mem:Inact is odd for the context as well. (Mem:Wired,
Mem:Buf, and Mem:Free look normal.)
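Top alone does not show which objects those Active pages
belong to. If this happens again, something like the
following might help narrow it down (the commands and
sysctl names are standard; which zones and malloc types
are the relevant ones is only my guess):

# sysctl vm.stats.vm | egrep 'active|inact|wire|free'  # page counts behind top's Mem: line
# vmstat -z | egrep -i 'pv|chunk'                      # pv entry / pv chunk zone usage
# vmstat -m | egrep -i 'md|geom'                       # malloc types tied to md/GEOM use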
An alternate hypothesis would be that the memory "leak"
is from mkimg not having its memory use cleaned up.
mkimg runs after vm_copy_base but before the cp/xz
sequence and is what produces vm.raw.
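One way to separate the two hypotheses next time would be
to snapshot the page counters between stages of the
vm-image build, e.g. with a small hand-added helper (the
helper name and log path are made up):

memmark() {
        # Record the main page counters with a label, so the jump in
        # Active can be tied to a specific stage of the image build.
        echo "==== $1 ====" >> /root/memmarks.log
        sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
            vm.stats.vm.v_wire_count vm.stats.vm.v_free_count \
            >> /root/memmarks.log
}
# Call memmark "after vm_copy_base" and memmark "after mkimg": if Active
# is already high after vm_copy_base the mdconfig hypothesis fits; if it
# only jumps after mkimg, the mkimg hypothesis fits.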
For reference, here is what worked just fine after the
post-panic reboot, using the already-existing
vm.raw (sparse) file as a starting point:
# cp -p vm.raw /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# ls -lTt *raw*
-rw-r--r-- 1 root wheel 34360566272 Apr 27 18:40:24 2017 vm.raw
-rw-r--r-- 1 root wheel 34359746560 Apr 27 18:37:29 2017 vm.raw.nested_bsd
-rw-r--r-- 1 root wheel 27917287424 Apr 27 18:34:45 2017 raw.img
# du -sm *raw*
1762 raw.img
1583 vm.raw
1583 vm.raw.nested_bsd
(Before the .xz replaces the .raw:)
# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 33820032
-rw-r--r-- 1 root wheel 34360566272 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw
# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
32777 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
(After xz:)
# ls -lT /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/
total 258208
-rw-r--r-- 1 root wheel 264275808 Apr 27 18:40:24 2017 FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz
# du -sm /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/*
253 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw.xz
(Mem:Active returned to 10M when xz finished.)
Prior reports, from the captured text:
On 2017-Apr-27, at 7:31 PM, Mark Millard <markmi at dsl-only.net> wrote:
> [Another example panic. Again no dump. But I have what
> a top -PCwaopid froze at this time.]
>
> On 2017-Apr-27, at 4:22 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> Unfortunately for this FYI the attempt to produce a dump
>> failed and so all the information that I have is what I
>> first captured from the console output: a backtrace.
>>
>> The context was head -r317015 on a Pine64+ 2GB. At the
>> time I was experimenting with trying to build a vm.raw
>> from my own build of FreeBSD. The (root) file system
>> is on a USB SSD off of a powered USB hub.
>>
>> panic: ARM64TODO: reclaim_pv_chunk
>> cpuid = 1
>> time = 1493332968
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self_wrapper+0x28
>> pc = 0xffff000000605cc0 lr = 0xffff0000000869cc
>> sp = 0xffff000083ba4f00 fp = 0xffff000083ba5110
>>
>> db_trace_self_wrapper() at vpanic+0x164
>> pc = 0xffff0000000869cc lr = 0xffff00000031d464
>> sp = 0xffff000083ba5120 fp = 0xffff000083ba5190
>>
>> vpanic() at panic+0x4c
>> pc = 0xffff00000031d464 lr = 0xffff00000031d2fc
>> sp = 0xffff000083ba51a0 fp = 0xffff000083ba5220
>>
>> panic() at reclaim_pv_chunk+0x10
>> pc = 0xffff00000031d2fc lr = 0xffff00000061a234
>> sp = 0xffff000083ba5230 fp = 0xffff000083ba5230
>>
>> reclaim_pv_chunk() at get_pv_entry+0x240
>> pc = 0xffff00000061a234 lr = 0xffff000000616184
>> sp = 0xffff000083ba5240 fp = 0xffff000083ba5260
>>
>> get_pv_entry() at pmap_enter+0x694
>> pc = 0xffff000000616184 lr = 0xffff0000006156a0
>> sp = 0xffff000083ba5270 fp = 0xffff000083ba5300
>>
>> pmap_enter() at vm_fault_hold+0x28c
>> pc = 0xffff0000006156a0 lr = 0xffff0000005b9740
>> sp = 0xffff000083ba5310 fp = 0xffff000083ba5460
>>
>> vm_fault_hold() at vm_fault+0x70
>> pc = 0xffff0000005b9740 lr = 0xffff0000005b9464
>> sp = 0xffff000083ba5470 fp = 0xffff000083ba54a0
>>
>> vm_fault() at data_abort+0xe0
>> pc = 0xffff0000005b9464 lr = 0xffff00000061ad94
>> sp = 0xffff000083ba54b0 fp = 0xffff000083ba5560
>>
>> data_abort() at handle_el1h_sync+0x70
>> pc = 0xffff00000061ad94 lr = 0xffff000000607870
>> sp = 0xffff000083ba5570 fp = 0xffff000083ba5680
>>
>> handle_el1h_sync() at kern_select+0x9fc
>> pc = 0xffff000000607870 lr = 0xffff00000037db3c
>> sp = 0xffff000083ba5690 fp = 0xffff000083ba58f0
>>
>> kern_select() at sys_select+0x5c
>> pc = 0xffff00000037db3c lr = 0xffff00000037dc58
>> sp = 0xffff000083ba5900 fp = 0xffff000083ba5930
>>
>> sys_select() at do_el0_sync+0xa48
>> pc = 0xffff00000037dc58 lr = 0xffff00000061b91c
>> sp = 0xffff000083ba5940 fp = 0xffff000083ba5a70
>>
>> do_el0_sync() at handle_el0_sync+0x6c
>> pc = 0xffff00000061b91c lr = 0xffff0000006079e8
>> sp = 0xffff000083ba5a80 fp = 0xffff000083ba5b90
>>
>> handle_el0_sync() at 0x4948c
>> pc = 0xffff0000006079e8 lr = 0x000000000004948c
>> sp = 0xffff000083ba5ba0 fp = 0x0000ffffffffd960
>
>
> This time I got to record from top:
> (swap is on a swap partition)
> (pid 48988's SIZE vs. RES and SWAP might be interesting)
> (as might the Active figure)
>
> last pid: 48988; load averages: 0.64, 0.44, 0.38 up 0+04:21:01 19:19:50
> 32 processes: 2 running, 30 sleeping
> CPU 0: 13.2% user, 0.0% nice, 23.2% system, 0.3% interrupt, 63.3% idle
> CPU 1: 4.6% user, 0.0% nice, 23.9% system, 0.0% interrupt, 71.5% idle
> CPU 2: 2.1% user, 0.0% nice, 23.2% system, 0.0% interrupt, 74.8% idle
> CPU 3: 3.3% user, 0.0% nice, 23.8% system, 0.0% interrupt, 72.8% idle
> Mem: 1618M Active, 17M Inact, 315M Wired, 204M Buf, 15M Free
> Swap: 6144M Total, 34M Used, 6110M Free, 348K Out
>
> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
> 48988 root 4 31 0 651M 27048K 0K RUN 0 0:03 87.60% xz -T 0 /usr/obj/DESTDIRs/vmimage-aarch64/vmimages/FreeBSD-12.0-CURRENT-arm64-aarch64.raw
> 11983 root 1 22 0 5068K 0K 0K wait 3 0:00 0.00% make vm-image vm-install DESTDIR=/usr/obj/DESTDIRs/vmimage-aarch64 (<make>)
> 11981 root 1 42 0 7320K 0K 1516K wait 1 0:00 0.00% sh /root/sys_build_scripts.aarch64-host/make_noscript_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install
> 11980 root 1 20 0 6656K 1548K 0K select 0 0:02 0.00% [script]
> 11977 root 1 30 0 7320K 0K 1516K wait 3 0:00 0.00% /bin/sh /root/sys_build_scripts.aarch64-host/make_aarch64_nodebug_clang_bootstrap-aarch64-host.sh vm-image vm-install DEST
> 2694 root 1 20 0 8804K 2072K 0K CPU2 2 0:07 0.17% top -PCwaopid
> 827 root 1 20 0 7320K 0K 360K wait 0 0:00 0.00% su (<sh>)
> 826 markmi 1 22 0 10372K 0K 1532K wait 3 0:00 0.00% su (<su>)
> 820 markmi 1 24 0 7320K 0K 1516K wait 1 0:00 0.00% -sh (<sh>)
> 819 markmi 1 20 0 18416K 1152K 0K select 1 0:21 0.00% sshd: markmi at pts/1 (sshd)
> 816 root 1 20 0 18416K 3276K 0K select 0 0:00 0.00% sshd: markmi [priv] (sshd)
> 765 root 1 20 0 7320K 0K 224K wait 2 0:00 0.00% su (<sh>)
> 764 markmi 1 23 0 10372K 0K 1532K wait 0 0:00 0.00% su (<su>)
> 758 markmi 1 31 0 7320K 0K 1516K wait 1 0:00 0.00% -sh (<sh>)
> 757 markmi 1 20 0 18416K 228K 904K select 3 0:01 0.01% sshd: markmi at pts/0 (sshd)
> 754 root 1 25 0 18416K 3276K 0K select 1 0:00 0.00% sshd: markmi [priv] (sshd)
> 746 root 1 27 0 7320K 1532K 0K ttyin 0 0:00 0.00% -sh (sh)
> 745 root 1 20 0 10372K 0K 1532K wait 1 0:00 0.00% login [pam] (<login>)
> 700 root 1 20 0 6948K 0K 168K nanslp 1 0:00 0.00% /usr/sbin/cron -s (<cron>)
> 696 smmsp 1 20 0 10460K 0K 184K pause 0 0:00 0.00% sendmail: Queue runner at 00:30:00 for /var/spool/clientmqueue (<sendmail>)
> 693 root 1 20 0 10460K 1392K 0K select 1 0:00 0.03% sendmail: accepting connections (sendmail)
> 690 root 1 20 0 15800K 968K 0K select 2 0:00 0.00% /usr/sbin/sshd
> 661 root 1 20 0 6656K 344K 0K select 2 0:01 0.00% /usr/sbin/powerd
> 658 root 2 20 0 12788K 12672K 0K select 0 0:02 0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
> 620 root 32 52 0 6384K 1100K 0K rpcsvc 1 0:00 0.00% nfsd: server (nfsd)
> 619 root 1 52 0 6384K 704K 0K select 1 0:00 0.00% nfsd: master (nfsd)
> 617 root 1 20 0 6684K 688K 0K select 1 0:00 0.00% /usr/sbin/mountd -r
> 478 root 1 20 0 6676K 596K 0K select 3 0:00 0.00% /usr/sbin/rpcbind
> 469 root 1 20 0 6680K 572K 0K select 2 0:00 0.00% /usr/sbin/syslogd -s
> 396 root 1 20 0 9580K 32K 0K select 0 0:00 0.00% /sbin/devd
> 308 _dhcp 1 20 0 6800K 532K 0K select 2 0:00 0.00% dhclient: awg0 (dhclient)
> 307 root 1 52 0 6800K 424K 0K select 2 0:00 0.00% dhclient: awg0 [priv] (dhclient)
>
> And here is the backtrace:
>
> timeout stopping cpus
> panic: ARM64TODO: reclaim_pv_chunk
> cpuid = 0
> time = 1493345992
> KDB: stack backtrace:
> db_trace_self() at db_trace_self_wrapper+0x28
> pc = 0xffff000000605cc0 lr = 0xffff0000000869cc
> sp = 0xffff000083d301d0 fp = 0xffff000083d303e0
>
> db_trace_self_wrapper() at vpanic+0x164
> pc = 0xffff0000000869cc lr = 0xffff00000031d464
> sp = 0xffff000083d303f0 fp = 0xffff000083d30460
>
> vpanic() at panic+0x4c
> pc = 0xffff00000031d464 lr = 0xffff00000031d2fc
> sp = 0xffff000083d30470 fp = 0xffff000083d304f0
>
> panic() at reclaim_pv_chunk+0x10
> pc = 0xffff00000031d2fc lr = 0xffff00000061a234
> sp = 0xffff000083d30500 fp = 0xffff000083d30500
>
> reclaim_pv_chunk() at get_pv_entry+0x240
> pc = 0xffff00000061a234 lr = 0xffff000000616184
> sp = 0xffff000083d30510 fp = 0xffff000083d30530
>
> get_pv_entry() at pmap_enter+0x694
> pc = 0xffff000000616184 lr = 0xffff0000006156a0
> sp = 0xffff000083d30540 fp = 0xffff000083d305d0
>
> pmap_enter() at vm_fault_hold+0x28c
> pc = 0xffff0000006156a0 lr = 0xffff0000005b9740
> sp = 0xffff000083d305e0 fp = 0xffff000083d30730
>
> vm_fault_hold() at vm_fault+0x70
> pc = 0xffff0000005b9740 lr = 0xffff0000005b9464
> sp = 0xffff000083d30740 fp = 0xffff000083d30770
>
> vm_fault() at data_abort+0xe0
> pc = 0xffff0000005b9464 lr = 0xffff00000061ad94
> sp = 0xffff000083d30780 fp = 0xffff000083d30830
>
> data_abort() at handle_el0_sync+0x6c
> pc = 0xffff00000061ad94 lr = 0xffff0000006079e8
> sp = 0xffff000083d30840 fp = 0xffff000083d30950
>
> handle_el0_sync() at 0x400a3de4
> pc = 0xffff0000006079e8 lr = 0x00000000400a3de4
> sp = 0xffff000083d30960 fp = 0x0000ffffbfdfcd30
>
> KDB: enter: panic
> [ thread pid 48988 tid 100230 ]
> Stopped at kdb_enter+0x44: undefined d4200000
>
===
Mark Millard
markmi at dsl-only.net