A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot [backtraces added]
Mark Millard
markmi at dsl-only.net
Wed Dec 13 01:46:07 UTC 2017
[Just adding back traces. ffs_mount
use of uma_zcreate is involved in the
kernel thread, as is uma_reclaim_worker's
use of uma_reclaim_locked in the uma
thread.]
On 2017-Dec-12, at 4:20 PM, Mark Millard <markmi at dsl-only.net> wrote:
> On 2017-Dec-12, at 3:19 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> On 2017-Dec-12, at 2:02 PM, Andreas Schwarz <freebsd.asc at strcmp.org> wrote:
>>
>>> On 12.12.17, Mark Millard wrote:
>>>
>>>> I initially jumped from -r326192 to -r326726 and
>>>> ended up with a rpi2 that would normally hang
>>>> somewhere around release APs being displayed.
>>>> (I have had a couple of completed boots but many
>>>> dozens of hung-up attempts.) Both a debug kernel
>>>> and a non-debug kernel hang the same way.
>>>>
>>>> Bisecting the kernel (holding world -r326726
>>>> constant) showed:
>>>>
>>>> -r326346 did not hang (nor did before)
>>>> -r326347 and later hung.
>>>
>>> JFYI, the latest kernel (and world) running at one of my
>>> RPI2-B is r326631, without any issues.
>>
>> Interesting. (By the way: My context
>> is with a V1.1 Cortex-A7 based rpi2,
>> not V1.2 and Cortex-A53.)
>>
>> I've almost always run the root file
>> system being on a USB SSD instead of
>> on mmcsd0 . I wonder if that is
>> somehow involved since it may be
>> unusual. UFS file system.
>>
>> The USB SSD is on a powered hub that
>> is in turn plugged into the rpi2.
>>
>> [I had the hang problem before the
>> following and after.]
>>
>> The mechanism for holding mmcsd0 in
>> failed recently but the ejection
>> mechanism still works. So I hold
>> in mmcsd0 until after I get a USB
>> SSD boot now. (Interrupt boot, unload,
>> boot/autoboot, picks up the kernel
>> from the USB SSD.)
>>
>> This means that I effectively can
>> not avoid the USB SSD any more
>> unless I get my hands on a different
>> V1.1 rpi2.
>
> Looks like I'll get my hands on a different
> rpi2 V1.1 in a few days. So I should then
> be able to do reasonable mmcsd0-only
> experiments. At least once I find the time.
>
> FYI, in case boot details are involved
> in reproducing the problem. . .
>
> On the mmcsd0 I have /boot/loader.conf with:
>
> kern.cam.boot_delay="10000"
> vfs.mountroot.timeout="10"
>
> and /etc/fstab with:
>
> /dev/ufs/RPI2rootfs / ufs rw,noatime 1 1
> /dev/label/RPI2Aswap none swap sw 0 0
> /dev/label/RPI2Aboot /boot/msdos msdosfs rw,noatime 0 0
>
> where the /dev/ufs/RPI2rootfs was the USB SSD.
>
> However, I interrupt the loader and unload and
> then boot or autoboot. (But the hangs started
> before this extra sequence was involved.)
>
> On the USB SSD I have /boot/loader.conf with:
>
> kern.cam.boot_delay="10000"
> vfs.mountroot.timeout="10"
>
> and /etc/fstab with:
>
> /dev/da0p1 / ufs rw,noatime 1 1
> /dev/da0p2 none swap sw 0 0
>
>
> What db> showed does point to things that
> -r326347 involve:
>
> chain 32:
> thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
> thread 100077 (pid 23, uma) is on a run queue
>
> But for all I know -r326347 could be depending
> on something required to be true but not correct
> elsewhere in the rpi2 support. I'm not claiming
> that -r326347 is wrong, just that it is involved.
> I've way to little knowledge to claim to know
> what is wrong on the evidence that I have.
>
> I've not yet tried a bpi-m3 Cortex-A7 context or
> a pine64+ 2GB or rpi3 Cortex-A53 context. Nor
> powerpc64 nor powerpc. At some point I'll get the
> time for one or more of these. I've not had amd64
> problems in this area.
>
> I may not be able to test the bpi-m3: its barrel
> connector for power seems flaky and it is
> difficult to keep the board powered for long
> periods in recent times.
For this new example -r326347 kernel based
boot-hang:
chain 35:
thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
thread 100073 (pid 23, uma) is on a run queue
db> bt
Tracing pid 1 tid 100001 td 0xd429f000
cpu_switch() at cpu_switch+0x18
pc = 0xc0584f80 lr = 0xc0299ad8 (sched_switch+0x5c0)
sp = 0xd42a56b8 fp = 0xd42a56e8
sched_switch() at sched_switch+0x5c0
pc = 0xc0299ad8 lr = 0xc0275f28 (mi_switch+0x258)
sp = 0xd42a56f0 fp = 0xd42a5718
r4 = 0x0001f0f5 r5 = 0x00000000
r6 = 0xd429f000 r7 = 0x0006612d
r8 = 0x00000000 r9 = 0x00000104
r10 = 0xc08073f0
mi_switch() at mi_switch+0x258
pc = 0xc0275f28 lr = 0xc02c1508 (sleepq_switch+0x1c0)
sp = 0xd42a5720 fp = 0xd42a5748
r4 = 0xd429f000 r5 = 0xc0947dc4
r6 = 0xc09a9c68 r7 = 0xc0947dc0
r8 = 0x00000000 r9 = 0xc09440c0
r10 = 0x000000f4
sleepq_switch() at sleepq_switch+0x1c0
pc = 0xc02c1508 lr = 0xc02c1308 (sleepq_wait+0x48)
sp = 0xd42a5750 fp = 0xd42a5760
r4 = 0xd429f000 r5 = 0x00000000
r6 = 0xc09a9c68 r7 = 0xc06b415c
r8 = 0x00000001 r9 = 0xc09a9c78
r10 = 0x00000000
sleepq_wait() at sleepq_wait+0x48
pc = 0xc02c1308 lr = 0xc02748b0 (_sx_slock_hard+0x298)
sp = 0xd42a5768 fp = 0xd42a57c8
r4 = 0xc09a9c68 r5 = 0xc0823ac0
r6 = 0xc06ae146 r7 = 0x00000000
_sx_slock_hard() at _sx_slock_hard+0x298
pc = 0xc02748b0 lr = 0xc027457c (_sx_slock_int+0x140)
sp = 0xd42a57d0 fp = 0xd42a57f8
r4 = 0x00000078 r5 = 0x00000765
r6 = 0xc09a9c68 r7 = 0x00000765
r8 = 0xc09a9c78 r9 = 0xd7ad8740
r10 = 0x00000000
_sx_slock_int() at _sx_slock_int+0x140
pc = 0xc027457c lr = 0xc052b024 (uma_zcreate+0x10c)
sp = 0xd42a5800 fp = 0xd42a5850
r4 = 0x00000078 r5 = 0xc09a9c68
r6 = 0xc06df86e r7 = 0xc06dd153
r8 = 0x00000000 r9 = 0x00000000
r10 = 0x00000000
uma_zcreate() at uma_zcreate+0x10c
pc = 0xc052b024 lr = 0xc050cc90 (ffs_mount+0x80)
sp = 0xd42a5858 fp = 0xd42a5980
r4 = 0xd7844000 r5 = 0x00000000
r6 = 0x00000003 r7 = 0xc09a99c0
r8 = 0x00000000 r9 = 0xd429f000
r10 = 0xd828c400
ffs_mount() at ffs_mount+0x80
pc = 0xc050cc90 lr = 0xc032e1d4 (vfs_donmount+0xeec)
sp = 0xd42a5988 fp = 0xd42a5b38
r4 = 0xffffffff r5 = 0xd74ec120
r6 = 0xd429f000 r7 = 0x00000000
r8 = 0x00000000 r9 = 0xc425acd0
r10 = 0xd828c400
vfs_donmount() at vfs_donmount+0xeec
pc = 0xc032e1d4 lr = 0xc0330ea8 (kernel_mount+0x70)
sp = 0xd42a5b40 fp = 0xd42a5b78
r4 = 0xd82a9000 r5 = 0x00000000
r6 = 0x00004000 r7 = 0x00000000
r8 = 0xd87e2050 r9 = 0xd87e2040
r10 = 0x00000000
kernel_mount() at kernel_mount+0x70
pc = 0xc0330ea8 lr = 0xc03335dc (parse_mount+0x458)
sp = 0xd42a5b80 fp = 0xd42a5c60
r4 = 0xd4263500 r5 = 0xd87e2054
r6 = 0xd82a9000 r7 = 0x00000000
parse_mount() at parse_mount+0x458
pc = 0xc03335dc lr = 0xc0331a44 ($a.2+0x28)
sp = 0xd42a5c68 fp = 0xd42a5dc8
r4 = 0xd7845018 r5 = 0x00000000
r6 = 0xd7845018 r7 = 0xd78404da
r8 = 0xc06bfbd4 r9 = 0xfffffff7
r10 = 0xd7836380
$a.2() at $a.2+0x28
pc = 0xc0331a44 lr = 0xc020a82c (start_init+0x5c)
sp = 0xd42a5dd0 fp = 0xd42a5e20
r4 = 0xc06a370c r5 = 0xc0823ad0
r6 = 0x00000000 r7 = 0x00000000
r8 = 0xd42a5e48 r9 = 0x00000000
r10 = 0xd429b000
start_init() at start_init+0x5c
pc = 0xc020a82c lr = 0xc0230f10 (fork_exit+0xa0)
sp = 0xd42a5e28 fp = 0xd42a5e40
r4 = 0xd429f000 r5 = 0xd429b000
r6 = 0xc020a7d0 r7 = 0x00000000
r8 = 0xd42a5e48 r9 = 0x00000000
r10 = 0x00000000
fork_exit() at fork_exit+0xa0
pc = 0xc0230f10 lr = 0xc0564c8c (swi_exit)
sp = 0xd42a5e48 fp = 0x00000000
r4 = 0xc020a7d0 r5 = 0x00000000
r6 = 0x00000000 r7 = 0x00000000
r8 = 0x00000000 r10 = 0x00000000
swi_exit() at swi_exit
pc = 0xc0564c8c lr = 0xc0564c8c (swi_exit)
sp = 0xd42a5e48 fp = 0x00000000
db> bt
Tracing pid 23 tid 100073 td 0xd7ad8740
cpu_switch() at cpu_switch+0x18
pc = 0xc0584f80 lr = 0xc0299ad8 (sched_switch+0x5c0)
sp = 0xd87dad58 fp = 0xd87dad88
sched_switch() at sched_switch+0x5c0
pc = 0xc0299ad8 lr = 0xc0275f28 (mi_switch+0x258)
sp = 0xd87dad90 fp = 0xd87dadb8
r4 = 0x00000025 r5 = 0x00000000
r6 = 0xd7ad8740 r7 = 0x000240ea
r8 = 0x00000000 r9 = 0x00000100
r10 = 0xc08073f0
mi_switch() at mi_switch+0x258
pc = 0xc0275f28 lr = 0xc052d9f8 (uma_reclaim_locked+0x200)
sp = 0xd87dadc0 fp = 0xd87dade8
r4 = 0x00000003 r5 = 0xc08073f0
r6 = 0xc08073f0 r7 = 0x00000000
r8 = 0x00000000 r9 = 0xc0824310
r10 = 0xc06df86e
uma_reclaim_locked() at uma_reclaim_locked+0x200
pc = 0xc052d9f8 lr = 0xc052de70 (uma_reclaim_worker+0x4c)
sp = 0xd87dadf0 fp = 0xd87dae20
r4 = 0xc09a9c68 r5 = 0xc4242d80
r6 = 0xc06df86e r7 = 0x00000000
r8 = 0x00000100 r9 = 0xc09b3d08
r10 = 0xc09a9c7c
uma_reclaim_worker() at uma_reclaim_worker+0x4c
pc = 0xc052de70 lr = 0xc0230f10 (fork_exit+0xa0)
sp = 0xd87dae28 fp = 0xd87dae40
r4 = 0xd7ad8740 r5 = 0xd70ce000
r6 = 0xc052de24 r7 = 0x00000000
r8 = 0xd87dae48 r9 = 0xd7ada3a0
r10 = 0xc0824e8c
fork_exit() at fork_exit+0xa0
pc = 0xc0230f10 lr = 0xc0564c8c (swi_exit)
sp = 0xd87dae48 fp = 0x00000000
r4 = 0xc052de24 r5 = 0x00000000
r6 = 0x7ff6d83f r7 = 0xd6f583a0
r8 = 0xc0824dcc r10 = 0xc0824e8c
swi_exit() at swi_exit
pc = 0xc0564c8c lr = 0xc0564c8c (swi_exit)
sp = 0xd87dae48 fp = 0x00000000
db> show thread 100001
Thread 100001 at 0xd429f000:
proc (pid 1): 0xd429b000
name: kernel
stack: 0xd42a4000-0xd42a5fff
flags: 0x4 pflags: 0x20000000
state: INHIBITED: {SLEEPING}
wmesg: umadrain wchan: 0xc09a9c68 sleeptimo 0. 0 (curr 26. e3fd787f00000000)
priority: 84
container lock: sleepq chain (0xc0947dc4)
last voluntary switch: 26523 ms ago
last involuntary switch: 26695 ms ago
db> show thread 100073
Thread 100073 at 0xd7ad8740:
proc (pid 23): 0xd70ce000
name: uma
stack: 0xd87d9000-0xd87dafff
flags: 0x4 pflags: 0x200000
state: RUNQ
priority: 84
container lock: sched lock 3 (0xc0942ec0)
last voluntary switch: 26694 ms ago
db> show lock 0xc0942ec0
class: spin mutex
name: sched lock 3
flags: {SPIN, RECURSE}
state: {UNOWNED}
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-arm
mailing list