A head kernel rpi2 boot-hang bisected: -r326346 good; -r326347 and later hangs-up during boot
Mark Millard
markmi at dsl-only.net
Tue Dec 12 11:10:44 UTC 2017
I initially jumped from -r326192 to -r326726 and
ended up with a rpi2 that would normally hang
somewhere around release APs being displayed.
(I have had a couple of completed boots but many
dozens of hung-up attempts.) Both a debug kernel
and a non-debug kernel hang the same way.
Bisecting the kernel (holding world -r326726
constant) showed:
-r326346 did not hang (nor did before)
-r326347 and later hung.
Conclusion: -r326347 changes are involved in the
boot hangups.
Context: head -r326726 based (from before I did
the bisect). My knowledge is limited so I may not
have picked optimal material to get from the db>
prompt.
It appears that the messages about "random" occur
during the hang (indefinite wait). As I remember,
I have seen examples where the "Trying to mount"
did not show up --but normally it did.
And example from a hung boot:
. . .
Release APs
Trying to mount root from ufs:/dev/ufs/RPI2rootfs [rw,noatime]...
random: unblocking device.
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
arc4random: no preloaded entropy cache
The hang never seems to time out but just sits
there, even for hours.
It allows the <CR>~^B sequence to get to the db>
prompt. I've looked around a little a couple of
times.
One common point is that show allchains has
everything listed as sleeping except:
chain 32:
thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
thread 100077 (pid 23, uma) is on a run queue
chain 33:
(Note: uma's thread number has varied, as has the one
for [pagedaemon].)
bd> ps
pid ppid pgrp uid state wmesg wchan cmd
28 0 0 0 DL syncer 0xc095a31c [syncer]
27 0 0 0 DL vlruwt 0xd7417730 [vnlru]
26 0 0 0 DL psleep 0xc0959c00 [bufdaemon]
25 0 0 0 RL [bufspacedaemon]
24 0 0 0 DL psleep 0xc095e8f8 [vmdaemon]
23 0 0 0 RL (threaded) [pagedaemon]
100065 RunQ [pagedaemon]
100076 D launds 0xc095e804 [laundry: dom0]
100077 RunQ [uma]
. . .
1 0 0 0 DL umadrai 0xc095e528 [kernel]
0 0 0 0 RLs (threaded) [kernel]
100000 D swapin 0xc09673c8 [swapper]
. . .
100072 D - 0xd75b7e80 [if_io_tqg_1]
100073 RunQ [if_io_tqg_2]
100074 D - 0xd75b7d80 [if_io_tqg_3]
100075 D - 0xd75b7d00 [if_config_tqg_0]
100078 D - 0xd83dc100 [softirq_0]
100079 D - 0xd75b7c00 [softirq_1]
100080 RunQ [softirq_2]
100081 D - 0xd75b7b00 [softirq_3]
(Which if_io_tqg_<?> and softirq_<?> pair has RunQ varies.)
All RunQ's are shown above. One or two [idle: CPU<?>]'s have
state CanRun and the other [idle: CPU<?>]'s have state RUN.
They are the only items with those states. Example from the
same hangup:
10 0 0 0 RL (threaded) [idle]
100002 Run CPU 0 [idle: cpu0]
100003 Run CPU 1 [idle: cpu1]
100004 CanRun [idle: cpu2]
100005 CanRun [idle: cpu3]
db> show lock 0xc095e528
class: sx
name: umadrain
state: XLOCK: 0xd6cbe740 (tid 100077, pid 23, "uma")
waiters: shared
db> show thread 100001
Thread 100001 at 0xd40a7000:
proc (pid 1): 0xd40a3000
name: kernel
stack: 0xd40ac000-0xd40adfff
flags: 0x4 pflags: 0x20000000
state: INHIBITED: {SLEEPING}
wmesg: umadrain wchan: 0xc095e528 sleeptimo 0. 0 (curr 51d. 5eac6a0400000000)
priority: 84
container lock: sleepq chain (0xc0957244)
last voluntary switch: 1297717 ms ago
last involuntary switch: 1297809 ms ago
db> show thread 100077
Thread 100077 at 0xd6cbe740:
proc (pid 23): 0xd6cab000
name: uma
stack: 0xd83ca000-0xd83cbfff
flags: 0x4 pflags: 0x200000
state: RUNQ
priority: 84
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297815 ms ago
last involuntary switch: 1297815 ms ago
db> show thread 100073
Thread 100073 at 0xd7406740:
proc (pid 0): 0xc09673c8
name: if_io_tqg_2
stack: 0xd742a000-0xd742bfff
flags: 0x4 pflags: 0x200000
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297818 ms ago
db> show thread 100080
Thread 100080 at 0xd7431ae0:
proc (pid 0): 0xc09673c8
name: softirq_2
stack: 0xd83f5000-0xd83f6fff
flags: 0x4 pflags: 0x200000
state: RUNQ
priority: 24
container lock: sched lock 2 (0xc0952640)
last voluntary switch: 1297816 ms ago
db> show lock 0xc0952640
class: spin mutex
name: sched lock 2
flags: {SPIN, RECURSE}
state: {UNOWNED}
db> show lock 0xc0957244
class: spin mutex
name: sleepq chain
flags: {SPIN, RECURSE}
state: {UNOWNED}
db> show thread 100065
Thread 100065 at 0xd6cbb000:
proc (pid 23): 0xd6cab000
name: pagedaemon
stack: 0xd7403000-0xd7404fff
flags: 0x14 pflags: 0x20200000
state: RUNQ
priority: 84
container lock: sched lock 1 (0xc0951f80)
last voluntary switch: 1029 ms ago
last involuntary switch: 28606 ms ago
db> show lock 0xc0951f80
class: spin mutex
name: sched lock 1
flags: {SPIN, RECURSE}
state: {UNOWNED}
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-arm
mailing list