rpi2 head -r327485 (e.g.): rpi2 leaves one "CPU n" always idle for some boots
Mark Millard
markmi at dsl-only.net
Tue Jan 2 20:27:55 UTC 2018
[It looks like the opensll speed on problem cpuN (2)
can not get ^C and the like.
On 2018-Jan-2, at 12:13 PM, Mark Millard <markmi at dsl-only.net> wrote:
> On 2018-Jan-2, at 12:01 PM, Allan Jude <allanjude at freebsd.org> wrote:
>
>> On 2018-01-02 14:48, Mark Millard wrote:
>>> I've seen this over many versions of head for months
>>> but have never managed to find a way to force it to
>>> happen. It just shows up once and a while.
>>>
>>> Thus, I'm just dumping out some top and kernel information
>>> here for reference. I've used:
>>>
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>>
>>> to give the rpi2 4 active processes. Various outputs
>>> are from different times without a reboot between.
>>>
>>> top -CaePores shows the likes of:
>>>
>>> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
>>> 614 root 1 20 0 10452K 10480K 0K select 1 0:00 0.03% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>>> 661 root 1 52 0 9984K 6132K 0K select 1 0:00 0.00% /usr/sbin/sshd
>>> 751 root 1 101 0 7256K 4276K 0K RUN 1 0:28 99.57% openssl speed
>>> 750 root 1 100 0 7256K 4276K 0K CPU0 0 0:32 94.83% openssl speed
>>> 753 root 1 86 0 7256K 4276K 0K RUN 3 0:13 52.36% openssl speed
>>> 752 root 1 86 0 7256K 4276K 0K CPU3 3 0:14 46.54% openssl speed
>>> 363 root 1 20 0 6428K 3840K 0K select 3 0:00 0.00% /sbin/devd
>>> . . .
>>>
>>> and:
>>>
>>> last pid: 754; load averages: 3.70, 2.38, 1.58 up 0+00:16:50 01:59:37
>>> 21 processes: 5 running, 16 sleeping
>>> CPU 0: 94.9% user, 0.0% nice, 0.0% system, 5.1% interrupt, 0.0% idle
>>> CPU 1: 99.6% user, 0.0% nice, 0.0% system, 0.4% interrupt, 0.0% idle
>>> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
>>> CPU 3: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
>>> Mem: 12M Active, 1136K Inact, 56M Wired, 30M Buf, 722M Free
>>> Swap: 1536M Total, 6M Free
>>>
>>> From problem boot to problem boot, the CPU that stays
>>> idle has varied but usually has been CPU 2. I've never
>>> seen 2 or more stuck in idle.
>>>
>>> show allpcpu shows the likes of:
>>>
>>> db> show allpcpu
>>> Current CPU: 0
>>>
>>> cpuid = 0
>>> dynamic pcpu = 0x3d2540
>>> curthread = 0xd8478ae0: pid 2032 tid 100150 "openssl"
>>> curpcb = 0xd852ae98
>>> fpcurthread = 0xd8478ae0: pid 2032 "openssl"
>>> idlethread = 0xc376fae0: tid 100002 "idle: cpu0"
>>> curpmap = 0xd8e43bf4
>>> curvnet = 0
>>>
>>> cpuid = 1
>>> dynamic pcpu = 0x3998540
>>> curthread = 0xd7e5b3a0: pid 2031 tid 100173 "openssl"
>>> curpcb = 0xda7e0e98
>>> fpcurthread = 0xd7e5b3a0: pid 2031 "openssl"
>>> idlethread = 0xc376f740: tid 100003 "idle: cpu1"
>>> curpmap = 0xd8e43ec4
>>> curvnet = 0
>>>
>>> cpuid = 2
>>> dynamic pcpu = 0x3999540
>>> curthread = 0xc376f3a0: pid 10 tid 100004 "idle: cpu2"
>>> curpcb = 0xc378ae98
>>> fpcurthread = none
>>> idlethread = 0xc376f3a0: tid 100004 "idle: cpu2"
>>> curpmap = 0
>>> curvnet = 0
>>>
>>> cpuid = 3
>>> dynamic pcpu = 0x399a540
>>> curthread = 0xd8477000: pid 2034 tid 100167 "openssl"
>>> curpcb = 0xd876de98
>>> fpcurthread = 0xd8477000: pid 2034 "openssl"
>>> idlethread = 0xc376f000: tid 100005 "idle: cpu3"
>>> curpmap = 0xc377ab04
>>> curvnet = 0
>>>
>>> In other words: it appears that the cpuN (here cpu2) is
>>> left with idle scheduled all the time for some reason.
>>>
>>> ps from db> shows things like:
>>>
>>>
>>> db> ps
>>> pid ppid pgrp uid state wmesg wchan cmd
>>> 2034 714 2034 0 R+ openssl
>>> 2033 714 2033 0 R+ CPU 3 openssl
>>> 2032 714 2032 0 R+ CPU 0 openssl
>>> 2031 714 2031 0 R+ CPU 1 openssl
>>>
>>> (then later:)
>>>
>>> db> ps
>>> pid ppid pgrp uid state wmesg wchan cmd
>>> 2034 714 2034 0 R+ CPU 3 openssl
>>> 2033 714 2033 0 R+ openssl
>>> 2032 714 2032 0 R+ CPU 0 openssl
>>> 2031 714 2031 0 R+ CPU 1 openssl
>>>
>>> There is also:
>>>
>>> 10 0 0 0 RL (threaded) [idle]
>>> 100002 CanRun [idle: cpu0]
>>> 100003 CanRun [idle: cpu1]
>>> 100004 CanRun [idle: cpu2]
>>> 100005 CanRun [idle: cpu3]
>>>
>>>
>>> These are from:
>>>
>>> # uname -apKU
>>> FreeBSD rpi2 12.0-CURRENT FreeBSD 12.0-CURRENT r327485M arm armv7 1200054 1200054
>>>
>>>
>>> ===
>>> Mark Millard
>>> markmi at dsl-only.net
>>>
>>> _______________________________________________
>>> freebsd-hackers at freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>>>
>>
>> Have you tried using 'cpuset' to force each of the 4 to run on a
>> specific CPU? What happens if you force it to run on cpu #2?
>
> # cpuset -c -l 2 openssl speed 1>/dev/null 2>&1 &
>
> results in:
>
> last pid: 2120; load averages: 2.95, 2.44, 2.25 up 0+10:26:07 12:08:54
> 18 processes: 2 running, 16 sleeping
> CPU 0: 0.0% user, 0.0% nice, 0.0% system, 4.7% interrupt, 95.3% idle
> CPU 1: 0.4% user, 0.0% nice, 0.0% system, 0.0% interrupt, 99.6% idle
> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> Mem: 1852K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 53M Free
> Swap: 1536M Total, 1536M Free
>
> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
> 2120 root 1 20 0 6432K 2872K 0K CPU1 1 0:00 0.22% top -CawPopid
> 2118 root 1 29 0 7256K 4432K 0K RUN 1 0:00 0.00% openssl speed
> . . .
>
>
> With the other 3 also assigned:
>
> last pid: 2138; load averages: 4.45, 3.04, 2.51 up 0+10:28:48 12:11:35
> 21 processes: 5 running, 16 sleeping
> CPU 0: 93.7% user, 0.0% nice, 1.6% system, 4.7% interrupt, 0.0% idle
> CPU 1: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
> CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
> CPU 3: 100% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
> Mem: 3632K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 51M Free
> Swap: 1536M Total, 1536M Free
>
> PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
> 2138 root 1 20 0 6472K 2888K 0K CPU1 1 0:00 0.24% top -CawPopid
> 2137 root 1 100 0 7256K 4436K 0K CPU3 3 0:25 96.79% openssl speed
> 2124 root 1 102 0 7256K 4436K 0K RUN 1 0:37 101.66% openssl speed
> 2123 root 1 101 0 7256K 4436K 0K CPU0 0 0:44 95.07% openssl speed
> 2118 root 1 29 0 7256K 4432K 0K RUN 1 0:00 0.00% openssl speed
> . . .
>
> Thus the difference is that the one instance
> of openssl speed does not run at all.
I tried to fg and ^C each of the 4 openssl's,
with cpu2's instance being last.
All worked but for cpu2's.
^C and ^Z do not work after the fg. But ^T reports:
# fg
cpuset -c -l 3 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 1 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 0 openssl speed >/dev/null 2>&1
^C# fg
cpuset -c -l 2 openssl speed >/dev/null 2>&1
^C
^Z
load: 3.03 cmd: openssl 2118 [runnable] 768.91r 0.06u 0.02s 0% 4424k
load: 3.07 cmd: openssl 2118 [runnable] 825.03r 0.06u 0.02s 0% 4424k
load: 3.06 cmd: openssl 2118 [runnable] 1177.78r 0.06u 0.02s 0% 4424k
Using an ssh sessions as the console is stuck
in the above state, it appears that the load
average is now stuck at around 3 despite the
basically idle state:
last pid: 2172; load averages: 3.03, 3.29, 3.08 up 0+10:39:36 12:22:23
23 processes: 2 running, 21 sleeping
CPU 0: 0.0% user, 0.0% nice, 0.0% system, 3.1% interrupt, 96.9% idle
CPU 1: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 2: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU 3: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 4480K Active, 590M Inact, 5364K Laundry, 145M Wired, 86M Buf, 46M Free
Swap: 1536M Total, 1536M Free
PID USERNAME THR PRI NICE SIZE RES SWAP STATE C TIME CPU COMMAND
2172 root 1 20 0 6344K 2888K 0K CPU3 3 0:00 0.24% top -CawPopid
2155 root 1 22 0 5484K 2660K 0K wait 3 0:00 0.00% su (sh)
2154 markmi 1 23 0 5260K 2648K 0K wait 0 0:00 0.00% su
2148 markmi 1 28 0 5524K 2724K 0K wait 3 0:00 0.00% -sh (sh)
2147 markmi 1 20 0 11024K 6208K 0K select 0 0:00 0.03% sshd: markmi at pts/0 (sshd)
2143 root 1 26 0 10644K 6128K 0K select 3 0:00 0.00% sshd: markmi [priv] (sshd)
2118 root 1 20 0 7256K 4432K 0K RUN 1 0:00 0.00% openssl speed
714 root 1 20 0 5568K 2828K 0K wait 3 0:00 0.00% -sh (sh)
713 root 1 22 0 5300K 2600K 0K wait 1 0:00 0.00% login [pam] (login)
712 root 1 52 0 4356K 2004K 0K ttyin 0 0:00 0.00% /usr/libexec/getty Pc ttyv0
665 root 1 52 0 4816K 2332K 0K nanslp 3 0:01 0.00% /usr/sbin/cron -s
661 root 1 20 0 9984K 4832K 0K select 0 0:00 0.00% /usr/sbin/sshd
617 root 1 20 0 4308K 1860K 0K select 0 0:06 0.01% /usr/sbin/powerd
614 root 1 20 0 10452K 10480K 0K select 3 0:04 0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
576 root 32 52 0 4388K 2160K 0K rpcsvc 0 0:00 0.00% nfsd: server (nfsd)
575 root 1 52 0 4720K 2044K 0K select 3 0:00 0.00% nfsd: master (nfsd)
573 root 1 52 0 5052K 2216K 0K select 3 0:00 0.00% /usr/sbin/mountd -r
443 root 1 20 0 4724K 2088K 0K select 0 0:00 0.00% /usr/sbin/rpcbind
434 root 1 20 0 4772K 2280K 0K select 1 0:01 0.00% /usr/sbin/syslogd -s
363 root 1 20 0 6428K 3576K 0K select 0 0:00 0.00% /sbin/devd
362 _dhcp 1 42 0 5128K 2388K 0K select 3 0:00 0.00% dhclient: ue0 (dhclient)
316 root 1 52 0 5032K 2292K 0K select 0 0:00 0.00% dhclient: ue0 [priv] (dhclient)
313 root 1 52 0 4800K 2148K 0K select 3 0:00 0.00% dhclient: system.syslog (dhclient)
Any more experiments before a reboot (and possibly not seeing
the problem at all)?
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-hackers
mailing list