rpi2 head -r327485 (e.g.): rpi2 leaves one "CPU n" always idle for some boots

Mark Millard markmi at dsl-only.net
Tue Jan 2 20:27:55 UTC 2018


[It looks like the opensll speed on problem cpuN (2)
can not get ^C and the like.

On 2018-Jan-2, at 12:13 PM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2018-Jan-2, at 12:01 PM, Allan Jude <allanjude at freebsd.org> wrote:
> 
>> On 2018-01-02 14:48, Mark Millard wrote:
>>> I've seen this over many versions of head for months
>>> but have never managed to find a way to force it to
>>> happen. It just shows up once and a while.
>>> 
>>> Thus, I'm just dumping out some top and kernel information
>>> here for reference. I've used:
>>> 
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> openssl speed 1>/dev/null 2>&1 &
>>> 
>>> to give the rpi2 4 active processes. Various outputs
>>> are from different times without a reboot between.
>>> 
>>> top -CaePores shows the likes of:
>>> 
>>> PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
>>> 614 root             1  20    0 10452K 10480K     0K select  1   0:00   0.03% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>>> 661 root             1  52    0  9984K  6132K     0K select  1   0:00   0.00% /usr/sbin/sshd
>>> 751 root             1 101    0  7256K  4276K     0K RUN     1   0:28  99.57% openssl speed
>>> 750 root             1 100    0  7256K  4276K     0K CPU0    0   0:32  94.83% openssl speed
>>> 753 root             1  86    0  7256K  4276K     0K RUN     3   0:13  52.36% openssl speed
>>> 752 root             1  86    0  7256K  4276K     0K CPU3    3   0:14  46.54% openssl speed
>>> 363 root             1  20    0  6428K  3840K     0K select  3   0:00   0.00% /sbin/devd
>>> . . .
>>> 
>>> and:
>>> 
>>> last pid:   754;  load averages:  3.70,  2.38,  1.58                                                                                                                            up 0+00:16:50  01:59:37
>>> 21 processes:  5 running, 16 sleeping
>>> CPU 0: 94.9% user,  0.0% nice,  0.0% system,  5.1% interrupt,  0.0% idle
>>> CPU 1: 99.6% user,  0.0% nice,  0.0% system,  0.4% interrupt,  0.0% idle
>>> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>>> CPU 3:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
>>> Mem: 12M Active, 1136K Inact, 56M Wired, 30M Buf, 722M Free
>>> Swap: 1536M Total, 6M Free
>>> 
>>> From problem boot to problem boot, the CPU that stays
>>> idle has varied but usually has been CPU 2. I've never
>>> seen 2 or more stuck in idle.
>>> 
>>> show allpcpu shows the likes of:
>>> 
>>> db> show allpcpu
>>> Current CPU: 0
>>> 
>>> cpuid        = 0
>>> dynamic pcpu = 0x3d2540
>>> curthread    = 0xd8478ae0: pid 2032 tid 100150 "openssl"
>>> curpcb       = 0xd852ae98
>>> fpcurthread  = 0xd8478ae0: pid 2032 "openssl"
>>> idlethread   = 0xc376fae0: tid 100002 "idle: cpu0"
>>> curpmap      = 0xd8e43bf4
>>> curvnet      = 0
>>> 
>>> cpuid        = 1
>>> dynamic pcpu = 0x3998540
>>> curthread    = 0xd7e5b3a0: pid 2031 tid 100173 "openssl"
>>> curpcb       = 0xda7e0e98
>>> fpcurthread  = 0xd7e5b3a0: pid 2031 "openssl"
>>> idlethread   = 0xc376f740: tid 100003 "idle: cpu1"
>>> curpmap      = 0xd8e43ec4
>>> curvnet      = 0
>>> 
>>> cpuid        = 2
>>> dynamic pcpu = 0x3999540
>>> curthread    = 0xc376f3a0: pid 10 tid 100004 "idle: cpu2"
>>> curpcb       = 0xc378ae98
>>> fpcurthread  = none
>>> idlethread   = 0xc376f3a0: tid 100004 "idle: cpu2"
>>> curpmap      = 0
>>> curvnet      = 0
>>> 
>>> cpuid        = 3
>>> dynamic pcpu = 0x399a540
>>> curthread    = 0xd8477000: pid 2034 tid 100167 "openssl"
>>> curpcb       = 0xd876de98
>>> fpcurthread  = 0xd8477000: pid 2034 "openssl"
>>> idlethread   = 0xc376f000: tid 100005 "idle: cpu3"
>>> curpmap      = 0xc377ab04
>>> curvnet      = 0
>>> 
>>> In other words: it appears that the cpuN (here cpu2) is
>>> left with idle scheduled all the time for some reason.
>>> 
>>> ps from db> shows things like:
>>> 
>>> 
>>> db> ps
>>> pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
>>> 2034   714  2034     0  R+                          openssl
>>> 2033   714  2033     0  R+      CPU 3               openssl
>>> 2032   714  2032     0  R+      CPU 0               openssl
>>> 2031   714  2031     0  R+      CPU 1               openssl
>>> 
>>> (then later:)
>>> 
>>> db> ps
>>> pid  ppid  pgrp   uid  state   wmesg   wchan       cmd
>>> 2034   714  2034     0  R+      CPU 3               openssl
>>> 2033   714  2033     0  R+                          openssl
>>> 2032   714  2032     0  R+      CPU 0               openssl
>>> 2031   714  2031     0  R+      CPU 1               openssl
>>> 
>>> There is also:
>>> 
>>>  10     0     0     0  RL      (threaded)          [idle]
>>> 100002                   CanRun                      [idle: cpu0]
>>> 100003                   CanRun                      [idle: cpu1]
>>> 100004                   CanRun                      [idle: cpu2]
>>> 100005                   CanRun                      [idle: cpu3]
>>> 
>>> 
>>> These are from:
>>> 
>>> # uname -apKU
>>> FreeBSD rpi2 12.0-CURRENT FreeBSD 12.0-CURRENT  r327485M  arm armv7 1200054 1200054
>>> 
>>> 
>>> ===
>>> Mark Millard
>>> markmi at dsl-only.net
>>> 
>>> _______________________________________________
>>> freebsd-hackers at freebsd.org mailing list
>>> https://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
>>> 
>> 
>> Have you tried using 'cpuset' to force each of the 4 to run on a
>> specific CPU? What happens if you force it to run on cpu #2?
> 
> # cpuset -c -l 2 openssl speed 1>/dev/null 2>&1 &
> 
> results in:
> 
> last pid:  2120;  load averages:  2.95,  2.44,  2.25                                                                                                                            up 0+10:26:07  12:08:54
> 18 processes:  2 running, 16 sleeping
> CPU 0:  0.0% user,  0.0% nice,  0.0% system,  4.7% interrupt, 95.3% idle
> CPU 1:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% idle
> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 1852K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 53M Free
> Swap: 1536M Total, 1536M Free
> 
>  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
> 2120 root             1  20    0  6432K  2872K     0K CPU1    1   0:00   0.22% top -CawPopid
> 2118 root             1  29    0  7256K  4432K     0K RUN     1   0:00   0.00% openssl speed
> . . .
> 
> 
> With the other 3 also assigned:
> 
> last pid:  2138;  load averages:  4.45,  3.04,  2.51                                                                                                                            up 0+10:28:48  12:11:35
> 21 processes:  5 running, 16 sleeping
> CPU 0: 93.7% user,  0.0% nice,  1.6% system,  4.7% interrupt,  0.0% idle
> CPU 1:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
> CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> CPU 3:  100% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0% idle
> Mem: 3632K Active, 587M Inact, 5368K Laundry, 145M Wired, 86M Buf, 51M Free
> Swap: 1536M Total, 1536M Free
> 
>  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
> 2138 root             1  20    0  6472K  2888K     0K CPU1    1   0:00   0.24% top -CawPopid
> 2137 root             1 100    0  7256K  4436K     0K CPU3    3   0:25  96.79% openssl speed
> 2124 root             1 102    0  7256K  4436K     0K RUN     1   0:37 101.66% openssl speed
> 2123 root             1 101    0  7256K  4436K     0K CPU0    0   0:44  95.07% openssl speed
> 2118 root             1  29    0  7256K  4432K     0K RUN     1   0:00   0.00% openssl speed
> . . .
> 
> Thus the difference is that the one instance
> of openssl speed does not run at all.

I tried to fg and ^C each of the 4 openssl's,
with cpu2's instance being last.

All worked but for cpu2's.

^C and ^Z do not work after the fg. But ^T reports:

# fg
cpuset -c -l 3 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 1 openssl speed >/dev/null 2>&1
^C
# fg
cpuset -c -l 0 openssl speed >/dev/null 2>&1
^C# fg
cpuset -c -l 2 openssl speed >/dev/null 2>&1
^C
^Z
load: 3.03  cmd: openssl 2118 [runnable] 768.91r 0.06u 0.02s 0% 4424k
load: 3.07  cmd: openssl 2118 [runnable] 825.03r 0.06u 0.02s 0% 4424k
load: 3.06  cmd: openssl 2118 [runnable] 1177.78r 0.06u 0.02s 0% 4424k


Using an ssh sessions as the console is stuck
in the above state, it appears that the load
average is now stuck at around 3 despite the
basically idle state:

last pid:  2172;  load averages:  3.03,  3.29,  3.08                                                                                                                            up 0+10:39:36  12:22:23
23 processes:  2 running, 21 sleeping
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  3.1% interrupt, 96.9% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 2:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 3:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 4480K Active, 590M Inact, 5364K Laundry, 145M Wired, 86M Buf, 46M Free
Swap: 1536M Total, 1536M Free

  PID USERNAME       THR PRI NICE   SIZE    RES   SWAP STATE   C   TIME     CPU COMMAND
 2172 root             1  20    0  6344K  2888K     0K CPU3    3   0:00   0.24% top -CawPopid
 2155 root             1  22    0  5484K  2660K     0K wait    3   0:00   0.00% su (sh)
 2154 markmi           1  23    0  5260K  2648K     0K wait    0   0:00   0.00% su
 2148 markmi           1  28    0  5524K  2724K     0K wait    3   0:00   0.00% -sh (sh)
 2147 markmi           1  20    0 11024K  6208K     0K select  0   0:00   0.03% sshd: markmi at pts/0 (sshd)
 2143 root             1  26    0 10644K  6128K     0K select  3   0:00   0.00% sshd: markmi [priv] (sshd)
 2118 root             1  20    0  7256K  4432K     0K RUN     1   0:00   0.00% openssl speed
  714 root             1  20    0  5568K  2828K     0K wait    3   0:00   0.00% -sh (sh)
  713 root             1  22    0  5300K  2600K     0K wait    1   0:00   0.00% login [pam] (login)
  712 root             1  52    0  4356K  2004K     0K ttyin   0   0:00   0.00% /usr/libexec/getty Pc ttyv0
  665 root             1  52    0  4816K  2332K     0K nanslp  3   0:01   0.00% /usr/sbin/cron -s
  661 root             1  20    0  9984K  4832K     0K select  0   0:00   0.00% /usr/sbin/sshd
  617 root             1  20    0  4308K  1860K     0K select  0   0:06   0.01% /usr/sbin/powerd
  614 root             1  20    0 10452K 10480K     0K select  3   0:04   0.01% /usr/sbin/ntpd -g -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
  576 root            32  52    0  4388K  2160K     0K rpcsvc  0   0:00   0.00% nfsd: server (nfsd)
  575 root             1  52    0  4720K  2044K     0K select  3   0:00   0.00% nfsd: master (nfsd)
  573 root             1  52    0  5052K  2216K     0K select  3   0:00   0.00% /usr/sbin/mountd -r
  443 root             1  20    0  4724K  2088K     0K select  0   0:00   0.00% /usr/sbin/rpcbind
  434 root             1  20    0  4772K  2280K     0K select  1   0:01   0.00% /usr/sbin/syslogd -s
  363 root             1  20    0  6428K  3576K     0K select  0   0:00   0.00% /sbin/devd
  362 _dhcp            1  42    0  5128K  2388K     0K select  3   0:00   0.00% dhclient: ue0 (dhclient)
  316 root             1  52    0  5032K  2292K     0K select  0   0:00   0.00% dhclient: ue0 [priv] (dhclient)
  313 root             1  52    0  4800K  2148K     0K select  3   0:00   0.00% dhclient: system.syslog (dhclient)


Any more experiments before a reboot (and possibly not seeing
the problem at all)?

===
Mark Millard
markmi at dsl-only.net




More information about the freebsd-hackers mailing list