high load system do not take all CPU time
Коньков Евгений
kes-kes at yandex.ru
Mon Dec 26 19:44:31 UTC 2011
Hello, Коньков.
You wrote on 26 December 2011, 20:52:11:
КЕ> Hello, Коньков.
КЕ> You wrote on 25 December 2011, 18:10:17:
КЕ>> Hello, wishmaster.
КЕ>> You wrote on 19 December 2011, 6:54:08:
w>>> --- Original message ---
w>>> From: "Коньков Евгений" <kes-kes at yandex.ru>
w>>> To: "Daniel Staal" <DStaal at usa.net>
w>>> Date: 18 December 2011, 19:47:40
w>>> Subject: Re[2]: high load system do not take all CPU time
w>>>
w>>>
>>>> Hello, Daniel.
>>>>
>>>> You wrote on 18 December 2011, 17:52:00:
>>>>
>>>> DS> --As of December 17, 2011 10:29:42 AM +0200, Коньков Евгений
>>>> DS> is alleged to have said:
>>>>
>>>> >> How can I debug why the system does not use free CPU resources?
>>>> >>
>>>> >> In these pictures you can see that the CPU cannot exceed 400 ticks:
>>>> >> http://piccy.info/view3/2368839/c9022754d5fcd64aff04482dd360b5b2/
>>>> >> http://piccy.info/view3/2368837/a12aeed98681ed10f1a22f5b5edc5abc/
>>>> >> http://piccy.info/view3/2368836/da6a67703af80eb0ab8088ab8421385c/
>>>> >>
>>>> >>
>>>> >> In these pictures you can see that the traffic problems on re0 begin
>>>> >> when the CPU load rises to its "maximum":
>>>> >> http://piccy.info/view3/2368834/512139edc56eea736881affcda490eca/
>>>> >> http://piccy.info/view3/2368827/d27aead22eff69fd1ec2b6aa15e2cea3/
>>>> >>
>>>> >> But the CPU is still 25% idle at that moment.
>>>>
>>>> DS> <snip>
>>>>
>>>> >># top -SIHP
>>>> >> last pid: 93050; load averages: 1.45, 1.41, 1.29
>>>> >> up 9+16:32:06 10:28:43 237 processes: 5 running, 210 sleeping, 2
>>>> >> stopped, 20 waiting
>>>> >> CPU 0: 0.8% user, 0.0% nice, 8.7% system, 17.7% interrupt, 72.8% idle
>>>> >> CPU 1: 0.0% user, 0.0% nice, 9.1% system, 20.1% interrupt, 70.9% idle
>>>> >> CPU 2: 0.4% user, 0.0% nice, 9.4% system, 19.7% interrupt, 70.5% idle
>>>> >> CPU 3: 1.2% user, 0.0% nice, 6.3% system, 22.4% interrupt, 70.1% idle
>>>> >> Mem: 843M Active, 2476M Inact, 347M Wired, 150M Cache, 112M Buf, 80M Free
>>>> >> Swap: 4096M Total, 15M Used, 4080M Free
>>>>
>>>> DS> --As for the rest, it is mine.
>>>>
>>>> DS> You are I/O bound; most of your time is spent in interrupts. The CPU is
>>>> DS> dealing with things as fast as it can get them, but it has to wait for the
>>>> DS> disk and/or network card to get them to it. The CPU is not your problem;
>>>> DS> if you need more performance, you need to tune the I/O. (And possibly get
>>>> DS> better I/O cards, if available.)
>>>>
>>>> DS> Daniel T. Staal
>>>>
>>>> Can I find out the interrupt limit, or calculate it, before that limit is
>>>> reached?
>>>>
>>>> The main interrupt source is the internal card (re0):
>>>> # vmstat -i
>>>> interrupt total rate
>>>> irq14: ata0 349756 78
>>>> irq16: ehci0 7427 1
>>>> irq23: ehci1 12150 2
>>>> cpu0:timer 18268704 4122
>>>> irq256: re0 85001260 19178
>>>> cpu1:timer 18262192 4120
>>>> cpu2:timer 18217064 4110
>>>> cpu3:timer 18210509 4108
>>>> Total 158329062 35724
>>>>
>>>> Do you have any good links on I/O tuning to read?
>>>>
>>>> --
>>>> Best regards,
>>>> Коньков mailto:kes-kes at yandex.ru
w>>>
w>>> Your problem is the poorly performing LAN card. The guy from
w>>> Calomel Org told you about it. He advised you to switch to an Intel network card.
КЕ>> Look at time 17:20:
КЕ>> http://piccy.info/view3/2404329/dd9f28f8ac74d3d2f698ff14c305fe31/
КЕ>> At this point freeradius starts to work slowly because no CPU time, or too
КЕ>> little, is allocated to it, and mpd5 starts to drop users because it gets no response
КЕ>> from radius. Sadly, I do not know what idle value 'top' showed at that moment.
КЕ>> Does SNMP return correct values for CPU usage?
КЕ> last pid: 14445; load averages: 6.88, 5.69, 5.33 up 0+12:11:35 20:37:57
КЕ> 244 processes: 12 running, 211 sleeping, 3 stopped, 15 waiting, 3 lock
КЕ> CPU 0: 4.7% user, 0.0% nice, 13.3% system, 46.7% interrupt, 35.3% idle
КЕ> CPU 1: 2.0% user, 0.0% nice, 9.8% system, 69.4% interrupt, 18.8% idle
КЕ> CPU 2: 2.7% user, 0.0% nice, 8.2% system, 74.5% interrupt, 14.5% idle
КЕ> CPU 3: 1.2% user, 0.0% nice, 9.4% system, 78.0% interrupt, 11.4% idle
КЕ> Mem: 800M Active, 2708M Inact, 237M Wired, 60M Cache, 112M Buf, 93M Free
КЕ> Swap: 4096M Total, 25M Used, 4071M Free
КЕ> PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
КЕ> 12 root -72 - 0K 160K CPU1 1 159:49 100.00% {swi1: netisr 3}
КЕ> 12 root -72 - 0K 160K *per-i 2 101:25 84.57% {swi1: netisr 1}
КЕ> 12 root -72 - 0K 160K *per-i 3 60:10 40.72% {swi1: netisr 2}
КЕ> 12 root -72 - 0K 160K *per-i 2 41:54 39.26% {swi1: netisr 0}
КЕ> 11 root 155 ki31 0K 32K RUN 0 533:06 24.46% {idle: cpu0}
КЕ> 3639 root 36 0 10460K 3824K CPU3 3 7:43 22.17% zebra
КЕ> 12 root -92 - 0K 160K CPU0 0 93:56 14.94% {irq256: re0}
КЕ> 11 root 155 ki31 0K 32K RUN 1 563:29 14.16% {idle: cpu1}
КЕ> 11 root 155 ki31 0K 32K RUN 2 551:46 12.79% {idle: cpu2}
КЕ> 11 root 155 ki31 0K 32K RUN 3 558:54 11.52% {idle: cpu3}
КЕ> 13 root -16 - 0K 32K sleep 3 16:56 4.93% {ng_queue2}
КЕ> 13 root -16 - 0K 32K RUN 2 16:56 4.69% {ng_queue0}
КЕ> 13 root -16 - 0K 32K RUN 0 16:56 4.54% {ng_queue1}
КЕ> 13 root -16 - 0K 32K RUN 1 16:59 4.44% {ng_queue3}
КЕ> 6818 root 22 0 15392K 4836K select 2 25:16 4.10% snmpd
КЕ> 49448 freeradius 29 0 27748K 16984K select 3 2:37 2.59% {initial thread}
КЕ> 16118 firebird 20 -10 233M 145M usem 2 0:06 0.83% {fb_smp_server}
КЕ> 14282 cacti 21 0 12000K 3084K select 3 0:00 0.68% snmpwalk
КЕ> 16118 firebird 20 -10 233M 145M usem 0 0:03 0.54% {fb_smp_server}
КЕ> 5572 root 21 0 136M 78284K wait 1 5:23 0.49% {mpd5}
КЕ> 14507 root 20 0 9536K 1148K nanslp 0 0:51 0.15% monitord
КЕ> 14441 root 25 0 11596K 4048K CPU0 0 0:00 0.00% perl5.14.1
КЕ> 14443 cacti 21 0 11476K 2920K piperd 0 0:00 0.00% perl5.14.1
КЕ> 14444 root 22 0 9728K 1744K select 0 0:00 0.00% sudo
КЕ> 14445 root 21 0 9672K 1240K kqread 0 0:00 0.00% ping
КЕ> # vmstat -i
КЕ> interrupt total rate
КЕ> irq14: ata0 1577446 35
КЕ> irq16: ehci0 66968 1
КЕ> irq23: ehci1 94012 2
КЕ> cpu0:timer 180767557 4122
КЕ> irq256: re0 683483519 15587
КЕ> cpu1:timer 180031511 4105
КЕ> cpu3:timer 175311179 3998
КЕ> cpu2:timer 179460055 4092
КЕ> Total 1400792247 31947
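To put a number on how much of the interrupt load comes from re0, the `vmstat -i` table above can be parsed and each source's share of the summed rate computed. This is only a rough Python sketch: the names `parse_vmstat_i` and `rate_shares` are made up for illustration, and the sample text is copied from the output above. The rate column appears to be an average since boot (683483519 / 15587 is about 43850 s, which matches the 12 h uptime), so it says nothing about the peak rate.

```python
import re

# Sample copied from the `vmstat -i` output quoted above.
VMSTAT_OUTPUT = """\
interrupt total rate
irq14: ata0 1577446 35
irq16: ehci0 66968 1
irq23: ehci1 94012 2
cpu0:timer 180767557 4122
irq256: re0 683483519 15587
cpu1:timer 180031511 4105
cpu3:timer 175311179 3998
cpu2:timer 179460055 4092
Total 1400792247 31947
"""

# A data row is "<name> <total> <rate>"; the name itself may contain spaces.
ROW = re.compile(r"^(.+?)\s+(\d+)\s+(\d+)\s*$")


def parse_vmstat_i(text):
    """Return {source_name: rate_per_second}, skipping the header and Total rows."""
    rates = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or blank line
        name, rate = match.group(1), int(match.group(3))
        if name == "Total":
            continue  # summary row, not an interrupt source
        rates[name] = rate
    return rates


def rate_shares(rates):
    """Return each source's fraction of the summed per-second rate."""
    total = sum(rates.values())
    return {name: rate / total for name, rate in rates.items()}


if __name__ == "__main__":
    shares = rate_shares(parse_vmstat_i(VMSTAT_OUTPUT))
    for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{name:15s} {share:6.1%}")
```

On this sample re0 accounts for roughly 49% of the summed rate, and the four timers make up nearly all of the rest.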
КЕ> 1 users Load 6.02 5.59 5.31 Dec 26 20:38
КЕ> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
КЕ> Tot Share Tot Share Free in out in out
КЕ> Act 1022276 12900 3562636 39576 208992 count 4
КЕ> All 1143548 20380 5806292 100876 pages 48
КЕ> Proc: Interrupts
КЕ> r p d s w Csw Trp Sys Int Sof Flt 1135 cow 37428 total
КЕ> 186 129k 10k 17k 21k 14k 5857 2348 zfod 15 ata0 14
КЕ> 184 ozfod 1 ehci0 16
КЕ> 8.1%Sys 68.4%Intr 5.9%User 0.0%Nice 17.6%Idle 7%ozfod 2 ehci1 23
КЕ> | | | | | | | | | | | daefr 4120 cpu0:timer
КЕ> ====++++++++++++++++++++++++++++++++++>>> 2423 prcfr 21013 re0 256
КЕ> 208 dtbuf 4425 totfr 4100 cpu1:timer
КЕ> Namei Name-cache Dir-cache 142271 desvn react 4083 cpu3:timer
КЕ> Calls hits % hits % 3750 numvn pdwak 4094 cpu2:timer
КЕ> 36571 36546 100 1998 frevn pdpgs
КЕ> intrn
КЕ> Disks ad0 da0 pass0 241412 wire
КЕ> KB/t 26.81 0.00 0.00 826884 act
КЕ> tps 15 0 0 2714240 inact
КЕ> MB/s 0.39 0.00 0.00 97284 cache
КЕ> %busy 1 0 0 111708 free
КЕ> 114976 buf
КЕ> # netstat -w 1 -I re0
КЕ> input (re0) output
КЕ> packets errs idrops bytes packets errs bytes colls
КЕ> 52329 0 0 40219676 58513 0 40189497 0
КЕ> 50207 0 0 37985881 57340 0 38438634 0
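For comparing this sample with the graphs, the per-second counters above can be converted to Mbit/s and an average packet size. A small Python sketch follows; the column order is assumed from the `netstat -w 1 -I re0` header above, and `parse_row` and `summarize` are hypothetical helper names.

```python
# Column order assumed from the `netstat -w 1 -I re0` header quoted above:
# input: packets errs idrops bytes | output: packets errs bytes colls
FIELDS = (
    "in_packets", "in_errs", "in_idrops", "in_bytes",
    "out_packets", "out_errs", "out_bytes", "colls",
)


def parse_row(line):
    """Turn one one-second sample line into a dict of integer counters."""
    values = [int(v) for v in line.split()]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return dict(zip(FIELDS, values))


def summarize(row):
    """Derive throughput and packet-size figures from a one-second sample."""
    return {
        "in_mbit_s": row["in_bytes"] * 8 / 1e6,
        "out_mbit_s": row["out_bytes"] * 8 / 1e6,
        "total_pps": row["in_packets"] + row["out_packets"],
        "avg_in_packet_bytes": row["in_bytes"] / row["in_packets"],
    }


if __name__ == "__main__":
    sample = "52329 0 0 40219676 58513 0 40189497 0"
    for key, value in summarize(parse_row(sample)).items():
        print(f"{key}: {value:.1f}")
```

For the first sample line this gives about 322 Mbit/s in each direction and about 110 thousand packets per second in total.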
КЕ> http://piccy.info/view3/2409691/69d31186d8943a53c31ec193c8dfe79d/
КЕ> http://piccy.info/view3/2409746/efb444ffe892592fbd6f025fd14535c4/
КЕ> As you can see, before the overload happened the server passed more traffic.
КЕ> At this moment programs work very slowly!
КЕ> During the day there can be more interrupts on re0 than now, and the server works fine.
КЕ> I think there is some problem with the scheduler.
There is also the *radix state:
last pid: 51533; load averages: 4.67, 5.24, 5.29 up 0+12:59:43 21:26:05
284 processes: 6 running, 255 sleeping, 3 stopped, 17 waiting, 3 lock
CPU 0: 0.5% user, 0.0% nice, 15.2% system, 27.2% interrupt, 57.1% idle
CPU 1: 0.0% user, 0.0% nice, 20.1% system, 22.3% interrupt, 57.6% idle
CPU 2: 1.6% user, 0.0% nice, 29.3% system, 20.7% interrupt, 48.4% idle
CPU 3: 2.7% user, 0.0% nice, 21.7% system, 16.3% interrupt, 59.2% idle
Mem: 788M Active, 2660M Inact, 239M Wired, 81M Cache, 112M Buf, 129M Free
Swap: 4096M Total, 51M Used, 4045M Free, 1% Inuse
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
51239 root -72 0 10460K 3416K CPU0 0 0:15 66.80% zebra
11 root 155 ki31 0K 32K CPU3 3 565:03 46.53% {idle: cpu3}
11 root 155 ki31 0K 32K RUN 1 571:46 45.70% {idle: cpu1}
11 root 155 ki31 0K 32K RUN 2 558:13 44.73% {idle: cpu2}
11 root 155 ki31 0K 32K CPU0 0 546:21 43.85% {idle: cpu0}
12 root -72 - 0K 160K *radix 1 204:13 42.14% {swi1: netisr 3}
12 root -72 - 0K 160K *radix 2 141:57 37.55% {swi1: netisr 1}
12 root -72 - 0K 160K *radix 3 61:10 25.15% {swi1: netisr 0}
12 root -72 - 0K 160K WAIT 3 78:28 19.92% {swi1: netisr 2}
12 root -92 - 0K 160K WAIT 0 100:28 9.13% {irq256: re0}
6818 root 22 0 15392K 4836K select 1 26:59 2.10% snmpd
13 root -16 - 0K 32K sleep 3 19:24 1.56% {ng_queue1}
51531 cacti 36 0 17092K 5944K select 0 0:00 1.51% {initial thread}
13 root -16 - 0K 32K sleep 3 19:27 1.46% {ng_queue3}
13 root -16 - 0K 32K sleep 3 19:24 1.46% {ng_queue2}
13 root -16 - 0K 32K sleep 1 19:25 1.42% {ng_queue0}
51531 cacti 52 0 17092K 5944K usem 0 0:00 1.42% {perl5.14.1}
51510 cacti 46 0 32256K 16304K piperd 3 0:00 1.22% php
51514 cacti 46 0 11476K 2940K piperd 2 0:00 1.22% perl5.14.1
51515 root 46 0 9728K 1748K select 3 0:00 1.22% sudo
51516 root 45 0 9672K 1220K kqread 1 0:00 1.22% ping
51508 cacti 52 0 32256K 16312K piperd 2 0:00 1.03% php
51248 root 4 0 10564K 4980K select 0 0:00 0.44% bgpd
5572 root 20 -15 136M 64812K select 1 6:10 0.34% {mpd5}
51502 cacti 25 0 32256K 16568K nanslp 0 0:00 0.34% php
51513 cacti 23 0 17772K 4436K piperd 1 0:00 0.34% rrdtool
5572 root 20 -15 136M 64812K select 2 0:00 0.34% {mpd5}
5572 root 20 -15 136M 64812K select 1 0:00 0.34% {mpd5}
5572 root 20 -15 136M 64812K select 1 0:00 0.34% {mpd5}
I have tried to google *radix and *per-i, but I did not find
anything :(
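
To track how often threads sit in these states, the thread lines of the `top -SIHP` output can be filtered by the STATE column. The Python sketch below does that on a few lines copied from the output above. It rests on an assumption not confirmed anywhere in this thread: that a state starting with `*` marks a thread waiting on a lock whose (truncated) name follows the asterisk. The function names are made up for illustration.

```python
from collections import Counter

# A few thread lines copied from the `top -SIHP` output above.
TOP_LINES = """\
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
51239 root -72 0 10460K 3416K CPU0 0 0:15 66.80% zebra
12 root -72 - 0K 160K *radix 1 204:13 42.14% {swi1: netisr 3}
12 root -72 - 0K 160K *radix 2 141:57 37.55% {swi1: netisr 1}
12 root -72 - 0K 160K *radix 3 61:10 25.15% {swi1: netisr 0}
12 root -72 - 0K 160K WAIT 3 78:28 19.92% {swi1: netisr 2}
12 root -92 - 0K 160K WAIT 0 100:28 9.13% {irq256: re0}
"""


def parse_threads(text):
    """Yield (state, wcpu_percent, command) for every thread line."""
    for line in text.splitlines():
        # 10 fixed columns, then the command, which may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[9].endswith("%"):
            continue  # header or unrelated line
        yield fields[6], float(fields[9].rstrip("%")), fields[10]


def starred_states(text):
    """Count threads per state name for states that start with '*'.

    Assumption: '*' marks a thread waiting on a lock of that name.
    """
    counts = Counter()
    for state, _wcpu, _command in parse_threads(text):
        if state.startswith("*"):
            counts[state[1:]] += 1
    return counts


if __name__ == "__main__":
    for name, count in starred_states(TOP_LINES).items():
        print(f"{count} thread(s) in state *{name}")
```

On the lines above this reports three netisr threads in the *radix state at the same moment.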
--
Best regards,
Коньков mailto:kes-kes at yandex.ru