powerd to use sysctl to import temps to drop freq to avoid heat crash
Ian Smith
smithi at nimnet.asn.au
Thu Jan 5 07:03:33 UTC 2012
On Tue, 3 Jan 2012, Julian H. Stacey wrote:
[snip ccs, feel free .. and cutting lots .. likely my last post on this]
> > As ume@ points out, passive cooling _should_ be handling this, and in
> > any case - even if you get a forced shutdown at CRT temp. - it shouldn't
> > be 'crashing'. Please elaborate on 'crashing'? and at what sort of
>
> TCP failed, rdist & remote xterms & NFS fail, local mousepad fails. I
> didnt bother to try for a crash dump. I guessed damage was about to occur.
Hmm. 'vmstat -i' during this state might be interesting, if console
access still works, but heat related failures can look pretty random.
> > temperature this occurs?
>
> Low 80s I recall. Not sure, I thought that was so high damage would
Nah, 80 isn't so hot. Even my single-core P2-M hits 85C when CPU bound,
and as you've shown, one Toshiba had 100C-plus _PSV and _CRT temps.
> occur, so I just turned off & put an external fan under. I've
Good, but it's winter there, eh? At 2am it was still 27C here :)
> Before with
> sysctl -a | grep temp
> I just got
> hw.acpi.thermal.tz0.temperature: 68.0C
> dev.acpi_hp.0.hdd_temperature: 4
> after
> kldload amdtemp
> I get a lot more temperatures:
Good idea.
> dev.acpi_hp.0.hdd_temperature: 4
> dev.amdtemp.0.%desc: AMD K8 Thermal Sensors
> dev.amdtemp.0.%driver: amdtemp
> dev.amdtemp.0.%parent: hostb4
> dev.amdtemp.0.sensor0.core0: 51.0C
> dev.amdtemp.0.sensor0.core1: 49.0C
> dev.amdtemp.0.sensor1.core0: 58.0C
> dev.amdtemp.0.sensor1.core1: 56.0C
> dev.cpu.0.temperature: 57.0C
> dev.cpu.1.temperature: 57.0C
> hw.acpi.thermal.tz0.temperature: 66.0C
tz0.temperature is reading 9C above the core sensors; that seems a lot.
May tend to suggest problems with heatsinks / thermal paste etc, noting
others' comments about HP machines running hot .. "lap"tops they ain't,
but then apart from smaller notebooks, eeepcs etc, few are these days.
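To watch that gap over time rather than from one snapshot, something like this could do. It's only a sketch: strip_c and temp_delta are made-up helper names, and it assumes the sysctls report in the 'NN.NC' form shown above.

```sh
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any FreeBSD tool.

# strip_c: turn a sysctl temperature such as "66.0C" into whole degrees ("66")
strip_c() {
    t=${1%C}          # drop the trailing C
    echo "${t%.*}"    # drop the decimal part
}

# temp_delta: difference in whole degrees between two readings
temp_delta() {
    echo $(( $(strip_c "$1") - $(strip_c "$2") ))
}

# Usage on the machine itself (assumes amdtemp is loaded):
#   temp_delta "$(sysctl -n hw.acpi.thermal.tz0.temperature)" \
#              "$(sysctl -n dev.cpu.0.temperature)"
```

With the readings quoted above, temp_delta 66.0C 57.0C gives the 9 degrees mentioned.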
> > > dmesg & sysctl etc diagnostics at
> > > http://berklix.com/~jhs/hardware/hp/pavilion/dm3-1155ea/
[..]
> > Almost too much info :) esp. with sysctl -a including the verbose dmesg,
> > but I noticed a couple of things on a quick skim.
>
> Sorry I haven't got to grips which of these should be on & off so
> now turned on all of
> debug.bootverbose: 1
> debug.cpufreq.verbose: 1
> debug.hwpstate_verbose: 1
> dev.acpi_hp.0.verbose: 0 # set to "1" in loader.conf but ignored.
> hw.acpi.verbose: 1
acpi_hp itself is loaded, so that one probably needs setting in
/etc/sysctl.conf; don't know if it's likely to be more informative.
> > If you run powerd -v,
> > what sort of freqs does it usually run at, when more or less idle?
>
> When 75 to 90% idle (from top)
>
> sysctl -a | grep freq | grep -v "cpufreq"
> kern.acct_chkfreq: 15
> kern.timecounter.tc.i8254.frequency: 1193182
> kern.timecounter.tc.ACPI-safe.frequency: 3579545
> kern.timecounter.tc.HPET.frequency: 14318180
> kern.timecounter.tc.TSC.frequency: 1595998039
> net.inet.sctp.sack_freq: 2
> machdep.acpi_timer_freq: 3579545
> machdep.tsc_freq: 1595998039
> machdep.i8254_freq: 1193182
All irrelevant in this context. Timecounter used is hpet, quality 900,
with acpi-safe coming in second best at 850 (dmesg).
> dev.cpu.0.freq: 497
> dev.cpu.0.freq_levels: 1592/100000 1393/87500 1194/75000 995/62500 796/35457 696/31024 597/26592 497/22160 398/17728 298/13296 199/8864 99/4432
> dev.acpi_throttle.0.freq_settings: 10000/-1 8750/-1 7500/-1 6250/-1 5000/-1 3750/-1 2500/-1 1250/-1
> dev.powernow.0.freq_settings: 1592/100000 796/35457
> dev.powernow.1.freq_settings: 1592/100000 796/35457
Logging dev.cpu.0.freq vs hw.acpi.thermal.tz0.temperature is mostly all
that's needed. With timestamps, and fan speed where available. Most of
what I want to know routinely here is shown by:
#!/bin/sh
echo -n "`date` "
sysctl dev.cpu.0.freq dev.cpu.0.cx_usage
sysctl dev.acpi_ibm | egrep 'fan_|thermal'
sysctl hw.acpi.thermal.tz0.temperature
acpiconf -i0 | egrep 'State|Remain|Present|Volt'
t23# t23stat # at 29C ambient
Thu Jan 5 17:10:32 EST 2012 dev.cpu.0.freq: 733
dev.cpu.0.cx_usage: 0.01% 99.98% 0.00% last 766us
dev.acpi_ibm.0.fan_speed: 2391
dev.acpi_ibm.0.fan_level: 1
dev.acpi_ibm.0.thermal: 49 49 46 -1 -1 -1 32 -1
hw.acpi.thermal.tz0.temperature: 49.0C
State: high
Remaining capacity: 100%
Remaining time: unknown
Present rate: 0 mW
Present voltage: 12381 mV
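On a box without acpi_ibm, the freq-vs-temperature logging reduces to a loop along these lines. A sketch only: format_sample and log_loop are made-up names, the log path in the usage line is just an example, and the two sysctls are assumed to exist as shown earlier in the thread.

```sh
#!/bin/sh
# Sketch only: format_sample and log_loop are hypothetical names.

# format_sample: one log line from epoch seconds, MHz and a temperature string
format_sample() {
    echo "$1 $2 MHz $3"
}

# log_loop: sample every $1 seconds (default 10) until interrupted
log_loop() {
    while :; do
        format_sample "$(date +%s)" \
            "$(sysctl -n dev.cpu.0.freq)" \
            "$(sysctl -n hw.acpi.thermal.tz0.temperature)"
        sleep "${1:-10}"
    done
}

# Usage: log_loop 5 >>/var/tmp/freq_temp.log
```

Epoch seconds are used rather than `date` output so the intervals are easy to subtract later.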
> On an idle machine,
> (with a little patch from Gary J. to allow for multiple cores:
> /src/bsd/fixes/FreeBSD/src/gen/usr.sbin/powerd.c.REL=8.2-RELEASE.diff
Local path I guess? If the patch is trying to set cpu.1.freq, that might
explain eg 'powerd: error setting CPU frequency 398: Invalid argument'
below, even though the requested freqs are clearly being set. We just set
cpu.0.freq for all cores; there's (so far) no ability to run different
cores or packages at different freqs, that I've heard of. mav@, avg@ or
someone@ will correct me if I'm (yet again :) behind the times!
> running powerd -v # Using defaults
> powerd: using sysctl for AC line status
> powerd: using devd for AC line status
> load 64%, current freq 199 MHz (10), wanted freq 339 MHz
> changing clock speed from 199 MHz to 398 MHz
> powerd: error setting CPU frequency 398: Invalid argument
> load 27%, current freq 398 MHz ( 8), wanted freq 339 MHz
> load 7%, current freq 398 MHz ( 8), wanted freq 328 MHz
> load 4%, current freq 398 MHz ( 8), wanted freq 317 MHz
> load 8%, current freq 398 MHz ( 8), wanted freq 307 MHz
> load 12%, current freq 398 MHz ( 8), wanted freq 297 MHz
> changing clock speed from 398 MHz to 298 MHz
> powerd: error setting CPU frequency 298: Invalid argument
> load 25%, current freq 298 MHz ( 9), wanted freq 297 MHz
> load 7%, current freq 298 MHz ( 9), wanted freq 287 MHz
> load 14%, current freq 298 MHz ( 9), wanted freq 278 MHz
> load 15%, current freq 298 MHz ( 9), wanted freq 269 MHz
> load 7%, current freq 298 MHz ( 9), wanted freq 260 MHz
> load 6%, current freq 298 MHz ( 9), wanted freq 251 MHz
> load 0%, current freq 298 MHz ( 9), wanted freq 243 MHz
> load 4%, current freq 298 MHz ( 9), wanted freq 235 MHz
> load 10%, current freq 298 MHz ( 9), wanted freq 227 MHz
> load 0%, current freq 298 MHz ( 9), wanted freq 219 MHz
> load 17%, current freq 298 MHz ( 9), wanted freq 212 MHz
> load 0%, current freq 298 MHz ( 9), wanted freq 205 MHz
> load 3%, current freq 298 MHz ( 9), wanted freq 198 MHz
> changing clock speed from 298 MHz to 199 MHz
> powerd: error setting CPU frequency 199: Invalid argument
> load 42%, current freq 199 MHz (10), wanted freq 221 MHz
> changing clock speed from 199 MHz to 298 MHz
> powerd: error setting CPU frequency 298: Invalid argument
> load 33%, current freq 298 MHz ( 9), wanted freq 221 MHz
[..]
Some relative timestamps on these would give a better idea over time,
but clearly it's using low freqs at low loads.
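One way to get relative timestamps is to pipe the verbose output through a small filter. A sketch, with stamp_lines being a made-up name; note that if powerd's output arrives in bursts because of stdio buffering on the pipe, the stamps will only be approximate.

```sh
#!/bin/sh
# Sketch only: stamp_lines is a hypothetical helper.

# stamp_lines: prefix each stdin line with whole seconds elapsed since start
stamp_lines() {
    start=$(date +%s)
    while IFS= read -r line; do
        now=$(date +%s)
        printf '%4ds %s\n' $((now - start)) "$line"
    done
}

# Usage: powerd -v 2>&1 | stamp_lines
```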
> > you using default powerd settings?
>
> I was when it crashed too often to be useful.
> accepting values from /etc/defaults/rc.conf
> but it crashed too much, so my rc.conf is now set to give:
> /usr/sbin/powerd -a adaptive -b minimum -n minimum
Should be ok. Seems powerd is behaving as expected; if it's overheating
at those low freqs (or not cooling off once idle), and it's near enough
to brand new that dust on heatsinks etc can be ruled out, it may be a
real issue to raise under warranty?
> > When running on battery can you
> > monitor power use with acpiconf -i0 to see the actual effect on power
> > usage of running at various lower freqs?
>
> Wow what a nice command !
> Design capacity: 57276 mWh
> Last full capacity: 55544 mWh
> Technology: secondary (rechargeable)
> Design voltage: 11100 mV
> Capacity (warn): 5554 mWh
> Capacity (low): 0 mWh
> Low/warn granularity: 555 mWh
> Warn/full granularity: 555 mWh
> Model number: 5160
> Serial number: Li4402A
> Type: Li
> OEM info: Hewlett-Packard
> State: high
> Remaining capacity: 100%
> Remaining time: unknown
> Present rate: unknown
> Present voltage: 12375 mV
>
> Yes I could do that. I dont see anything that's going to give me
> instantaneous consumption readings above though, & battery state
'State: high' shows it's on AC, fully charged. While running ON BATTERY
you should see something more like (here):
State: discharging
Remaining capacity: 98%
Remaining time: 1:59
Present rate: 15716 mW
Present voltage: 12025 mV
At idle (here), ie 15.7W for the whole machine, nominally 12.5W for CPU
at low speed, 19.1W at 'high' speed. Measured from the wall, actually
from the house battery ammeter pre-inverter (12V solar), on AC this box
draws ~18W @733 (mostly C2 state) and ~36W @1133 (working, C1 state).
t23# sysctl dev.cpu | grep -v '\.%'
dev.cpu.0.freq: 733
dev.cpu.0.freq_levels: 1133/19100 733/12500
dev.cpu.0.cx_supported: C1/0 C2/84 C3/120
dev.cpu.0.cx_lowest: C2
dev.cpu.0.cx_usage: 0.00% 99.99% 0.00% last 732us
Your mW or mA rates will be far greater of course. To elaborate on what
Alexander said a bit, your 'real' nominal CPU power ratings are:
> dev.powernow.0.freq_settings: 1592/100000 796/35457
> dev.powernow.1.freq_settings: 1592/100000 796/35457
ie 100.000W @1592, and 35.457W @796MHz. I think that has to be per
package, I can't believe 200W for both cores, 100W is heaps anyway.
That of course is in addition to all other power use by the machine; in
my case only a few watts, but your GPU alone may be using a lot, and I
recall seeing some modern HPs use a common heatsink/pipes for CPU & GPU.
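To pull those nominal figures out of a freq_settings or freq_levels string and compare two levels, a sketch like this would do. level_mw is a made-up name, and the numbers are only what the driver reports, not measurements.

```sh
#!/bin/sh
# Sketch only: level_mw is a hypothetical helper.

# level_mw: print the nominal mW listed for a given MHz in "MHz/mW ..." pairs
level_mw() {
    want=$1; shift
    for pair in $*; do
        [ "${pair%/*}" = "$want" ] && echo "${pair#*/}" && return 0
    done
    return 1    # no such level in the list
}

# Usage:
#   levels=$(sysctl -n dev.powernow.0.freq_settings)
#   echo $(( $(level_mw 1592 "$levels") - $(level_mw 796 "$levels") ))
```

For the settings quoted above that difference is 64543 mW, ie about 64.5W nominal between the two native rates.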
> prediction is generally not accurate. However, I can do better, I
> can connect a power meter between wall supply & transformer
> (battery will be irrelevant if full, not charging) & see how much
> power is going in. & a temperature probe underneath. (OK, heat
> emitted by screen I don't care about, (just any heat in chassis,
> from inefficiency converting for screen), I'll see if I can turn
> off screen with keys or BIOS)
Yes an accurate power meter is the go, and GPUs can be a large part of
total heat source in modern notebooks. Luckily I have no such issue :)
I guess measuring with/without X running should show some difference?
What you're looking for is your 'baseload' power draw at idle, vs when
working at various rates. GLXgears may be the 'hottest' thing I know.
> But I guess another way is to hack the ACPI to turn the fan on at
> lower threshold. ? Maybe I'll have to tweak what I got from acpidump -dt
> http://berklix.com/~jhs/hardware/hp/pavilion/dm3-1155ea/jhs-hp-pPavilion-dm3-1155ea.asl
> per:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/acpi-debug.html
No time to look now, but it's possible your AML doesn't provide access
to fan/s at all, hunt through _TZ stuff. Here only acpi_ibm provides
fan access and it's quite specific to IBM/Lenovo kit. acpi_hp.c may be
rewarding bedtime reading, but acpi_hp(4) makes no mention of fans and
such - but turning off (if unused) wifi, bluetooth & firewire will help.
> > acpi_throttle0: <ACPI CPU Throttling> on cpu0
> > acpi_throttle0: P_CNT from P_BLK 0x410
> > powernow0: <PowerNow! K8> on cpu0
> > powernow1: <PowerNow! K8> on cpu1
>
> powernow & throttling all new to me, I must web search.
> sysctl -d dev.powernow.0.freq_settings
> dev.powernow.0.freq_settings: CPU frequency driver settings
Start with cpufreq(4) for a good overview of all the various pieces, but
most drivers (eg powernow) mentioned have scant or no docs, except code.
(Warning: once you go down this rabbit hole, you may never come back! :)
> > I also notice only C1 states, but using machdep.idle: amdc1e so I wonder
> > if you're getting benefit from that? Are there BIOS settings re that?
>
> I didnt know what C1 was. Found it:
> http://en.wikipedia.org/wiki/Advanced_Configuration_and_Power_Interface
>
> sysctl -a | grep machdep.idle
> machdep.idle: amdc1e
> machdep.idle_available: spin, amdc1e, hlt, acpi,
> sysctl -d machdep.idle
> machdep.idle: currently selected idle function
> sysctl -d machdep.idle_available
> machdep.idle_available: list of available idle functions
>
> Where are you seeing the C1? Wikipedia does make C1E sound more desirable.
machdep.idle: amdc1e .. I know too little about how it works to comment,
except that it should be a lower power mode than basic C1, and I think
reading both mav's excellent article and the wikipedia one should help.
True masochists can revel in days of fun, reading the whole ACPI spec.
> I went through the screen BIOS a while back & found no power
> / temp / frequency options, was a rather limited BIOS.
As http://wiki.freebsd.org/TuningPowerConsumption says, AMD tend to hide
most of the internal C-state management. Maybe worth trying 9.0 on it?
> > > /boot/loader.conf
> > > I just added
> > > acpi_hp_load="YES"
> > > (after reboot) does not produce /dev/hpcmi
Perhaps yours isn't one of acpi_hp's more-targeted models?
> > I don't see any mention of active cooling (ie, fan/s) in your sysctls,
> > including acpi_hp.
>
> Yes, been puzzling me that, There is one little fan slot, doesnt
> seem to blow hot right now, when under case too hot to hold.
> Yes I cant see revs/min
Warner suggested more about this, and Kevin's contribution is noteworthy
also. If the fan's working right, there should be HOT! air being pumped
out when it's hot. A home-made stethoscope should let you hear it spin.
From everything to date, I wonder if it doesn't just have a broken fan?
> > Here I'm running a custom script to control CPU fan
> > via acpi_ibm (the auto fan didn't cut in till over 65C, then pumped it
> > down to ~45C), but it seems you may not have access to fan control?
>
> I don't know how to control the fan. I'd like to see your script please,
> or at least commands you're running.
Sure, below .. but I doubt much of it's any use on your HP.
> kldload acpi_ibm ; kldstat
> kernel sound.ko snd_hda.ko acpi_hp.ko acpi_wmi.ko linux.ko
> linux_adobe.ko radeon.ko drm.ko acpi_ibm.ko
I'm quite surprised it even loaded. What says sysctl dev.acpi_ibm ?
> > You could try setting hw.acpi.thermal.user_override=1 and then set
> > hw.acpi.thermal.tz0._PSV to something lower than 90C, perhaps much lower
> > to see if it helps, especially if 'crashing' occurs closer to 90C than
> > not, however:
>
> sysctl hw.acpi.thermal.tz0._PSV
> hw.acpi.thermal.tz0._PSV: 90.0C
> sysctl hw.acpi.thermal.user_override
> hw.acpi.thermal.user_override: 0
> sysctl hw.acpi.thermal.user_override
> sysctl hw.acpi.thermal.tz0._PSV=60.0C
> sysctl -a | grep acpi.thermal
> hw.acpi.thermal.min_runtime: 0
> hw.acpi.thermal.polling_rate: 10
> hw.acpi.thermal.user_override: 1
> hw.acpi.thermal.tz0.temperature: 69.0C
> hw.acpi.thermal.tz0.active: -1
> hw.acpi.thermal.tz0.passive_cooling: 1
> hw.acpi.thermal.tz0.thermal_flags: 1
> hw.acpi.thermal.tz0._PSV: 60.0C
> hw.acpi.thermal.tz0._HOT: 95.0C
> hw.acpi.thermal.tz0._CRT: 100.0C
> hw.acpi.thermal.tz0._ACx: -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
> hw.acpi.thermal.tz0._TC1: 2
> hw.acpi.thermal.tz0._TC2: 3
> hw.acpi.thermal.tz0._TSP: 40
>
> I think that's forced the frequency down, Xterm to machine is sluggish
> & powerd -v reports:
> changing clock speed from 99 MHz to 1592 MHz
> load 177%, current freq 99 MHz (11), wanted freq 3184 MHz
> changing clock speed from 99 MHz to 1592 MHz
> load 179%, current freq 99 MHz (11), wanted freq 3184 MHz
> changing clock speed from 99 MHz to 1592 MHz
> load 151%, current freq 99 MHz (11), wanted freq 3184 MHz
Seems to be working. You likely never really want to run this slowly,
even with hiadaptive it's going to take a while to get 'unsluggish', so
even without overheating issues you may want to set debug.cpufreq.lowest
to something vaguely usable, maybe 300MHz or so?
However, I'd definitely try disabling throttling; all those N/8 freqs
may be doing more harm than good in some contexts .. in loader.conf:
hint.acpi_throttle.0.disabled=1 # possibly? also needing:
hint.acpi_throttle.1.disabled=1
which should leave only the 'native' 1592 and 796 rates. Worth a try?
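On the debug.cpufreq.lowest floor mentioned above: as far as I know it takes a plain MHz value, so this isn't strictly needed, but a sketch like the following shows which advertised level would end up as the slowest one. lowest_at_least is a made-up name.

```sh
#!/bin/sh
# Sketch only: lowest_at_least is a hypothetical helper.

# lowest_at_least: smallest MHz in "MHz/mW ..." pairs that is >= the floor
lowest_at_least() {
    floor=$1; shift
    best=
    for pair in $*; do
        mhz=${pair%/*}
        [ "$mhz" -ge "$floor" ] || continue
        if [ -z "$best" ] || [ "$mhz" -lt "$best" ]; then best=$mhz; fi
    done
    [ -n "$best" ] && echo "$best"
}

# Usage:
#   lowest_at_least 300 "$(sysctl -n dev.cpu.0.freq_levels)"
#   sysctl debug.cpufreq.lowest=300
```

With the freq_levels list posted earlier, a 300MHz floor leaves 398 as the slowest level.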
> sysctl hw.acpi.thermal.tz0._HOT=62.0C
> I dont hear or feel fan faster.
> ... just crashed.
!sysctl -d hw.acpi.thermal
[..]
hw.acpi.thermal.tz0.active: cooling is active
hw.acpi.thermal.tz0.passive_cooling: enable passive (speed reduction) cooling
hw.acpi.thermal.tz0.thermal_flags: thermal zone flags
hw.acpi.thermal.tz0._PSV: passive cooling temp setpoint
hw.acpi.thermal.tz0._HOT: too hot temp setpoint (suspend now)
hw.acpi.thermal.tz0._CRT: critical temp setpoint (shutdown now)
[..]
So with a low _HOT it may have tried suspending? _CRT is likely more
useful for a clean shutdown - unless it suspends and resumes cleanly?
> > > Running 80% idle (just a fsck_ufs) I see:
> > > hw.acpi.thermal.tz0.temperature: 67.0C
> > > dev.acpi_hp.0.hdd_temperature: 4
> >
> > 67C isn't really hot on a dual core laptop with 100W rating at 1592MHz.
> > Still, if you had it drop back immediately on idle to 796MHz, you'd be
> > saving about 60W, which may help considerably.
> >
> > I expect you've done the usual check/clean airways, thermal grease etc?
>
> New notebook, used about 2 hours by mother in a clean office,
> then by me ditto, however yes, might be manufacturing error & lack of grease.
> Not opened it yet.
I guess how it runs on Windows may be the only basis for warranty .. or
maybe try some Linux live CD to eliminate any possible FreeBSD issues?
cheers, Ian
Pardon overcommented and overdefensive (not to say paranoid) coding ..
blame my IBM training in 1970 - you never really recover from that!
=======
#!/bin/sh
# temp_t23 smithi v0 7/11/11, tidy 12/11/11, v2 fan read delay 22/11/11
# v3 18/12/11 suspend / resume resets dev.acpi_ibm.0.fan=1 (ie auto)
none=0; slow=1; fast=3 # Thinkpad T23 fan levels (1-2, 3-7 same)
vhtemp=60 # above, immediately fast
hitemp=50 # above twice, fan fast
lotemp=45 # below twice, fan off
hyst=3 # degs hysteresis on transitions
log='/root/temp_t23.log'
echo "`date` temp_t23 start: enabling manual fan control" >>$log
sysctl dev.acpi_ibm.0.fan >/dev/null
[ $? -ne 0 ] && echo "acpi_ibm not loaded?" >>$log && exit 1
t=`sysctl -n hw.acpi.thermal.tz0.temperature`; t=${t%??C}
[ $t -lt 10 -o $t -gt 99 ] && echo "${t}C unbelievable!" >>$log && exit 2
sysctl dev.acpi_ibm.0.fan=0 >/dev/null # turn auto fan control OFF
[ $? -ne 0 ] && echo "can't disable auto fan control!" >>$log && exit 3
[ "$1" ] && sleep=$1 || sleep=10 # seconds between samples
last_t=$vhtemp # if starting hot, fan fast
last_level=9 # set fan immediately
level=$slow
rc=0; done=0; trap "done=1" int quit term
while [ $done -eq 0 ]; do
    if [ $t -gt $hitemp ]; then # twice or much, max fan (~4800rpm)
        [ $last_t -gt $hitemp -o $t -ge $vhtemp ] && level=$fast
    elif [ $t -lt $lotemp ]; then # below twice, fan off
        [ $last_t -lt $lotemp ] && level=$none
    else
        # between: after hysteresis set fan to min (~2400rpm)
        if [ $level -eq $fast ]; then # falling
            [ $t -le $(($hitemp - $hyst)) ] && level=$slow
        elif [ $level -eq $none ]; then # rising
            [ $t -ge $(($lotemp + $hyst)) ] && level=$slow
        fi
    fi
    if [ $level -ne $last_level ]; then
        sysctl dev.acpi_ibm.0.fan_level=$level >/dev/null
        [ $? -ne 0 ] && echo "can't set level $level" >>$log && break
        # for now, log all level changes # v2 fan_speed settle delay
        echo -n "`date` ${t}C fan_level -> $level " >>$log; sleep 5
        echo "`sysctl -n dev.acpi_ibm.0.fan_speed`rpm" >>$log
    else
        sleep $sleep # catch ^C or term; done=1
    fi
    if [ `sysctl -n dev.acpi_ibm.0.fan` -ne 0 ]; then # v3
        echo -n "`date` resume: reenable manual fan at ${t}C" >>$log
        sysctl dev.acpi_ibm.0.fan=0 >/dev/null # back to manual fan
        [ $? -ne 0 ] && echo " .. FAILED!" >>$log && rc=3 && break
        echo >>$log
    fi
    last_level=$level
    last_t=$t
    t=`sysctl -n hw.acpi.thermal.tz0.temperature`; t=${t%??C}
done
if [ $done -ne 1 ]; then
    echo "temp_t23 error: done=$done ${t}C level=$level" >>$log
    [ $rc -eq 0 ] && rc=7 # not that anybody's listening :)
fi
echo "`date` temp_t23 end: reenabling auto fan control at ${t}C" >>$log
sysctl dev.acpi_ibm.0.fan=1 >/dev/null # auto fan ON
[ $? -ne 0 ] && echo " REENABLING AUTO FAN FAILED!" >>$log && rc=9
trap - int quit term
exit $rc
=======
Thu Jan 5 08:03:23 EST 2012 48C fan_level -> 1 2298rpm
Thu Jan 5 08:27:20 EST 2012 44C fan_level -> 0 0rpm
Thu Jan 5 08:30:35 EST 2012 48C fan_level -> 1 2310rpm
Thu Jan 5 08:45:01 EST 2012 44C fan_level -> 0 0rpm
Thu Jan 5 08:47:26 EST 2012 48C fan_level -> 1 2286rpm
Thu Jan 5 14:23:35 EST 2012 52C fan_level -> 3 4936rpm
Thu Jan 5 14:25:00 EST 2012 47C fan_level -> 1 2361rpm
Thu Jan 5 14:43:27 EST 2012 51C fan_level -> 3 5026rpm
Thu Jan 5 14:45:32 EST 2012 47C fan_level -> 1 2347rpm
Thu Jan 5 15:02:58 EST 2012 52C fan_level -> 3 4952rpm
Thu Jan 5 15:05:23 EST 2012 47C fan_level -> 1 2357rpm
Thu Jan 5 15:22:19 EST 2012 52C fan_level -> 3 5010rpm
Thu Jan 5 15:25:05 EST 2012 47C fan_level -> 1 2326rpm