powerd to use sysctl to import temps to drop freq to avoid heat crash
Ian Smith
smithi at nimnet.asn.au
Sat Jan 7 05:18:55 UTC 2012
On Fri, 6 Jan 2012, Kevin Oberman wrote:
> On Thu, Jan 5, 2012 at 2:40 AM, Ian Smith <smithi at nimnet.asn.au> wrote:
>
> > On Thu, 5 Jan 2012, Ian Smith wrote:
[..]
> > This post:
> > http://lists.freebsd.org/pipermail/freebsd-acpi/2008-February/004521.html
> > points to this PR:
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=bin%2F120336&cat=
> > which was closed with reference to this post:
> > http://docs.FreeBSD.org/cgi/mid.cgi?1203126071.833.19.camel
> >
> > Useless trying to apply anything like this to recent powerd sources,
> > though. Good luck with getting its fan to go, or a warranty claim!
> First of all, I am highly suspicious of hardware issues. Like Ian says, the
> temperatures measured in the chip are quite a bit higher than those measured
> on the motherboard. Beyond that, if the die temperature gets too high, all
> modern CPUs kill the power supply and the system dies almost instantly. I
> suspect this may be what is happening to you due to poor heat transfer from
> the die to the heatsink from poor attachment of the heatsink or a flaw in
> the CPU thermal connection between the die and the thermal transfer plate
> on the top of the case. (The former is FAR more likely.)
I agree with your analysis of likely causes, Kevin, but in this case
temps measured on the chip are (at that time) ~9C _lower_ than whatever
sensor provides hw.acpi.thermal.tz0.temperature - the latter being what
ACPI here takes as the CPU temperature. Quoting Julian's one set of data:
dev.amdtemp.0.sensor0.core0: 51.0C
dev.amdtemp.0.sensor0.core1: 49.0C
dev.amdtemp.0.sensor1.core0: 58.0C
dev.amdtemp.0.sensor1.core1: 56.0C
dev.cpu.0.temperature: 57.0C
dev.cpu.1.temperature: 57.0C
hw.acpi.thermal.tz0.temperature: 66.0C
According to amdtemp(4) (and amdtemp.c), dev.cpu.N.temperature is "Max
of sensor 0 / 1", although that doesn't entirely tally with the above data.
Either way, hw.acpi.thermal.tz0.temperature is 8-9C higher, and it's
that temperature that will be compared to _PSV, _HOT and _CRT settings.
>> hw.acpi.thermal.tz0._PSV: 90.0C
>> hw.acpi.thermal.tz0._HOT: 95.0C
>> hw.acpi.thermal.tz0._CRT: 100.0C
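To make that comparison concrete, here is a minimal Python sketch that parses the sysctl lines above (copied verbatim from Julian's output), checks the amdtemp(4) "max of sensor 0 / 1" description against dev.cpu.N.temperature, and shows how much headroom tz0 has to each trip point. The "name: valueC" line format is assumed from how the lines appear here, and the helper names are hypothetical.

```python
# Sketch: compare the sysctl temperature readings quoted in this thread.
# Assumption: each line is "name: valueC" as printed by `sysctl`; the
# helper names below are hypothetical, not part of any FreeBSD tool.
import re

READINGS = """\
dev.amdtemp.0.sensor0.core0: 51.0C
dev.amdtemp.0.sensor0.core1: 49.0C
dev.amdtemp.0.sensor1.core0: 58.0C
dev.amdtemp.0.sensor1.core1: 56.0C
dev.cpu.0.temperature: 57.0C
dev.cpu.1.temperature: 57.0C
hw.acpi.thermal.tz0.temperature: 66.0C
hw.acpi.thermal.tz0._PSV: 90.0C
hw.acpi.thermal.tz0._HOT: 95.0C
hw.acpi.thermal.tz0._CRT: 100.0C
"""

def parse_temps(text):
    """Map each sysctl name to its temperature in degrees C."""
    temps = {}
    for line in text.splitlines():
        m = re.match(r"^(\S+):\s*(-?\d+(?:\.\d+)?)C$", line.strip())
        if m:
            temps[m.group(1)] = float(m.group(2))
    return temps

def amdtemp_core_max(temps, core):
    """Max of sensor0/sensor1 for one core, as amdtemp(4) describes it."""
    return max(temps[f"dev.amdtemp.0.sensor{s}.core{core}"] for s in (0, 1))

temps = parse_temps(READINGS)
tz0 = temps["hw.acpi.thermal.tz0.temperature"]

# Does dev.cpu.N.temperature equal the per-core sensor maximum?
for core in (0, 1):
    reported = temps[f"dev.cpu.{core}.temperature"]
    print(f"core{core}: max(sensor0, sensor1)={amdtemp_core_max(temps, core)} "
          f"dev.cpu.{core}={reported} tz0 offset={tz0 - reported}")

# How far tz0 is from each ACPI trip point.
for trip in ("_PSV", "_HOT", "_CRT"):
    limit = temps[f"hw.acpi.thermal.tz0.{trip}"]
    print(f"{trip}: {limit}C, tz0 headroom {limit - tz0}C")
```

On this data core0's sensor maximum is 58C against 57C reported, while core1's is 56C against 57C reported, which is the mismatch mentioned above; tz0 sits 9C above dev.cpu.N in both cases.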
In other words, the CPU may not be as hot internally as it appears, re
Julian's concern about damaging more than his lap :) That doesn't help
the problem of course - I still think the fan's a prime suspect - but it
may be indicative of why the machine is crashing.
Once it hits 95C (_HOT), which is then perhaps only 85C or so on-core,
it's going to try to suspend - which likely won't work anyway - which
would surely appear as 'crashing'. On any machine that won't suspend
(and resume), it makes little sense to allow _HOT to be reached when
_CRT at least should provide a clean shutdown, disk/s sync'd etc.
So Julian, I think setting _HOT higher than _CRT (to get it out of the
way) might help in the meantime to at least get a clean(er) shutdown.
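The effect of moving _HOT above _CRT can be illustrated with a toy Python model. It encodes only the behaviour described in this post (passive cooling at _PSV, a suspend attempt at _HOT, a clean shutdown at _CRT); it is not FreeBSD's acpi_thermal code, and the function names and the 105C value for a raised _HOT are hypothetical.

```python
# Toy model of the ACPI trip points as described in this post:
# _PSV -> passive cooling, _HOT -> attempt to suspend, _CRT -> clean shutdown.
# Assumption: follows the post's description only, not FreeBSD's
# acpi_thermal code. Function names are hypothetical.

def action_at(temp_c, psv, hot, crt):
    """Action for one tz0 reading. If both _HOT and _CRT are exceeded,
    report the lower of the two, since a rising temperature meets it first."""
    for limit, action in sorted([(hot, "suspend"), (crt, "shutdown")]):
        if temp_c >= limit:
            return action
    if temp_c >= psv:
        return "passive cooling"
    return "none"

def first_hard_action(psv, hot, crt, start=60.0, step=1.0):
    """Ramp the temperature up from start (step must be > 0) and return
    (temp, action) for the first suspend or shutdown that would trigger."""
    temp = start
    while True:
        action = action_at(temp, psv, hot, crt)
        if action in ("suspend", "shutdown"):
            return temp, action
        temp += step

# Julian's trip points, then _HOT moved (hypothetically) above _CRT.
print(first_hard_action(90.0, 95.0, 100.0))   # (95.0, 'suspend')
print(first_hard_action(90.0, 105.0, 100.0))  # (100.0, 'shutdown')
```

With the current settings the ramp reaches the suspend attempt at 95C; with _HOT out of the way, the clean shutdown at 100C (_CRT) is reached first.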
> Beyond this, I don't think the OS should be trying to deal with this. All
> "modern" CPUs, Intel or AMD, support some form of TCC and, if not
> interfered with, it will slow the clock in increments of 12.5% as needed
> to keep the temperature below _PSV. This is done in a combination of
> hardware and BIOS and is supported by FreeBSD, but FreeBSD also tries to
> use it to do power management and, as tests both by myself and, more
> recently, by mav@ have shown, this is a bad idea. Both throttling and TCC
> should be disabled and I would love to see them completely removed from
> power management.
It may be they're still of some/more use on some older kit? What about
offering a patch to the default loader hints, just defaulting them off,
and seeing who bites? :)
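Kevin's "12.5% increments" amount to eighths of full speed. As a rough arithmetic sketch, assuming eight equal duty-cycle steps and a hypothetical 2000 MHz nominal clock (not Julian's actual CPU), the available levels work out as follows.

```python
# Arithmetic sketch of clock modulation in 12.5% increments.
# Assumptions: eight equal duty-cycle steps; the 2000 MHz nominal
# clock is a hypothetical example, not taken from this thread.

STEP_PERCENT = 12.5

def duty_cycle_steps(step=STEP_PERCENT):
    """Duty cycles from 100% down to one step, in step-sized decrements."""
    n = int(100 / step)
    return [100 - i * step for i in range(n)]

def effective_mhz(nominal_mhz, duty_percent):
    """Average clock rate when the clock runs only duty_percent of the time."""
    return nominal_mhz * duty_percent / 100

for duty in duty_cycle_steps():
    print(f"{duty:5.1f}% -> {effective_mhz(2000, duty):6.1f} MHz")
```

Each step down removes another eighth of the nominal clock, so the lowest level leaves 12.5% (250 MHz in this example).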
> I have read papers which make it clear that only EST and sleep (Cx) states
> are really useful for power management and that Cx is far and away the
> most significant, but enabling deep sleep states can cause the system to
> lock up when combined with TCC or throttling.
>
> If you want to keep a system cool, add:
> performance_cx_lowest="LOW"
> economy_cx_lowest="LOW"
> to /etc/rc.conf and disable TCC and throttling in /boot/loader.conf.
In Julian's case there's only C1/0, so "LOW" is the same as "HIGH";
there's no p4tcc on AMD; yes throttling {c,sh}ould be disabled, and
powernow is pretty much AMD's equivalent of Intel EST, controlling both
P-state frequency and core voltage.
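Why "LOW" equals "HIGH" when only C1 is available can be sketched in Python. This is an approximation under the assumption that the rc.d power_profile script maps HIGH to C1 and LOW to "C" plus the number of entries in dev.cpu.0.cx_supported (entries look like C1/0, i.e. state/latency); the function name and the three-state example string are hypothetical.

```python
# Approximation of how a *_cx_lowest rc.conf value might resolve.
# Assumption: HIGH -> C1; LOW -> "C" + number of entries listed in
# dev.cpu.0.cx_supported. cx_lowest_for() is a hypothetical name.

def cx_lowest_for(setting, cx_supported):
    """Resolve HIGH, LOW or an explicit Cn to a C-state name, given the
    space-separated dev.cpu.0.cx_supported string (e.g. "C1/0")."""
    if setting.upper() == "HIGH":
        return "C1"  # shallowest state
    if setting.upper() == "LOW":
        return f"C{len(cx_supported.split())}"  # deepest listed state
    return setting  # an explicit value such as "C2" is passed through

print(cx_lowest_for("LOW", "C1/0"))               # C1 (Julian's machine)
print(cx_lowest_for("HIGH", "C1/0"))              # C1
print(cx_lowest_for("LOW", "C1/1 C2/22 C3/57"))   # C3 (hypothetical CPU)
```

With a single supported state both settings land on C1, so Kevin's rc.conf lines change nothing on this machine.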
http://wiki.freebsd.org/TuningPowerConsumption has this to say about the
(apparently desirable) AMD C1E state(/s):
on FreeBSD 8 - "As soon as entering C1E on AMD CPUs may result in
unexpected and uncontrolled entering C3 and resulting local APIC timer
stop, FreeBSD 8.x blocks C1E functionality completely."
while on FreeBSD 9 - "On AMD CPUs FreeBSD 9.x blocks C1E only when local
APIC timer is used. If the local APIC timer was ever used since boot,
C1E will be blocked till the next reboot. You may want to force some
other timer to be used in order to allow C1E to work."
But all of that is icing on the cake .. if the fan/s and heatsinks are
working, it should only be in far more extreme cases than 'normal' use
that this machine falls over like this.
cheers, Ian
> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6558 at gmail.com
More information about the freebsd-mobile mailing list