kern/121433: [cpufreq] kern_cpu.c's logic error leads to
spontaneous disabling of passive cooling
Eugene Grosbein
eugen at kuzbass.ru
Thu Mar 6 17:00:04 UTC 2008
>Number: 121433
>Category: kern
>Synopsis: [cpufreq] kern_cpu.c's logic error leads to spontaneous disabling of passive cooling
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Mar 06 17:00:03 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator: Eugene Grosbein
>Release: FreeBSD 6.3-PRERELEASE i386
>Organization:
Svyaz-Service JSC
>Environment:
System: FreeBSD 6.3-PRERELEASE, Pentium-4 2.0GHz
>Description:
I have a 1U uniprocessor FreeBSD 6.3-PRERELEASE server with inadequate
active cooling, which leads to CPU overheating. The server is remote, and while
better cooling is being prepared I decided to use the passive cooling feature
of acpi_thermal(4). It uses p4tcc here and really helps
to keep the CPU temperature in bounds, but there is an annoying bug:
very often (many times per hour) acpi_thermal(4)
disables passive cooling with the message:
failed to set new freq, disabling passive cooling
So I need to use cron to re-enable passive cooling once a minute
to keep it running.
I've tracked this down to src/sys/kern/kern_cpu.c,
function cf_get_method():
1) src/sys/dev/acpica/acpi_thermal.c, function acpi_tz_cooling_thread()
calls acpi_tz_cpufreq_update() from the same file;
2) acpi_tz_cpufreq_update() calls CPUFREQ_GET() that takes us to
src/sys/kern/kern_cpu.c, cf_get_method();
3) cf_get_method() has the following code:
	/*
	 * Reacquire the lock and search for the given level.
	 *
	 * XXX Note: this is not quite right since we really need to go
	 * through each level and compare both absolute and relative
	 * settings for each driver in the system before making a match.
	 * The estimation code below catches this case though.
	 */
	CF_MTX_LOCK(&sc->lock);
	for (n = 0; n < numdevs && curr_set->freq == CPUFREQ_VAL_UNKNOWN; n++) {
		if (!device_is_attached(devs[n]))
			continue;
		error = CPUFREQ_DRV_GET(devs[n], &set);
		if (error)
			continue;
		for (i = 0; i < count; i++) {
			if (CPUFREQ_CMP(set.freq, levels[i].total_set.freq)) {
				sc->curr_level = levels[i];
				break;
			}
		}
	}
Note that the error value is not cleared after this loop.
In my case it happens to be ENXIO once the loop has finished.
The later code successfully reports:
	CF_DEBUG("get estimated freq %d\n", curr_set->freq);
(curr_set->freq always happens to be the maximum CPU frequency here)
Then it does 'return (error);' with the ENXIO value left over
from the loop shown above.
4) acpi_tz_cpufreq_update() propagates ENXIO
to acpi_tz_cooling_thread(), which then disables passive cooling.
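To make the failure mode concrete, here is a minimal userland sketch of
the pattern described in steps 3 and 4. It is not the kernel code:
drv_get_fn, drv_get_enxio(), get_freq_stale_error() and FREQ_UNKNOWN are
hypothetical names made up for illustration, and the estimated frequency
is simply passed in as an argument.

```c
#include <errno.h>
#include <stddef.h>

#define FREQ_UNKNOWN	(-1)

/* Hypothetical driver callback, standing in for CPUFREQ_DRV_GET(). */
typedef int (*drv_get_fn)(int *freq);

/* A driver whose get method fails with ENXIO. */
static int
drv_get_enxio(int *freq)
{
	(void)freq;
	return (ENXIO);
}

/*
 * Simplified model of the flow: probe every driver, fall back to an
 * estimate, then return 'error' without ever resetting it.
 */
static int
get_freq_stale_error(const drv_get_fn *drvs, size_t ndrvs, int estimate,
    int *freq)
{
	int error = 0;
	int probed = FREQ_UNKNOWN;
	size_t n;

	*freq = FREQ_UNKNOWN;
	for (n = 0; n < ndrvs && *freq == FREQ_UNKNOWN; n++) {
		error = drvs[n](&probed);
		if (error)
			continue;	/* 'error' keeps the failure code */
		*freq = probed;
	}
	if (*freq == FREQ_UNKNOWN)
		*freq = estimate;	/* the estimation itself succeeds */
	return (error);			/* but the stale ENXIO leaks out */
}
```

Called with a single failing driver and an estimate of 2000, this
returns ENXIO even though *freq has been set to 2000, which matches the
behaviour seen here: a usable frequency is reported together with an
error code.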
>How-To-Repeat:
Just use a uniprocessor Pentium-4 system under heavy constant CPU load
with acpi_thermal/cpufreq/p4tcc, and tune acpi_thermal so that passive
cooling gets used. Here is my /etc/sysctl.conf:
debug.cpufreq.lowest=1246
#debug.cpufreq.verbose=1
hw.acpi.thermal.user_override=1
hw.acpi.thermal.tz0.passive_cooling=1
hw.acpi.thermal.tz0._PSV=70C
hw.acpi.thermal.tz0._CRT=75C
>Fix:
Unknown. Perhaps just clear 'error' after the code cited above?
As a workaround, I've patched acpi_thermal(4) not to disable
passive cooling when acpi_tz_cpufreq_update() returns ENXIO;
that works for me.
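A rough sketch of what clearing the error could look like, written as
standalone C rather than as a patch against kern_cpu.c. finish_get(),
its parameters and FREQ_UNKNOWN are hypothetical; the only assumption
carried over from the description is that a successful estimate should
not be reported together with a leftover error code.

```c
#include <errno.h>

#define FREQ_UNKNOWN	(-1)

/*
 * Hypothetical tail of a "get" routine. 'probe_error' is whatever the
 * driver loop left behind; '*freq' is the value found so far.
 */
static int
finish_get(int probe_error, int estimate, int *freq)
{
	if (*freq != FREQ_UNKNOWN)
		return (0);	/* a level matched: probe errors are moot */
	if (estimate <= 0)	/* nothing usable at all: report failure */
		return (probe_error ? probe_error : ENXIO);
	*freq = estimate;
	return (0);		/* estimate succeeded: drop the stale error */
}
```

With this shape, a leftover ENXIO only reaches the caller when no
frequency could be determined at all.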
Eugene Grosbein
>Release-Note:
>Audit-Trail:
>Unformatted: