Problem with IPMI KCS driver
Anton Yuzhaninov
citrin at citrin.ru
Fri Sep 28 09:54:45 UTC 2012
On 29.08.2012 16:25, John Baldwin wrote:
> On Wednesday, August 29, 2012 5:36:43 am Anton Yuzhaninov wrote:
>> We use servers with Supermicro X8DTT-H motherboards and have run into the following problem:
>> when watchdogd is running, the server is rebooted by the IPMI watchdog several times per week.
>>
>> After some debugging I found that the IPMI driver sometimes enters an endless
>> loop, so watchdogd has no chance to reset the watchdog timer.
>> In this situation top shows:
>>
>> PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
>> ...
>> 113 root -16 - 0K 16K CPU4 4 17:18 99.17% ipmi0: kcs
>>
>> The endless loop is in the file /sys/dev/ipmi/ipmi_kcs.c, in the function
>> kcs_wait_for_obf():
>>
>>     int status, start = ticks;
>>
>>     status = INB(sc, KCS_CTL_STS);
>>     if (state == 0) {
>>         /* WAIT FOR OBF = 0 */
>>         while (ticks - start < MAX_TIMEOUT && status & KCS_STATUS_OBF) {
>>             DELAY(100);
>>             status = INB(sc, KCS_CTL_STS);
>>         }
>>     } else {
>>         /* WAIT FOR OBF = 1 */
>>         while (ticks - start < MAX_TIMEOUT &&
>>             !(status & KCS_STATUS_OBF)) {
>>             DELAY(100);
>>             status = INB(sc, KCS_CTL_STS);
>>         }
>>     }
>>
>> It seems that this loop is intended to run for no more than MAX_TIMEOUT ticks,
>> but for some reason the timeout does not work and the loop runs until reboot.
>>
>> Questions:
>> 1. Is it correct to check ticks to implement a timeout here?
>> 2. How can this timeout be fixed?
>
> Hmm. Can you try this:
>
> Index: kern/kern_clock.c
> ===================================================================
> --- kern/kern_clock.c (revision 239819)
> +++ kern/kern_clock.c (working copy)
> @@ -382,7 +382,7 @@
> int stathz;
> int profhz;
> int profprocs;
> -int ticks;
> +volatile int ticks;
> int psratio;
>
> static DPCPU_DEFINE(int, pcputicks); /* Per-CPU version of ticks. */
> @@ -469,7 +469,7 @@
> hardclock(int usermode, uintfptr_t pc)
> {
>
> - atomic_add_int((volatile int *)&ticks, 1);
> + atomic_add_int(&ticks, 1);
> hardclock_cpu(usermode);
> tc_ticktock(1);
> cpu_tick_calibration();
> Index: sys/kernel.h
> ===================================================================
> --- sys/kernel.h (revision 239819)
> +++ sys/kernel.h (working copy)
> @@ -63,7 +63,7 @@
> extern int stathz; /* statistics clock's frequency */
> extern int profhz; /* profiling clock's frequency */
> extern int profprocs; /* number of process's profiling */
> -extern int ticks;
> +extern volatile int ticks;
>
> #endif /* _KERNEL */
>
>
With
extern volatile int ticks
the infinite loop occurs less often than before, but it still occurs.
The symptoms are the same:
$ ps -ax -o pid,comm,wchan,state,\%cpu | grep ipmi
113 ipmi0: kcs - RL 100.0
1317 watchdogd ipmire Ds 0.0
DDB trace for pid 113:
Tracing pid 113 tid 100359 td 0xffffff0007913470
cpustop_handler() at cpustop_handler+0x37
ipi_nmi_handler() at ipi_nmi_handler+0x30
trap() at trap+0x345
nmi_calltrap() at nmi_calltrap+0x8
--- trap 0x13, rip = 0xffffffff809c6e64, rsp = 0xffffffff80fd1ec0, rbp = 0xffffff88425d4b30 ---
DELAY() at DELAY+0x64
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6
kcs_read_byte() at kcs_read_byte+0x7d
kcs_loop() at kcs_loop+0x372
fork_exit() at fork_exit+0x135
fork_trampoline() at fork_trampoline+0xe
I can type cont in ddb, wait some time, and enter ddb again: the trace for pid 113 will
be the same.
kcs_wait_for_obf() at kcs_wait_for_obf+0xb6 points to
/usr/src/sys/dev/ipmi/ipmi_kcs.c:94
91 while (ticks - start < MAX_TIMEOUT &&
92 !(status & KCS_STATUS_OBF)) {
93 DELAY(100);
94 status = INB(sc, KCS_CTL_STS);
95 }
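
To illustrate what a timeout that does not depend on ticks could look like, here is a minimal user-space sketch that bounds the loop by the number of polls instead. It is not a tested patch for ipmi_kcs.c: fake_dev, fake_read_status(), fake_delay(), wait_for_obf_bounded() and MAX_POLLS are hypothetical names standing in for the softc, INB(sc, KCS_CTL_STS) and DELAY(), and the 3 second budget is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_OBF    0x01   /* output-buffer-full bit, stands in for KCS_STATUS_OBF */
#define POLL_DELAY_US 100    /* same step as DELAY(100) in the driver loop */
#define MAX_POLLS     30000  /* assumed budget: 30000 polls * 100 us = 3 s */

/* Hypothetical stand-in for the hardware behind INB(sc, KCS_CTL_STS). */
struct fake_dev {
	int reads;        /* number of status reads performed so far */
	int ready_after;  /* OBF is set once reads exceeds this; negative = never */
};

/* Simulated status register read. */
static int
fake_read_status(struct fake_dev *dev)
{
	dev->reads++;
	if (dev->ready_after >= 0 && dev->reads > dev->ready_after)
		return (STATUS_OBF);
	return (0);
}

/* Stand-in for DELAY(); a no-op in this simulation. */
static void
fake_delay(int usec)
{
	(void)usec;
}

/*
 * Wait for OBF = 1, but give up after MAX_POLLS polls.
 * The bound is a local counter, so the loop ends even if no
 * external clock variable ever advances.
 * Returns the last status read; the caller checks the OBF bit.
 */
static int
wait_for_obf_bounded(struct fake_dev *dev)
{
	int status, polls;

	assert(dev != NULL);
	status = fake_read_status(dev);
	for (polls = 0; polls < MAX_POLLS && !(status & STATUS_OBF); polls++) {
		fake_delay(POLL_DELAY_US);
		status = fake_read_status(dev);
	}
	return (status);
}
```

With a simulated device that sets OBF on its sixth status read (ready_after = 5) the function returns after six reads with OBF set. With a device that never sets OBF it gives up after MAX_POLLS + 1 reads and returns a status with OBF clear, instead of spinning until reboot.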
--
Anton Yuzhaninov