32-bit powerpc FreeBSD head -r317820 panic example: Example showing register r10 having been trashed during set_affinity's time frame
Mark Millard
markmi at dsl-only.net
Wed Jun 7 04:49:45 UTC 2017
Summary of the analysis below: Something
trashed the r10 value during set_affinity,
effectively replacing a struct td_sched *
value with what happened to be a struct
thread * value.
That in turn lead to what should have been
a cpuid (for one of 4 CPUs) to instead be
a struct td_sched * value.
And that in turn lead to a data storage
interrupt from the indexing into
cpuid_to_pcpu being way out of bounds.
Context: 32-bit powerpc head -r317820 used on
a old PowerMac G5 so-called "Quad Core".
The analysis from a panic's dump and from
pictures of ddb output and using the kernel
debug file. . .
I got a panic that it looked like I could back calculate
a ways to see what was odd somewhat earlier. This is
that back calculation sequence.
This will start in ipi_cpu where the panic occurred
and progress backwards into sched_affinity.
void
ipi_cpu(int cpu, u_int ipi)
{
ipi_send(cpuid_to_pcpu[cpu], ipi);
}
was:
(kgdb) x/90i ipi_cpu
0x8eb0dc <ipi_cpu>: stwu r1,-32(r1)
0x8eb0e0 <ipi_cpu+4>: mflr r0
0x8eb0e4 <ipi_cpu+8>: stw r30,24(r1)
0x8eb0e8 <ipi_cpu+12>: stw r31,28(r1)
0x8eb0ec <ipi_cpu+16>: stw r0,36(r1)
0x8eb0f0 <ipi_cpu+20>: mr r31,r1
0x8eb0f4 <ipi_cpu+24>: bcl- 20,4*cr7+so,0x8eb0f8 <ipi_cpu+28>
0x8eb0f8 <ipi_cpu+28>: mflr r30
0x8eb0fc <ipi_cpu+32>: lwz r0,-32(r30)
0x8eb100 <ipi_cpu+36>: add r30,r0,r30
0x8eb104 <ipi_cpu+40>: rlwinm r3,r3,2,0,29
0x8eb108 <ipi_cpu+44>: lwz r9,-32756(r30)
0x8eb10c <ipi_cpu+48>: lwzx r3,r3,r9
0x8eb110 <ipi_cpu+52>: bl 0x8eb018 <ipi_send>
0x8eb114 <ipi_cpu+56>: lwz r11,0(r1)
0x8eb118 <ipi_cpu+60>: lwz r0,4(r11)
0x8eb11c <ipi_cpu+64>: mtlr r0
0x8eb120 <ipi_cpu+68>: lwz r30,-8(r11)
0x8eb124 <ipi_cpu+72>: lwz r31,-4(r11)
0x8eb128 <ipi_cpu+76>: mr r1,r11
0x8eb12c <ipi_cpu+80>: blr
0x8eb130 <ipi_cpu+84>: .long 0x43ce40
The point of failure (srr0 value) is:
0x8eb10c <ipi_cpu+48>: lwzx r3,r3,r9
(Note: This is a form of r3:=MEM(r3+r9,4).
Luckily r3 was not updated because of the
bad r3+r9 value.)
Note that the only prior instruction modifying
r3 in ipi_cpu is:
0x8eb104 <ipi_cpu+40>: rlwinm r3,r3,2,0,29
The "show reg r3" result was: 0x51f2880. So
prior to the shift by 2 and masking off of
the least significant 2 bits its value was:
r3=0x147CA20
I will note that the 0x51f2880 + the
"show reg r9" value 0xf2c9fc (i.e.,
&cpuid_to_pcpu) produces: 0x611f27c
-- and the exception was reported as
being for virtual address 0x611f27c .
By the source code structure r3 was
supposed to be one of the values: 0x0,
0x1, 0x2 or 0x3 instead. (The PowerMac G5
"Quad Core" has 4 cpus.)
With the "show reg r4" result being 0x1
it means that the call to ipi_cpu was
apparently (by values passed):
ipi_cpu(0x147CA20, 0x1)
"Show allpcpu" reported that cpuid=3 had:
curthread = 0x147ca20: pid 11 tid 100006 "idle: cpu3"
idlethread = 0x147ca20: did 100006 "idle: cpu3"
where 0x147d000 (for cpuid 2) has 0x147d008
holding a 0x147ca20 value.
So somehow a thread address for cpuid 3 was
given to ipi_cpu instead of 0x3 .
The backtrace shows that the caller of
ipi_cpu was sched_affinity:
void
sched_affinity(struct thread *td)
{
#ifdef SMP
struct td_sched *ts;
THREAD_LOCK_ASSERT(td, MA_OWNED);
ts = td_get_sched(td);
if (THREAD_CAN_SCHED(td, ts->ts_cpu))
return;
if (TD_ON_RUNQ(td)) {
sched_rem(td);
sched_add(td, SRQ_BORING);
return;
}
if (!TD_IS_RUNNING(td))
return;
/*
* Force a switch before returning to userspace. If the
* target thread is not running locally send an ipi to force
* the issue.
*/
td->td_flags |= TDF_NEEDRESCHED;
if (td != curthread)
ipi_cpu(ts->ts_cpu, IPI_PREEMPT);
#endif
}
So the implication is that ts->ts_cpu had
the value 0x147ca20 instead of the value 3
at the time of the call sequence.
sched_affinity was:
0x535ab8 <sched_affinity>: stwu r1,-32(r1)
0x535abc <sched_affinity+4>: mflr r0
0x535ac0 <sched_affinity+8>: stw r29,20(r1)
0x535ac4 <sched_affinity+12>: stw r30,24(r1)
0x535ac8 <sched_affinity+16>: stw r31,28(r1)
0x535acc <sched_affinity+20>: stw r0,36(r1)
0x535ad0 <sched_affinity+24>: mr r31,r1
0x535ad4 <sched_affinity+28>: mr r29,r3
0x535ad8 <sched_affinity+32>: lwz r0,0(r3)
0x535adc <sched_affinity+36>: addi r10,r3,808
0x535ae0 <sched_affinity+40>: lwz r9,8(r10)
0x535ae4 <sched_affinity+44>: lwz r11,48(r3)
0x535ae8 <sched_affinity+48>: rlwinm r0,r9,29,3,29
0x535aec <sched_affinity+52>: clrlwi r9,r9,27
0x535af0 <sched_affinity+56>: lwzx r0,r11,r0
0x535af4 <sched_affinity+60>: sraw r0,r0,r9
0x535af8 <sched_affinity+64>: andi. r9,r0,1
0x535afc <sched_affinity+68>: bne- 0x535b4c <sched_affinity+148>
0x535b00 <sched_affinity+72>: lwz r0,652(r3)
0x535b04 <sched_affinity+76>: cmpwi cr7,r0,3
0x535b08 <sched_affinity+80>: bne+ cr7,0x535b20 <sched_affinity+104>
0x535b0c <sched_affinity+84>: bl 0x5341c0 <sched_rem>
0x535b10 <sched_affinity+88>: mr r3,r29
0x535b14 <sched_affinity+92>: li r4,0
0x535b18 <sched_affinity+96>: bl 0x53583c <sched_add>
0x535b1c <sched_affinity+100>: b 0x535b4c <sched_affinity+148>
0x535b20 <sched_affinity+104>: cmpwi cr7,r0,4
0x535b24 <sched_affinity+108>: bne- cr7,0x535b4c <sched_affinity+148>
0x535b28 <sched_affinity+112>: lwz r0,156(r3)
0x535b2c <sched_affinity+116>: oris r0,r0,1
0x535b30 <sched_affinity+120>: stw r0,156(r3)
0x535b34 <sched_affinity+124>: mr r0,r2
0x535b38 <sched_affinity+128>: cmpw cr7,r3,r0
0x535b3c <sched_affinity+132>: beq- cr7,0x535b4c <sched_affinity+148>
0x535b40 <sched_affinity+136>: lwz r3,8(r10)
0x535b44 <sched_affinity+140>: li r4,1
0x535b48 <sched_affinity+144>: bl 0x8eb0dc <ipi_cpu>
0x535b4c <sched_affinity+148>: lwz r11,0(r1)
0x535b50 <sched_affinity+152>: lwz r0,4(r11)
0x535b54 <sched_affinity+156>: mtlr r0
0x535b58 <sched_affinity+160>: lwz r29,-12(r11)
0x535b5c <sched_affinity+164>: lwz r30,-8(r11)
0x535b60 <sched_affinity+168>: lwz r31,-4(r11)
0x535b64 <sched_affinity+172>: mr r1,r11
0x535b68 <sched_affinity+176>: blr
0x535b6c <sched_affinity+180>: .long 0x7e7f18
The only call executed by sched_affinity was to
ipi_cpu given what is inlined and the fact that
it made it to do the ipi_cpu bl instruction.
Since ipi_cpu leaves various registers alone
some register values from sched_affinity were
preserved for "show reg" use. r10 is an example
in:
0x535ad4 <sched_affinity+28>: mr r29,r3
. . .
0x535adc <sched_affinity+36>: addi r10,r3,808
. . .
0x535b40 <sched_affinity+136>: lwz r3,8(r10)
0x535b44 <sched_affinity+140>: li r4,1
0x535b48 <sched_affinity+144>: bl 0x8eb0dc <ipi_cpu>
(r29 is another register value that is preserved
in what sched_affinity called. r29 reports what
r3 was when sched_affinity was called: 0x147d000,
which makes sense. Also 808 is in decimal notation.)
That sched_affinity code corresponds to (r10 for
ts below):
sched_affinity(struct thread *td)
. . .
struct td_sched *ts;
. . .
ts = td_get_sched(td);
. . .
ipi_cpu(ts->ts_cpu, IPI_PREEMPT);
where "show reg r10" reported: 0x147d000, which
matched the "show allpcpu" for cpuid=2:
curthread = 0x147d000: pid11 tid 100005 "idle: cpu2"
But it should not match that. . .
r10 should have been 0x147d328 (0x147d00+0x328)
but is reported as 0x147d000 and the behavior
reported for ipi_cpu agrees with 0x147d000:
8(r10) then extracts a 0x0147ca20 value from
0x147d008 instead of a 0x0 from 0x147d328 .
So something trashed r10 before:
0x535b40 <sched_affinity+136>: lwz r3,8(r10)
and so effectively replaced ts with the td value.
Trashing r3 so that r3+808==0x147d000 when
0x535adc executes seems unlikely but is a
technical possibility for how r10 ends up
trashed.
Other notes:
Some of the details not shown above are:
static __inline __pure2 struct td_sched *
td_get_sched(struct thread *td)
{
return ((struct td_sched *)&td[1]);
}
and:
/*
* Thread scheduler specific section. All fields are protected
* by the thread lock.
*/
struct td_sched {
struct runq *ts_runq; /* Run-queue we're queued on. */
short ts_flags; /* TSF_* flags. */
int ts_cpu; /* CPU that we have affinity for. */
int ts_rltick; /* Real last tick, for affinity. */
int ts_slice; /* Ticks of slice remaining. */
u_int ts_slptime; /* Number of ticks we vol. slept */
u_int ts_runtime; /* Number of ticks we were running */
int ts_ltick; /* Last tick that we were running on */
int ts_ftick; /* First tick that we were running on */
int ts_ticks; /* Tick count */
#ifdef KTR
char ts_name[TS_NAME_LEN];
#endif
};
where td_sched is a suffix after a "struct thread",
such as the space reserved in:
struct thread0_storage {
struct thread t0st_thread;
uint64_t t0st_sched[10];
};
and checked by:
_Static_assert(sizeof(struct thread) + sizeof(struct td_sched) <=
sizeof(struct thread0_storage),
"increase struct thread0_storage.t0st_sched size");
===
Mark Millard
markmi at dsl-only.net
More information about the freebsd-ppc
mailing list