reproducible panic in netisr
Robert Watson
rwatson at FreeBSD.org
Wed Aug 5 23:17:12 UTC 2009
On Tue, 4 Aug 2009, Navdeep Parhar wrote:
>>> This occurs on today's HEAD + some unrelated patches. That makes it
>>> 8.0BETA2+ code. I haven't tried older builds.
>>
>> We have finally been able to reproduce this ourselves yesterday and
>
> Well, it happens every single time on all of my amd64 machines. After I'd
> already sent my email I noticed that the netisr mutex has an odd address
> (pun intended :-))
>
> m=0xffffffff8144d867
Heh, indeed. We just spotted the same thing here. Because the mutex is
misaligned, mtx_lock spans a cache line boundary, so the read of it is not
atomic. The result is a fractional pointer, which is not a valid thread
pointer, and we panic shortly afterwards when it is dereferenced.
> It's a bit unusual for the mutex struct to start at a completely unaligned
> address. I hope things are better on sparc64 etc., not everyone is as
> forgiving as amd64.
amd64 isn't as forgiving either, it turns out. :-)
> The mutex led me to some DPCPU stuff that I didn't quite get.
>
> (kgdb) p/x dpcpu_off
> $2 = {0x8407d7, 0xffffff807f4037d7, 0x0 <repeats 30 times>}
> (kgdb) p dpcpu
> $3 = (void *) 0xffffff8000010000
> (kgdb) p &__start_set_pcpu
> $4 = (uintptr_t **) 0xffffffff80c0c829
> (kgdb) p/x 0xffffff8000010000 - 0xffffffff80c0c829
> $5 = 0xffffff807f4037d7
>
> It's not clear why we prefer to store offsets from DPCPU_START, instead of
> the base address of the dpcpu area directly. On amd64, the dpcpu area for
> cpu 0 is above kernbase (immediately after kernbase + thread0's stack).
> For the other CPUs it's below kernbase. This makes the pointer arithmetic
> that calculates offsets more "interesting."
>
> Why have a dpcpu_off[] instead of a dpcpu_base[]?
Each field in DPCPU is named with respect to the start of a "master" dpcpu
copy, which holds the static initialization. This makes the per-CPU address
of a variable:
  (&master_name_for_variable - DPCPU_START) + per-cpu-base
What Jeff has done is factor out the DPCPU_START subtraction, since it is the
same constant for every DPCPU access, and do it once when calculating
dpcpu_off. This should all be fine; the question is why we're losing the
alignment during linking of the kernel. netisr is linked into the base
kernel, so I guess it's some problem with the way the linker set is being laid
out at compile-time. I expect we may have a similar issue with the run-time
allocation of DPCPU space as well.
Robert N M Watson
Computer Laboratory
University of Cambridge