[Bug 230857] loading carp module panic i386 kernel (VIMAGE related)
bugzilla-noreply at freebsd.org
bugzilla-noreply at freebsd.org
Wed Oct 10 15:11:59 UTC 2018
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230857
Bjoern A. Zeeb <bz at FreeBSD.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |dim at FreeBSD.org
Keywords| |toolchain
--- Comment #5 from Bjoern A. Zeeb <bz at FreeBSD.org> ---
I'll start describing the problem from a reduced piece of code, which is not as
big as carp, replicating carpstats, assuming VIMAGE is on:
--------
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/systm.h>
#include <sys/types.h>
#include <sys/mbuf.h>
#include <sys/counter.h>
#include <net/vnet.h>
struct xstats {
uint64_t foo1;
uint64_t bar1;
uint64_t baz1;
uint64_t mad1;
uint64_t foo2;
uint64_t bar2;
uint64_t baz2;
uint64_t mad2;
uint64_t foo3;
uint64_t bar3;
uint64_t baz3;
uint64_t mad3;
uint64_t foo4;
uint64_t bar4;
uint64_t baz4;
uint64_t mad4;
};
VNET_PCPUSTAT_DEFINE(struct xstats, xstats);
VNET_PCPUSTAT_SYSINIT(xstats);
--------
This unrolls into:
1 struct _hack;
2 counter_u64_t vnet_entry_xstats[sizeof(struct xstats) / sizeof(uint64_t)]
__attribute__((__section__("set_vnet"))) __attribute__((__used__));
3
4
5 static void
6 vnet_xstats_init(const void *unused)
7 {
8 do {
9 for (int i = 0; i < (sizeof((*(__typeof(vnet_entry_xstats)
*) (((((__curthread())->td_vnet))->vnet_data_base) + (uintptr_t) &
vnet_entry_xstats))) / sizeof(counter_u64_t)); i++)
10 ((*(__typeof(vnet_entry_xstats) *)
(((((__curthread())->td_vnet))->vnet_data_base) + (uintptr_t) &
vnet_entry_xstats)))[i] = counter_u64_alloc((0x0002));
11 } while (0);
12 }
In essance what looks so complicated is (on a per vnet base):
void *array[16];
for (i=0 ; i<16; i++)
array[i] = alloc();
The above (with the vnet bits), on i386, is translated into:
00000340 <vnet_xstats_init>:
340: 55 push %ebp
341: 89 e5 mov %esp,%ebp
343: 56 push %esi
344: 50 push %eax
345: be c0 ff ff ff mov $0xffffffc0,%esi
34a: 90 nop
34b: 90 nop
34c: 90 nop
34d: 90 nop
34e: 90 nop
34f: 90 nop
350: c7 04 24 02 00 00 00 movl $0x2,(%esp)
357: e8 fc ff ff ff call 358 <vnet_xstats_init+0x18>
358: R_386_PC32 counter_u64_alloc
35c: 64 8b 0d 00 00 00 00 mov %fs:0x0,%ecx
363: 8b 89 1c 03 00 00 mov 0x31c(%ecx),%ecx
369: 8b 49 1c mov 0x1c(%ecx),%ecx
36c: 89 84 31 88 14 00 00 mov %eax,0x1488(%ecx,%esi,1)
36f: R_386_RELATIVE *ABS*
373: 83 c6 04 add $0x4,%esi
376: 75 d8 jne 350 <vnet_xstats_init+0x10>
378: 83 c4 04 add $0x4,%esp
37b: 5e pop %esi
37c: 5d pop %ebp
37d: c3 ret
Now here's the problem:
__start_set_vnet is 0x1448
__stop_set_vnet is 0x1488
The problem is that the code generated goes like this:
%esi = -64
repeat:
%eax = alloc()
We do all the curthread->td_vnet->vnet_data_base in %ecx and then do
mov %eax,0x1488(%ecx,%esi,1)
Which is:
move the alloc() result into curthread->td_vnet->vnet_data_base + 0x1488 +
(1 * -64)
Now 0x1488 - 64 gets us to the beginning of the array[] or array[0].
%esi += 4
So next iteration it'll be 0x1488 - 60 or array[1] ... and so on.
while %esi != 0 goto repeat;
It's an easy way to to the for(i=0; i<16; i++) loop.
The problem is that 0x1488 was not relocated.
When we are going over the relocations and calling into elf_relocaddr() the
check for VIMAGE is:
if (x >= ef->vnet_start && x < ef->vnet_stop) {
In our case we have an *ABS* R_386_RELATIVE of 0x1488, which is ==
ef->vnet_stop but not < ef->vnet_stop.
The real problem is that with non-simple-types the code generated with an
absolute relocation might be just outside the range. We cannot adjust the
check as there might be a simple-type following in the next section which would
then be relocated.
For CARP this showed up because the VNET_PCPUSTAT_DEFINE() went into the VNET
section the last and hence the problem showed up. If there was any other, say
int V_Foo after it, we'd never have noticed. We cannot fully control the
order in which symbols go into the section, or at least not to the extend we'd
like to, so we cannot make sure there's always a char at the end.
The only solution arichardson and I agreed to would be to add 1 byte of padding
to the end of the section.
Using BYTE(1) in a linker script however would always create a set_vnet section
in all kernel modules even if they do not have any virtualized objects.
Using a
https://sourceware.org/binutils/docs-2.31/ld/Output-Section-Discarding.html#Output-Section-Discarding
. = . + <0|sym>; should not create the section.
This leads to a multi-stage solution:
(1) add a symbol based on SIZEOF(set_vnet) which is either 0 or 1. SIZEOF()
does not create a section if it does not exist.
(2) pass the *(set_vnet) through and then increment "." by the amount of the
symbol from (1); which if there is a section it will increase the section by 1
byte and any non-simple-type objects resulting in absolute relocations on the
boundary should also be relocated. If there is no section . = . + 0; will not
create the empty section.
The problem is that the symbol from (1) relies on the section size to be known
which we don't seem to in the first pass.
The second problem is that the symbol from (1) is not immediately visible
(despite it should me; XXX sourceware reference for that)
So we really have to do multiple linker invocations with different linker
scripts.
With ld.bfd an empty set_vnet section will not show up, only the symbol from
(1) will be left behind and that's fine.
With ld.lld 6 it will also leave an empty set_vnet section behind; emaste
points me to https://reviews.llvm.org/rL325887 where this had been under
discussion already with dim, him, and lld developers. It's not beautiful, but
should not harm either (XXX to be tested).
We need to exclude "firmware" from the dance and for now make it i386 specific.
The entire problem described above for VNET also applies to DPCPU (set_pcpu).
I'll attached a preliminary patch which seems to hack this together and once I
added a bit more description etc to the files, I'll put it into Phab as well
(if you have comments earlier let me know).
I hope this all kind-of makes any sense ;-)
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the freebsd-toolchain
mailing list