double fault on 10.3-Stable i386 during installworld

Andreas Longwitz longwitz at
Sun Nov 5 16:24:26 UTC 2017

Thanks for the answer. I am now sure the reason for the double fault is
not a FreeBSD problem but a CPU problem.

>> On the stack we have
>> 0xe437faa0:    0x00000000  R7:0xc0bc051c     0x00000020     0x00010007
>> so there is an exception on the instruction "movl  PCB_CR3(%edx),%eax"
>> in function cpu_switch(). The next stack entries indicates a lot of page
>> faults, but the "double fault" happens not until the page boundary at
>> 0xe437f000 is reached. I do not really understand this, but it seems to
>> me that the thread
> Can you try to recover the %ecx, %edx values for the faulted frame ?
> Note that %ecx is loaded from the on-stack argument.

From the source swtch.s:

        /* Save is done.  Now fire up new thread. Leave old vmspace. */
        movl    4(%esp),%edi
        movl    8(%esp),%ecx                    /* New thread */
        movl    12(%esp),%esi                   /* New lock */
        testl   %ecx,%ecx                       /* no thread? */
        jz      badsw3                          /* no, panic */
        movl    TD_PCB(%ecx),%edx

        /* switch address space */
        movl    PCB_CR3(%edx),%eax

Inspecting the stack shows that %ecx is loaded with the address of newtd
(0xc8029a20) and %edx is loaded with the address of newpcb (0xf0a3ad40).
So we see an exception during the execution of a correct machine
instruction. At the moment of the double fault I see the same values in
the saved TSS:

(kgdb) p/x __pcpu[2]->pc_common_tss
$16 = {tss_link = 0x0, tss_esp0 = 0xe437fd30, tss_ss0 = 0x28,
tss_esp1 = 0x0, tss_ss1 = 0x0, tss_esp2 = 0x0, tss_ss2 = 0x0,
tss_cr3 = 0x0, tss_eip = 0xc0bacac8, tss_eflags = 0x10007,
tss_eax = 0xc08f492f, tss_ecx = 0xc8029a20, tss_edx = 0xf0a3ad40,
tss_ebx = 0xd3cf, tss_esp = 0xe437f000, tss_ebp = 0xe437fafc,
tss_esi = 0xc0e43400, tss_edi = 0xc7ebd000, tss_es = 0x28,
tss_cs = 0x20, tss_ss = 0x28, tss_ds = 0x28, tss_fs = 0x8,
tss_gs = 0x3b, tss_ldt = 0x0, tss_ioopt =

Also, tss_eax = 0xc08f492f is the return address, so the movl for
"switch address space", which would have overwritten %eax, was not
executed.
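
As a cross-check of the TSS values above, the following Python sketch
decodes tss_eflags and tests whether tss_esp sits exactly on a page
boundary. The helper names are made up for illustration, the bit
positions are the architectural i386 EFLAGS bits, and a 4 KiB page size
is assumed.

```python
# Illustrative helpers (hypothetical names, not part of kgdb or FreeBSD).

# Architectural i386 EFLAGS bit positions (bit 1 is reserved and always set).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
    10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
}

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def decode_eflags(value):
    """Return the names of the EFLAGS bits that are set in value."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if value & (1 << bit)]


def page_offset(addr):
    """Return the offset of addr within its page."""
    return addr & (PAGE_SIZE - 1)


# Values copied from the saved TSS shown above.
tss_eflags = 0x10007
tss_esp = 0xe437f000

print(decode_eflags(tss_eflags))
print(hex(page_offset(tss_esp)))
```

For the values above this reports CF, PF and RF as set (IF is clear) and
a page offset of 0 for tss_esp, which matches the page boundary at
0xe437f000 mentioned earlier.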

> Do you have latest CPU microcode loaded ?  Your machine is very old,
> I believe this is P4 class processor, am I right ?

I have to correct one detail: The output

(kgdb) p/x cpu_id
$4 = 0xf29

for the CPUID was correct, but the corresponding output from dmesg was
not from the crashing server, so here is the correct one:

CPU: Intel(R) Xeon(TM) CPU 2.80GHz (2791.05-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0xf29  Family=0xf  Model=0x2  Stepping=9

kenv gives:
smbios.bios.vendor="Intel Corporation"
smbios.chassis.maker="Intel Corporation"
smbios.planar.maker="Intel     "
smbios.system.product="PLATINUM 2210R" (OEM, Intel SR2300)
smbios.system.serial="               "

From the manual "Intel Xeon Processor (Document Number 249679-056)" I
found that my CPU is a Xeon 2.8B "Prestonia" (CPUID 0F29H, Core Stepping
D1) released 8.11.2002. I have the last microcode revision m02f292d, but
my BIOS version P39 was not the latest. In the meantime I have upgraded
to BIOS version P43.

> Sure if pcb access faults, the system is in very broken state and
> since an attempt to handle the fault causes a new fault for pcb access,
> it recurses and dies due to the stack overflow.


Andreas Longwitz
