8.0RC2 amd64 - kernel panic running make buildworld
Kai Gallasch
gallasch at free.de
Sat Nov 14 01:21:25 UTC 2009
On Fri, 13 Nov 2009 15:55:42 +0200
Andriy Gapon <avg at icyb.net.ua> wrote:
> on 13/11/2009 15:48 Kai Gallasch said the following:
> > On Fri, 13 Nov 2009 10:08:45 +0200
> > Andriy Gapon <avg at icyb.net.ua> wrote:
> >> Kai,
> >> I have a hunch, could you please try the following _sledgehammer_
> >> patch (only kernel build/install is needed):
> >> diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
> >> index 44b71f3..a456609 100644
> >> --- a/sys/amd64/amd64/pmap.c
> >> +++ b/sys/amd64/amd64/pmap.c
> >> @@ -2981,6 +2981,7 @@ setpte:
> >> * Map the superpage.
> >> */
> >> pde_store(pde, PG_PS | newpde);
> >> + pmap_invalidate_all(pmap);
> >>
> >> pmap_pde_promotions++;
> >> CTR2(KTR_PMAP, "pmap_promote_pde: success for va %#lx"
> >>
> >> This will slow down an act of promotion to a superpage, but should
> >> not have any visible impact on overall performance.
> >
> > Andriy,
> >
> > I tried the patch with
> > hw.mca.enabled="1", rebuilt the kernel (although normally I never
> > build kernels on Friday the 13th :-) and ran buildworld -j8 five
> > times in a row. No sign of a machine check exception, no reboot.
>
> I think that this is good news.
> This is not a fix, but the fact that it helps should help us find a
> proper solution.
Hi. The patch did help the machine survive a buildworld.
But now I have another machine check exception on this server. It
happened with your patch active and vm.pmap.pg_ps_enabled="1". I was
copying data from a remote server over an NFS mount to the unstable
server. The destination was a local ZFS filesystem.
----------------
sonnenkraft:~ # MCA: CPU 7 UNCOR PCC OVER DTLB L1 error
MCA: Address 0xff800d860000
Fatal trap 28: machine check trap while in kernel mode
cpuid = 7; apic id = 07
instruction pointer = 0x20:0xffffffff80e5f0b2
stack pointer = 0x28:0xffffff8241f8d7d0
frame pointer = 0x28:0xffffff8241f8da40
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, IOPL = 0
current process = 0 (spa_zio_1)
[thread pid 0 tid 100193 ]
Stopped at lzjb_compress+0x162: leal 0x1(%rdx),%edi
db> bt
Tracing pid 0 tid 100193 td 0xffffff000732aab0
lzjb_compress() at lzjb_compress+0x162
zio_compress_data() at zio_compress_data+0xbe
zio_write_bp_init() at zio_write_bp_init+0xc2
zio_execute() at zio_execute+0x77
zio_ready() at zio_ready+0x124
zio_execute() at zio_execute+0x77
taskq_run() at taskq_run+0x13
taskqueue_run() at taskqueue_run+0x91
taskqueue_thread_loop() at taskqueue_thread_loop+0x3f
fork_exit() at fork_exit+0x12a
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8241f8dd30, rbp = 0 ---
----------------
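As an aside on reading the "MCA:" line above, here is a minimal C sketch
of how such a summary can be derived from a machine check status word.
The flag bit positions and the TLB error code layout (0000 0000 0001 TTLL)
follow the architectural IA32_MCi_STATUS format. The helper names and the
sample status value in the usage note are hypothetical, since the raw
status word was not part of the console output.

```c
#include <stdint.h>
#include <stdio.h>

/* Architectural flag bits of an IA32_MCi_STATUS value. */
#define MC_STATUS_OVER (1ULL << 62)   /* an earlier error was overwritten */
#define MC_STATUS_UC   (1ULL << 61)   /* error was not corrected */
#define MC_STATUS_PCC  (1ULL << 57)   /* processor context corrupt */

/* Return 1 if the low 16 bits encode a TLB error (0000 0000 0001 TTLL). */
static int
mca_is_tlb_error(uint64_t status)
{
	return ((status & 0xfff0) == 0x0010);
}

/*
 * Hypothetical helper: write a summary such as
 * "UNCOR PCC OVER DTLB L1 error" into buf.
 */
static void
mca_describe(uint64_t status, char *buf, size_t len)
{
	/* TT field: instruction, data or generic TLB. */
	static const char *tt[] = { "I", "D", "G", "?" };
	/* LL field: cache/TLB level, LG meaning generic. */
	static const char *ll[] = { "L0", "L1", "L2", "LG" };
	int n;

	n = snprintf(buf, len, "%s%s%s",
	    (status & MC_STATUS_UC) ? "UNCOR " : "COR ",
	    (status & MC_STATUS_PCC) ? "PCC " : "",
	    (status & MC_STATUS_OVER) ? "OVER " : "");
	if (n < 0 || (size_t)n >= len)
		return;
	if (mca_is_tlb_error(status))
		snprintf(buf + n, len - n, "%sTLB %s error",
		    tt[(status >> 2) & 3], ll[status & 3]);
	else
		snprintf(buf + n, len - n, "other error");
}
```

Called with the made-up value MC_STATUS_OVER | MC_STATUS_UC |
MC_STATUS_PCC | 0x0015, mca_describe() produces the same wording as the
console line above. In other words: an uncorrected error in the level 1
data TLB, with the processor context flagged as corrupt and at least one
earlier error overwritten.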
After this crash I again tried copying to the local ZFS filesystem over
NFS, and again got an exception.
With vm.pmap.pg_ps_enabled="0" set in loader.conf and after a reboot, the
server survives the NFS copy and stays stable.
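For reference, a sketch of how the two tunables mentioned in this thread
would look in loader.conf. The path /boot/loader.conf is the usual
location and is assumed here; the comments are only annotations.

```
# /boot/loader.conf (assumed location)
# Machine check reporting, as enabled for the tests above.
hw.mca.enabled="1"
# Superpage promotion disabled; with this setting the NFS-to-ZFS copy
# completed without a machine check exception.
vm.pmap.pg_ps_enabled="0"
```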
--Kai.