Problem with /boot/loader [A new patch]
John Baldwin
jhb at freebsd.org
Fri Aug 8 16:49:51 UTC 2008
On Thursday 26 June 2008 11:12:33 pm Kevin Oberman wrote:
> > Date: Thu, 26 Jun 2008 23:53:44 +0200
> > From: Volker <volker at vwsoft.com>
> > Sender: owner-freebsd-stable at freebsd.org
> >
> > On 12/23/-58 20:59, Kelly Black wrote:
> > > Hello,
> > >
> > > I have a problem with loader. I recently upgraded from 6_rel to 7_rel.
> > > Now when I install world there is a problem booting.
> > >
> > > Here is what I do:
> > > cd /usr/src
> > > make buildworld
> > > make buildkernel KERNCONF=BLACK
> > > make installkernel KERNCONF=BLACK
> > >
> > > At this point I can reboot and all is good. After boot I install the new
> > > world:
> > >
> > > cd /usr/src
> > > mergemaster -p
> > > reboot into single user mode
> > > cd /usr/src
> > > make installworld
> > > mergemaster
> > >
> > > Now when I reboot there is a problem. I get an error that the system
> > > cannot boot. Part of it looks like this:
> > > Can't work out which disk we are booting from.
> > > Guessed BIOS device 0xffffffff not found by probes, defaulting to disk0:
> > >
> > > If I boot from a live disk and replace /boot/loader with
> > > /boot/loader.old it boots up fine and everything looks good. A new
> > > world and a new kernel. I would be grateful for any help or any
> > > pointers.
> > >
> > > Sincerely,
> > > Kel
> > >
> > > PS I do not do anything special with my loader config files:
> > >
> > > $ cat loader.conf
> > >...
> >
> > Kelly,
> >
> > the /boot/loader.conf file does not come into play at that stage. Early
> > in the loader code, loader needs to figure out, which disk (BIOS device)
> > has been booted from. Until loader knows which device was booted up,
> > it's unable to access any files (even loader.conf) on your boot device.
> >
> > As I've never seen such a problem while upgrading any system, I suspect
> > your problem must be settings specific. Can you show me your kernel
> > config or are you using a plain vanilla GENERIC? Which arch are we
> > talking about?
> >
> > As I'm currently investigating another boot problem (but earlier in the
> > boot chain), I'll check boot logic in the source code and may check for
> > your issue, too, at that time, so it's just one effort. But please stay
> > patient for some days, as I'm currently too busy.
>
> We just got hit by this. The loader never loads and nothing boots. But a
> system admin discovered that the problem disappeared if the /boot.conf
> file was deleted. It just contained '-P'.
>
> Once this file was removed, the system just booted up as expected. When
> he changed it to -D or -h, the boot still locked up.

So I had a little epiphany in the shower this morning and have a possible fix.
I've suspected from the start that the hangs had to do with interrupts being
disabled/enabled at the wrong time. However, I had always been assuming that
the problem was interrupts being disabled when they should have been enabled.
Now I think it's actually the reverse. :) Some background:

There are three sorts of requests that BTX can handle that require dropping to
real mode (previously this was done with virtual 8086 mode): 1) hardware
interrupt, 2) user request (boot2/loader) to simulate a software interrupt
(e.g. int 0x15 BIOS calls), 3) user request to perform a far call to a
specified cs:ip in real mode.

For all 3 of these requests, we do preserve the %eflags register at the time
of the interrupt/user request and make it visible as-is to the real mode code
with some possible modifications. Previously the only modification I made
was to disable interrupts (PSL_I) in case 1). When looking at this earlier,
I noticed that none of the BTX clients (boot2/gptboot/loader) had ever
explicitly initialized the eflags value that gets passed to BTX during vm86
requests, so the initial flags (including PSL_I) were garbage, and as a
result, it was sort of random as to whether or not the real mode code for
cases 2) and 3) was run with interrupts enabled or disabled.
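
To illustrate the uninitialized-eflags issue described above, here is a
minimal sketch in C. It is not the code from the patch (BTX itself is
assembly), and struct vm86_regs, vm86_regs_init() and interrupts_enabled()
are hypothetical names made up for this example. PSL_I is the x86 interrupt
enable flag (bit 9 of eflags); PSL_RESERVED stands for bit 1, which always
reads as 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PSL_I        0x00000200u /* interrupt enable flag (bit 9 of eflags) */
#define PSL_RESERVED 0x00000002u /* bit 1 of eflags, always reads as 1 */

/* Hypothetical stand-in for the register block a client hands to BTX. */
struct vm86_regs {
    uint32_t eax, ebx, ecx, edx;
    uint32_t eflags;
};

/*
 * Put the register block into a known state before the first request,
 * instead of passing along whatever happened to be in memory.
 * Interrupts start out enabled.
 */
static void vm86_regs_init(struct vm86_regs *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof(*r));
    r->eflags = PSL_RESERVED | PSL_I;
}

/* Report whether the real mode code would see interrupts enabled. */
static int interrupts_enabled(const struct vm86_regs *r)
{
    return (r->eflags & PSL_I) != 0;
}
```

Without a call like vm86_regs_init(), the result of interrupts_enabled()
would depend on leftover memory contents, which is the "sort of random"
behavior described above.
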
My realization this morning is that software interrupts ('int X') in real mode
disable interrupts just like hardware interrupts do. Thus, my patch changes
BTX to disable interrupts for both cases 1) and 2) now. I think this will
fix the hangs. I'm still including the code to explicitly initialize the
eflags for user requests to a known-good value. It still has interrupts
enabled which means that case 3) should now always run with interrupts
enabled (which is the desired state), but the client can disable interrupts
in the eflags in the vm86 structure if desired.
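
The resulting policy can be modeled in a few lines of C. Again, this is
only a sketch of the rule as described in this mail, not the BTX assembly
from the patch; enum btx_request and real_mode_eflags() are hypothetical
names used for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PSL_I 0x00000200u /* interrupt enable flag (bit 9 of eflags) */

/* The three kinds of requests that require dropping to real mode. */
enum btx_request {
    REQ_HW_INTERRUPT   = 1, /* case 1: hardware interrupt */
    REQ_SOFT_INTERRUPT = 2, /* case 2: simulated 'int X' */
    REQ_FAR_CALL       = 3  /* case 3: far call to cs:ip */
};

/*
 * Compute the eflags value the real mode code sees, given the eflags
 * saved at the time of the interrupt or user request.
 */
static uint32_t real_mode_eflags(enum btx_request req, uint32_t saved_eflags)
{
    switch (req) {
    case REQ_HW_INTERRUPT:
    case REQ_SOFT_INTERRUPT:
        /* Both kinds of interrupt enter the handler with IF clear. */
        return saved_eflags & ~PSL_I;
    case REQ_FAR_CALL:
        /* Far calls run with whatever the client put in eflags. */
        return saved_eflags;
    }
    assert(!"unknown request type");
    return saved_eflags;
}
```

With a client-side default of 0x202 (interrupts enabled), cases 1) and 2)
yield 0x002 and case 3) keeps 0x202. A client that clears PSL_I itself
before a far call gets interrupts disabled there too.
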
The updated patch (same URL, new patch) is at
http://www.FreeBSD.org/~jhb/patches/btx_hang.patch
--
John Baldwin