Boot failure: panic: No heap setup
Toomas Soome
tsoome at me.com
Fri Mar 30 18:10:51 UTC 2018
> On 30 Mar 2018, at 18:03, Stefan Esser <se at freebsd.org> wrote:
>
> On 29.03.18 at 07:15, Toomas Soome wrote:
>>
>>
>>> On 29 Mar 2018, at 01:06, Stefan Esser <se at freebsd.org> wrote:
>>>
>>> On 28.03.18 at 22:28, Warner Losh wrote:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>>
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>>
>>>>>
>>>>> Seems that setheap has either not been called or has been called with
>>>>> base=0.
>>>>
>>>> Right, which is odd...
>>>>
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>>
>>>>>
>>>>> I had thought, that the zfs boot code has been initialized before the
>>>>> menu is displayed?
>>>>
>>>> Right, all of this should be done looooong before we get to the
>>>> interpreter. Can you break into the loader prompt and try the `heap`
>>>> command, see what that outputs? CC'ing imp@ because he actually knows
>>>> things.
>>>>
>>>> Totally weird. I'd add a printf to the setheap() function to display its args
>>>> and see if you get this panic before/after that printf...
>>>
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>>
>>> I'll build a clean world with the Lua loader and test it in the next
>>> few days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>>
>>> Is it possible, that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>>
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap, at that time.
>>>
>>
>> Indeed. And assuming the message really is from the loader, it means there
>> must be memory corruption. If so, you can check which variables are located
>> close to the heap-related ones… Also, since you have a working menu, it has
>> to be related to the actual loading. Since the loading itself has been
>> working so far, it should be related to the Lua-specific bits that prepare
>> the calls to the load functions.
>
> Ok, some more data points:
>
> 1) A printf in setheap reported plausible values during start-up of zfsboot.
> The menu appeared and wiped away the values so fast that I could not take
> a photo or write them down.
>
If you got the menu and stuff, it means that at that point the heap was all OK. Just after setheap(), bcache_init() is called, and that too will allocate memory.
What you can do is to esc out from the menu to the OK prompt and check the output of the heap and biosmem commands…

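To make the ordering argument concrete, here is a rough userland model of a bump allocator of the kind discussed in this thread. It is only a sketch: the demo_* names, the status codes and the 16-byte alignment are my assumptions and are not taken from stand/libsa/sbrk.c. Where libsa is reported to panic with "No heap setup", the model returns DEMO_NO_HEAP instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ALIGN_MASK 0xfu            /* assumed alignment, not from libsa */

enum demo_status { DEMO_OK, DEMO_NO_HEAP, DEMO_NO_MEM };

static char *demo_heapbase;             /* NULL until demo_setheap() has run */
static size_t demo_maxheap;             /* usable bytes between base and top */
static size_t demo_heapsize;            /* bytes handed out so far */

/* Record the heap bounds; the base is rounded up to the alignment. */
static void
demo_setheap(void *base, void *top)
{
    uintptr_t aligned;

    aligned = ((uintptr_t)base + DEMO_ALIGN_MASK) & ~(uintptr_t)DEMO_ALIGN_MASK;
    demo_heapbase = (char *)aligned;
    demo_maxheap = (size_t)((char *)top - demo_heapbase);
    demo_heapsize = 0;
}

/* Hand out the next incr bytes, zeroed, or report why that is impossible. */
static enum demo_status
demo_sbrk(size_t incr, void **out)
{
    *out = NULL;
    if (demo_heapbase == NULL)
        return DEMO_NO_HEAP;            /* the "No heap setup" condition */
    if (incr > demo_maxheap - demo_heapsize)
        return DEMO_NO_MEM;
    *out = demo_heapbase + demo_heapsize;
    memset(*out, 0, incr);
    demo_heapsize += incr;
    return DEMO_OK;
}
```

In this model, calling demo_sbrk() before demo_setheap() yields DEMO_NO_HEAP, and every call after demo_setheap() either succeeds or runs out of space. So if the menu came up, the base pointer was set at that time, and a later "No heap setup" would mean the pointer got cleared or overwritten afterwards.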
> 2) I have rebuilt world and kernel based on r331763. Booting resulted in the
> same panic as reported before. There was no debug output from the patched
> setheap call before the panic (which indicates that it was not called a
> second time).
>
> 3) In order to get my system to boot, I interrupted loading of zfsloader and
> forced loading of the previous version (from a world build with Forth in
> the loader). Booting succeeded with the latest kernel ...
>
> It looks as if sbrk() was called in zfsloader before setheap() has been used
> to initialize the heap parameters, if lua is enabled instead of Forth. See
> stand/i386/loader/main.c:124 for the location of the setheap call in the
> loader.
This can only happen when something is called before main…
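As an illustration of what "called before main" can look like, here is a small hosted-C sketch using the GCC/Clang constructor attribute. All demo_* names are made up, and I am not claiming that the loader build runs such hooks; it only shows how an allocation attempt can precede the initialization done in main().

```c
#include <stddef.h>

static char demo_arena[64];
static char *demo_base;         /* NULL until demo_init() has run */
static int demo_early_calls;    /* allocation attempts made without a heap */

/* Hypothetical initialization that main() would normally perform first. */
static void
demo_init(void)
{
    demo_base = demo_arena;
}

/* Trivial allocator: fails and counts the attempt when no base is set. */
static void *
demo_alloc(size_t n)
{
    if (demo_base == NULL || n > sizeof(demo_arena)) {
        if (demo_base == NULL)
            demo_early_calls++;
        return NULL;
    }
    return demo_base;
}

/* GCC/Clang extension: this function runs before main() is entered. */
__attribute__((constructor))
static void
demo_early_hook(void)
{
    (void)demo_alloc(32);
}
```

By the time main() starts, demo_early_calls is already 1, because the hook tried to allocate before demo_init() had a chance to run. A counter like this is one cheap way to detect such an early call without needing a working console at that point.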
>
> This is obviously hard to debug, though, since printf cannot be called at that
> point. A pure write(2) should be possible without heap, but since the console
> has not been initialized at the point of the setheap invocation, there is no
> working output device, AFAIK.
>
> I do not see, how any sbrk() call could occur before setheap is called. And
> there does not appear to be any other setheap function (or macro) in the
> tree, that could overload the one defined in stand/libsa/sbrk.c ...
>
> I have no idea how to proceed from here ...
>
> But now I'm sure it is a problem in zfsloader (or loader in general?).
>
> Hmmm: How is the panic message printed by sbrk() without an initialized heap?
> The definition of panic in stand/libsa/panic.c relies on a working printf!
>
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
>
> Regards, STefan
rgds,
toomas