Re: fsck segfaults on rpi3 running 13-stable

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 12 Feb 2023 05:21:29 UTC

On Feb 11, 2023, at 20:35, bob prohaska <fbsd@www.zefox.net> wrote:

> On Sat, Feb 11, 2023 at 06:57:41PM -0800, Mark Millard wrote:
>> On Feb 11, 2023, at 14:40, bob prohaska <fbsd@www.zefox.net> wrote:
>> 
>>> While running buildworld on a Pi3 running 13-stable the machine
>>> panic'd. On restart using the previous kernel fsck failed with a 
>>> segfault, which repeated when the disk was moved to a -current Pi3. 
>>> 
>>> In single user mode on -current the segfault message is
>>> 
>>> ....
>>> 7912408300994173476 BAD I=74682090
>>> 4313599915630302063 BAD I=74682090
>>> -4473632163892877928 BAD I=74682090
>>> 8068741989830080453 BAD I=74682090
>>> 3857159125896022134 BAD I=74682090
>>> -4354179704011695453 BAD I=74682090
>>> 7611175298055105740 BAD I=74682090
>>> 3985638883347136889 BAD I=74682090
>>> -2495754894521232470 BAD I=74682090
>>> 7739654885841380823 BAD I=74682090
>>> INODE CHECK-HASH FAILED I=74999808  OWNER=1842251117 MODE=15044
>>> fsck: /dev/da1s2d: Segmentation fault
>>> 
>>> I gather this is unlikely to be recoverable, but it would be
>>> nice to understand what went wrong if possible.
>> 
>> Did it produce a *.core file?
>> 
> 
> The 13-current host, looking at the 13-stable disk, reports
> root@www:~ # savecore -C -v /dev/da1s2b
> checking for kernel dump on device /dev/da1s2b
> mediasize = 2147483648 bytes
> sectorsize = 512 bytes
> magic mismatch on last dump header on /dev/da1s2b

Do you mean 14-CURRENT?

System crash dumps may need to be handled by the same type of
system that produced them, which might explain the "magic
mismatch".

However, that is not what my question was after. I was asking
about fsck crash file(s), not system crash information. (The
latter is also useful, just for different purposes.)

> No dump exists
> 
> Seemingly no file was made, or it got erased amid my fumbling.

Again, my question was not about the original system crash
information. It was about what might have been recorded when
fsck was run and failed.

Since you ran fsck under 14(?)-CURRENT, the 14(?)-CURRENT
system's own file system might have a *.core file from that
fsck run. (I am unsure whether the file name would be
fsck.core or fsck_ffs.core.) That file system is not
corrupted, which is the point: I was avoiding looking at
files from a corrupted file system.
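
As a rough sketch of how one might look for such a file: the
helper below is hypothetical (find_fsck_cores is a name made up
for illustration, not a FreeBSD tool). It assumes the core file
name starts with "fsck" and ends in ".core", which matches both
candidates above but is still a guess. The -xdev keeps find on
the file system it starts on, so it does not descend into a
corrupted file system mounted underneath.

```sh
#!/bin/sh
# Hypothetical helper: list regular files named fsck*.core under
# each directory given as an argument. The name pattern is an
# assumption; the real core file name may differ.
find_fsck_cores() {
    for dir in "$@"; do
        # Skip arguments that are not existing directories.
        [ -d "$dir" ] || continue
        # -xdev: do not cross onto other mounted file systems.
        find "$dir" -xdev -type f -name 'fsck*.core' 2>/dev/null
    done
}
```

For example, "find_fsck_cores / /root" would search the root
file system and root's home directory. Running
"sysctl kern.corefile" on the 14(?)-CURRENT system shows the
core file naming pattern in effect, which may narrow down where
to look.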

If the fsck failure can be fixed, you might then be able
to use the fixed fsck to repair the file system.

The more that is done with or to the corrupted file system,
the worse its prospects for later analysis or repair.

> The corruption of the ailing disk is almost certainly in some
> part of /usr/obj or /usr/src. Is there any subterfuge that 
> might allow me to simply delete, say, /usr/obj and then let
> the buildworld process re-populate it? Something along the
> lines of 
> mount -o force /dev/da1s2d /mnt
> and then run 
> rm -rf /mnt/obj  
> 
> then unmount and try fsck again. At this stage there's not much
> to be lost....

I'd go for having fsck fixed, if you can provide enough
context for someone to identify the failure and make a
fix. (So: an update to a 14(?)-CURRENT that has the fix.)

This presumes that you have the time to wait, as opposed to
quickly trying a riskier route and just starting over if
that fails.
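
As for what "enough context" might look like: a backtrace from
the fsck core file, if one exists, is a typical starting point.
The sketch below only prints an lldb command line so that it can
be reviewed before being run. lldb_bt_command is a hypothetical
name, and the paths in the usage example are assumptions: the
core file may have a different name or location, and the binary
that crashed might not be /sbin/fsck_ffs.

```sh
#!/bin/sh
# Hypothetical helper: print (do not run) an lldb command that
# loads a core file together with the binary that produced it
# and dumps a backtrace of every thread in batch mode.
#   $1 = path to the binary, $2 = path to the core file
lldb_bt_command() {
    binary=$1
    core=$2
    printf 'lldb --batch -o "bt all" -c %s %s\n' "$core" "$binary"
}
```

For example, "lldb_bt_command /sbin/fsck_ffs ./fsck_ffs.core"
prints the command to try. The backtrace is only likely to be
informative if debug symbols for the binary are available.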


===
Mark Millard
marklmi at yahoo.com