Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)

From: bob prohaska <fbsd_at_www.zefox.net>
Date: Fri, 17 Feb 2023 23:25:37 UTC
On Wed, Feb 15, 2023 at 11:39:13AM -0800, Mark Millard wrote:
> On Feb 15, 2023, at 11:08, bob prohaska <fbsd@www.zefox.net> wrote:
> 
> > On Wed, Feb 15, 2023 at 09:40:51AM -0800, Mark Millard wrote:
> >> 
> >> Looking in my /usr/main-src/sbin/fsck_ffs/inode.c
> >> I see that the original file has a leading tab
> >> instead of spaces.
> >> 
> >> The following mostly ignores the 1st column that
> >> should have a space, -, or + in the diff output for
> >> the file-content lines. It is mostly about the text
> >> after the first column.
> >> 
> >> So, if you have spaces instead after the first column
> >> for the lines that start with a space, those lines
> >> will not match, leading to a rejection for the
> >> context matching done by patch.
> > 
> > Replacing spaces with tabs allowed patch to find the 
> > location, but it still fails with 
> > patch: **** malformed patch at line 5: printf("SIZE=%ju ", (uintmax_t)DIP(dp, di_size));
> 
> My guess is that when you made the adjustment to have
> the tabs, the leading space was also removed on this
> line. The first column is not part of the original
> text but is instead a directive to the tool. The
> missing space would be that directive and it needs to
> be there. So:
> 
> <space><tab>printf("SIZE=%ju ", (uintmax_t)DIP(dp, di_size));
> 
> The space indicates to use the rest of the line just
> for context identification.
> 
> Of course, since I've no access to the file to check my
> hypothesis, it is just a guess.
> 
> > Editing by hand looks like a good way to drive myself crazy 8-)

Turns out to be true, but not in the manner expected. Editing in 
the changes by hand seems to have worked, in that fsck_ffs recompiled
and no longer segfaults when examining the -stable filesystem.
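
For anyone else who trips over the same "malformed patch" error, the
first-column rule Mark describes can be checked mechanically before
running patch. What follows is only a rough sketch, not a replacement
for patch's own parser: the function name find_bad_hunk_lines is made
up for illustration, it assumes a unified diff, and it does not verify
the line counts in the hunk headers.

```python
import re

# Matches a unified-diff hunk header such as "@@ -12,7 +12,8 @@".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")

# Valid first-column directives inside a hunk: context, removal,
# addition, and the "\ No newline at end of file" marker.
DIRECTIVES = (" ", "-", "+", "\\")


def find_bad_hunk_lines(patch_text):
    """Return (line_number, text) for hunk lines lacking a directive.

    Simplification (an assumption of this sketch): a line starting with
    "diff " or "Index: " is taken to begin the next file's header.
    A completely empty line inside a hunk is reported too, since a
    blank context line should still carry its leading space.
    """
    bad = []
    in_hunk = False
    for number, line in enumerate(patch_text.splitlines(), start=1):
        if HUNK_HEADER.match(line):
            in_hunk = True
            continue
        if line.startswith(("diff ", "Index: ")):
            in_hunk = False
            continue
        if in_hunk and line[:1] not in DIRECTIVES:
            bad.append((number, line))
    return bad


if __name__ == "__main__":
    import sys

    # Usage: python3 checkpatch.py some.diff  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for number, text in find_bad_hunk_lines(handle.read()):
            print(f"line {number}: missing directive column: {text!r}")
```

A context line that lost its leading space when spaces were swapped
for tabs, as Mark guessed, would be reported with its line number in
the patch file.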

However, repeated runs of fsck continue to emit errors, starting with:
root@www:/usr/src # fsck -y /dev/da1s2d
** /dev/da1s2d
** Last Mounted on /usr
** Phase 1 - Check Blocks and Sizes
7912408300994173476 BAD I=69393345
4313599915630302063 BAD I=69393345
-4473632163892877928 BAD I=69393345
8068741989830080453 BAD I=69393345
....
This continues through a succession of inode (I=) values,
ending with:

.....

3857159125896022134 BAD I=74682090
-4354179704011695453 BAD I=74682090
7611175298055105740 BAD I=74682090
3985638883347136889 BAD I=74682090
-2495754894521232470 BAD I=74682090
7739654885841380823 BAD I=74682090
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=69316035  OWNER=root MODE=100644
SIZE=36680 MTIME=Feb 11 12:06 2023  COUNT 2 SHOULD BE 1
ADJUST? yes

BAD/DUP FILE I=69393345  OWNER=root MODE=100644
SIZE=720896 MTIME=Jul 22 23:00 2022 

CLEAR? yes

fsck_ffs: cglookup: out of range cylinder group 175966913
root@www:/usr/src

It's unclear whether the patch is preventing fsck
from repairing the filesystem, or the problems are
inherently beyond fixing. Repeated fsck runs seem
to just reproduce the same output. There's no prompt 
to re-run fsck.  
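
To go beyond "seems to", each run could be saved to a file and the
Phase 1 lines reduced to a count of BAD entries per inode, then
compared between runs. This is only a sketch under assumptions: the
function names are invented for illustration, and the pattern is based
solely on the "<number> BAD I=<inode>" lines shown above, so other
fsck_ffs message formats are ignored.

```python
import re
from collections import Counter

# Matches Phase 1 lines such as "-4473632163892877928 BAD I=69393345".
# Lines like "BAD/DUP FILE I=..." do not start with a number and are skipped.
BAD_LINE = re.compile(r"^(-?\d+) BAD I=(\d+)$")


def bad_blocks_per_inode(fsck_output):
    """Count the 'BAD' block lines reported for each inode number."""
    counts = Counter()
    for line in fsck_output.splitlines():
        match = BAD_LINE.match(line.strip())
        if match:
            counts[int(match.group(2))] += 1
    return counts


def same_bad_blocks(first_run, second_run):
    """True if two saved fsck logs report the same per-inode BAD counts."""
    return bad_blocks_per_inode(first_run) == bad_blocks_per_inode(second_run)


if __name__ == "__main__":
    import sys

    # Usage: python3 cmpfsck.py run1.log run2.log  (names are hypothetical)
    logs = []
    for path in sys.argv[1:3]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            logs.append(handle.read())
    print("identical" if same_bad_blocks(*logs) else "different")
```

If the counts match from run to run, that would suggest fsck is not
making progress on those inodes, though it says nothing about why.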

Thanks to both Marks for the patch and essential
help in making it stick. If anything else is
worth trying I'm game, there's little to lose.

bob prohaska