Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)
- Reply: bob prohaska : "Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)"
- In reply to: bob prohaska : "Re: fsck segfaults on rpi3 running 13-stable (and on 14-CURRENT analyzing the same file system that resulted from the 13-STABLE crash)"
Date: Sat, 18 Feb 2023 00:23:59 UTC
On Feb 17, 2023, at 15:25, bob prohaska <fbsd@www.zefox.net> wrote:

> On Wed, Feb 15, 2023 at 11:39:13AM -0800, Mark Millard wrote:
>> On Feb 15, 2023, at 11:08, bob prohaska <fbsd@www.zefox.net> wrote:
>>
>>> On Wed, Feb 15, 2023 at 09:40:51AM -0800, Mark Millard wrote:
>>>>
>>>> Looking in my /usr/main-src/sbin/fsck_ffs/inode.c
>>>> I see that the original file has a leading tab
>>>> instead of spaces.
>>>>
>>>> The following mostly ignores the 1st column that
>>>> should have a space, -, or + in the diff output for
>>>> the file-content lines. It is mostly about the text
>>>> after the first column.
>>>>
>>>> So, if you have spaces instead after the first column
>>>> for the lines that start with a space, those lines
>>>> will not match, leading to a rejection for the
>>>> context matching done by patch.
>>>
>>> Replacing spaces with tabs allowed patch to find the
>>> location, but it still fails with
>>> patch: **** malformed patch at line 5: printf("SIZE=%ju ", (uintmax_t)DIP(dp, di_size));
>>
>> My guess is that when you made the adjustment to have
>> the tabs, the leading space was also removed on this
>> line. The first column is not part of the original
>> text but is instead a directive to the tool. The
>> missing space would be that directive and it needs to
>> be there. So:
>>
>> <space><tab>printf("SIZE=%ju ", (uintmax_t)DIP(dp, di_size));
>>
>> The space indicates to use the rest of the line just
>> for context identification.
>>
>> Of course, since I have no access to the file to check my
>> hypothesis, it is just a guess.
>>
>>> Editing by hand looks like a good way to drive myself crazy 8-)
>
> Turns out to be true, but not in the manner expected. Editing in
> the changes by hand seems to have worked, in that fsck_ffs recompiled
> and no longer segfaults when examining the -stable filesystem.
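The point about the first column can be made concrete with a simplified sketch. This is not patch(1)'s actual code; the names classify_hunk_line and context_matches are hypothetical. It only shows the rule described above: in a unified-diff hunk the first byte of each line is a directive (space, -, or +), and the text after it must match the file byte for byte, so spaces in the diff against a tab in the file do not match, and a line that lost its leading space is not a valid hunk line at all.

```c
#include <stdbool.h>
#include <string.h>

/* Kinds of lines inside a unified-diff hunk body (simplified). */
enum hunk_line { HUNK_CONTEXT, HUNK_REMOVE, HUNK_ADD, HUNK_MALFORMED };

/* Hypothetical helper: classify a hunk-body line by its first column,
 * which is a directive to the tool, not part of the file's text.
 * Real patch(1) implementations may tolerate some variants (for
 * example completely empty lines); this sketch does not. */
static enum hunk_line classify_hunk_line(const char *line)
{
    switch (line[0]) {
    case ' ': return HUNK_CONTEXT;
    case '-': return HUNK_REMOVE;
    case '+': return HUNK_ADD;
    default:  return HUNK_MALFORMED; /* e.g. a line starting with a tab */
    }
}

/* Hypothetical helper: a context or removal line matches a file line
 * only if everything after the directive column is identical, so a
 * tab in the file versus spaces in the diff is a mismatch. */
static bool context_matches(const char *diff_line, const char *file_line)
{
    enum hunk_line kind = classify_hunk_line(diff_line);

    if (kind != HUNK_CONTEXT && kind != HUNK_REMOVE)
        return false;
    return strcmp(diff_line + 1, file_line) == 0;
}
```

For a file line that starts with a tab, the matching diff line needs a space followed by that tab. Dropping the space makes the line malformed; keeping the space but using spaces instead of the tab makes the context fail to match.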
>
> However, repeated runs of fsck continue to emit errors starting with
> root@www:/usr/src # fsck -y /dev/da1s2d
> ** /dev/da1s2d
> ** Last Mounted on /usr
> ** Phase 1 - Check Blocks and Sizes
> 7912408300994173476 BAD I=69393345
> 4313599915630302063 BAD I=69393345
> -4473632163892877928 BAD I=69393345
> 8068741989830080453 BAD I=69393345
> ....
> This continues through a succession of I values,
> ending with
>
> .....
>
> 3857159125896022134 BAD I=74682090
> -4354179704011695453 BAD I=74682090
> 7611175298055105740 BAD I=74682090
> 3985638883347136889 BAD I=74682090
> -2495754894521232470 BAD I=74682090
> 7739654885841380823 BAD I=74682090
> ** Phase 2 - Check Pathnames
> ** Phase 3 - Check Connectivity
> ** Phase 4 - Check Reference Counts
> LINK COUNT FILE I=69316035 OWNER=root MODE=100644
> SIZE=36680 MTIME=Feb 11 12:06 2023 COUNT 2 SHOULD BE 1
> ADJUST? yes
>
> BAD/DUP FILE I=69393345 OWNER=root MODE=100644
> SIZE=720896 MTIME=Jul 22 23:00 2022
>
> CLEAR? yes
>
> fsck_ffs: cglookup: out of range cylinder group 175966913
> root@www:/usr/src

Looks like that is one of the messages for problems that fsck_ffs does not attempt to deal with (probably for good reasons in each case/context).
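To illustrate what "does not attempt to deal with" means in code, here is a hypothetical sketch of the range-check-then-errx() shape behind that message. It is not the real cglookup() from fsutil.c: struct fake_fs, cg_in_range, and check_cg_or_exit are made up for illustration, and the EEXIT value of 8 is an assumption standing in for fsck_ffs's own definition. The point is that errx(3) prints the message and exits, so nothing after the check runs and no repair is attempted for that condition.

```c
#include <err.h>
#include <stdbool.h>

/* Assumption: fsck_ffs defines EEXIT in its own header; 8 is used here
 * only so that this sketch is self-contained. */
#ifndef EEXIT
#define EEXIT 8
#endif

/* Hypothetical stand-in for the superblock's cylinder-group count. */
struct fake_fs {
    int ncg;
};

/* A cylinder-group number derived from (possibly corrupt) on-disk data
 * is only usable if it falls within [0, ncg). */
static bool cg_in_range(const struct fake_fs *fs, int cg)
{
    return cg >= 0 && cg < fs->ncg;
}

/* Hypothetical: give up rather than guess at a repair. errx() prints
 * "progname: message" to stderr and exits; it does not return. */
void check_cg_or_exit(const struct fake_fs *fs, int cg)
{
    if (!cg_in_range(fs, cg))
        errx(EEXIT, "cglookup: out of range cylinder group %d", cg);
}
```

Because the exit happens before any further changes are made, a re-run over an unchanged file system would be expected to stop at the same place, which fits the repeated identical output reported below.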
The below does not show the specific conditions, just the calls with the message texts used for the various exits of the "errx(EEXIT" form:

# grep -r "errx(EEXIT," /usr/main-src/sbin/fsck_ffs/ | more
/usr/main-src/sbin/fsck_ffs/pass5.c: errx(EEXIT, "BAD STATE %d FOR INODE I=%ju",
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "bad inode number %ju to ginode",
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "bad inode number %ju to nextinode",
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "cannot allocate space for inode buffer");
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "cannot increase directory list");
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "cannot increase directory list");
/usr/main-src/sbin/fsck_ffs/inode.c: errx(EEXIT, "BAD STATE %d TO BLKERR", inoinfo(ino)->ino_state);
/usr/main-src/sbin/fsck_ffs/dir.c: errx(EEXIT, "wrong type to dirscan %d", idesc->id_type);
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "inoinfo: inumber %ju out of range",
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "Initial malloc(%d) failed", sblock.fs_bsize);
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "%s", failreason);
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "cglookup: out of range cylinder group %d", cg);
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "Cannot allocate cylinder group buffers");
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT,"Ran out of memory during journal recovery");
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "Excessive buffer size %ld > %d\n", size,
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "panic: lost %d buffers", numbufs - cnt);
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "ABORTING DUE TO READ ERRORS");
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "cannot allocate buffer pool");
/usr/main-src/sbin/fsck_ffs/fsutil.c: errx(EEXIT, "UNKNOWN INODESC FIX MODE %d", idesc->id_fix);
/usr/main-src/sbin/fsck_ffs/pass4.c: errx(EEXIT, "BAD STATE %d FOR INODE I=%ju",
/usr/main-src/sbin/fsck_ffs/pass1.c: errx(EEXIT, "cannot alloc %u bytes for inoinfo",
/usr/main-src/sbin/fsck_ffs/pass1.c: errx(EEXIT, "cannot alloc %u bytes for inoinfo",
/usr/main-src/sbin/fsck_ffs/setup.c: errx(EEXIT, "cannot allocate space for snapshot "
/usr/main-src/sbin/fsck_ffs/setup.c: errx(EEXIT, "cannot allocate space for superblock");
/usr/main-src/sbin/fsck_ffs/setup.c: errx(EEXIT, "calcsb: cannot allocate recovery buffer");
/usr/main-src/sbin/fsck_ffs/main.c: errx(EEXIT, "cannot do level %d conversion",
/usr/main-src/sbin/fsck_ffs/main.c: errx(EEXIT, "bad mode to -m: %o", lfmode);
/usr/main-src/sbin/fsck_ffs/main.c: errx(EEXIT, "-%c flag requires a %s", flag, req);
/usr/main-src/sbin/fsck_ffs/pass2.c: errx(EEXIT, "CANNOT ALLOCATE ROOT INODE");
/usr/main-src/sbin/fsck_ffs/pass2.c: errx(EEXIT, "CANNOT ALLOCATE ROOT INODE");
/usr/main-src/sbin/fsck_ffs/pass2.c: errx(EEXIT, "CANNOT ALLOCATE ROOT INODE");
/usr/main-src/sbin/fsck_ffs/pass2.c: errx(EEXIT, "BAD STATE %d FOR ROOT INODE",
/usr/main-src/sbin/fsck_ffs/pass2.c: errx(EEXIT, "BAD STATE %d FOR INODE I=%ju",

> It's unclear whether the patch is preventing fsck
> from repairing the filesystem, or the problems are
> inherently beyond fixing.

Looks like it is in the do-not-fix category. If no prior adjustments were made in the run, then things have stayed as they were. (These messages could be clearer about the status that they imply and what one should do in response.)

> Repeated fsck runs seem
> to just reproduce the same output.

So, apparently, no prior adjustments were made in the re-runs either.

> There's no prompt
> to re-run fsck.

I expect that is true of all the above "errx(EEXIT" lines: each reports a "did not fix" issue that blocks further progress.

> Thanks to both Marks for the patch and essential
> help in making it stick. If anything else is
> worth trying I'm game, there's little to lose.

I've no clue if there is more to try.
But, even if there is, other issues/constraints may make it not worth bothering to try.

Beyond that, floating-point use in multi-threaded contexts looks to be significantly broken in main [so: 14] for now. (This was involved in your FreeBSD crash, based on the backtrace shown.) If you try to set up another armv7 context, I suggest, for now, staying before:

commit 6926e2699ae55080f860488895a2a9aa6e6d9b4d
Author:     Kornel Dulęba <kd@FreeBSD.org>
AuthorDate: 2023-02-04 12:59:30 +0000
Commit:     Kornel Dulęba <kd@FreeBSD.org>
CommitDate: 2023-02-04 19:21:43 +0000

    arm: Add support for using VFP in kernel

This would apply until a list of issues has been addressed. I've reported how to produce 3 distinct failures: 2 of them hit KASSERT panics, and the other one ends up with floating-point values from the wrong thread (but the same process). More may be identified and fixed before things generally work again for main on armv7 FreeBSD.

===
Mark Millard
marklmi at yahoo.com