Re: Mountroot problems on RPi3/aarch64

From: Mark Millard <marklmi_at_yahoo.com>
Date: Sun, 26 Jun 2022 02:43:58 UTC
On 2022-Jun-25, at 19:22, Mark Millard <marklmi@yahoo.com> wrote:

> On 2022-Jun-25, at 14:51, bob prohaska <fbsd@www.zefox.net> wrote:
> 
>> On Thu, Jun 23, 2022 at 06:43:24PM -0700, Mark Millard wrote:
>>> There is another checkin to main for superblock handling:
>>> 
>>> QUOTE
>>> The branch main has been updated by mckusick:
>>> 
>>> URL: https://cgit.FreeBSD.org/src/commit/?id=50dc4c7df4156863148e6a9609c03e852e2aeb35
>>> 
>> 
>> Here's the tail of the boot transcript:
>> 
>> Root mount waiting for: CAM
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <ASMT ASM105x 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number 12345678D558
>> da0: 40.000MB/s transfers
>> da0: 953869MB (1953525168 512 byte sectors)
>> da0: quirks=0x2<NO_6_BYTE>
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
> 
> I'm not UFS/FFS implementation literate but I can show what I
> see when I look up some of the related source code. Looking up
> cgdmin leads to sys/ufs/ffs/fs.h material:
> 
> /*
> * Cylinder group macros to locate things in cylinder groups.
> * They calc filesystem addresses of cylinder group data structures.
> */
> #define cgbase(fs, c)   (((ufs2_daddr_t)(fs)->fs_fpg) * (c))
> . . .
> #define cgdmin(fs, c)   (cgstart(fs, c) + (fs)->fs_dblkno)      /* 1st data */
> . . .
> #define cgstart(fs, c)                                                  \
>       ((fs)->fs_magic == FS_UFS2_MAGIC ? cgbase(fs, c) :               \
>       (cgbase(fs, c) + (fs)->fs_old_cgoffset * ((c) & ~((fs)->fs_old_cgmask))))
> 
> From the cgdmin(fs, 0) I learn that the cgbase(fs, 0)
> involved is 0 (i.e., zero). From that, looking at
> what cgstart(fs, 0) would be leads to concluding that:
> 
> (fs)->fs_old_cgoffset * ((c) & ~((fs)->fs_old_cgmask))
> 
> is in use and ends up being 5056.

This was wrong: (fs)->fs_dblkno provided the 5056.

See later below after the "JUNK" text block.

JUNK FROM MISTAKE:
> From that I see that:
> 
> (fs)->fs_magic == FS_UFS2_MAGIC
> 
> is false.
> 
> But the messages produced via:
> 
> 	CHK(fs->fs_csaddr, !=, cgdmin(fs, 0), %jd);
> 
> implies that the code did validate the (fs)->fs_magic
> value and it passed. For reference:
> 
> # grep MAGIC /usr/main-src/sys/ufs/ffs/fs.h
> #define	FS_UFS1_MAGIC	0x011954	/* UFS1 fast filesystem magic number */
> #define	FS_UFS2_MAGIC	0x19540119	/* UFS2 fast filesystem magic number */
> #define	FS_BAD_MAGIC	0x19960408	/* UFS incomplete newfs magic number */
> #define	CG_MAGIC	0x090255
> #define	cg_chkmagic(cgp) ((cgp)->cg_magic == CG_MAGIC)
>       ((fs)->fs_magic == FS_UFS2_MAGIC ? cgbase(fs, c) :		\
> 
> From the code structure and messaging I infer that:
> 
> (fs)->fs_magic == FS_UFS1_MAGIC
> 
> or the code would have done:
> 
> 	} else {
> 		/* Bad magic number, so assume not a superblock */
> 		return (ENOENT);
> 	}
> 
> without producing the messaging.
> 
> Why it would be a UFS1 file system that was created originally,
> I do not know.
END JUNK FROM MISTAKE.

When I looked at the messaging code for CHK to see why it
reported UFS2 I see:

		printf("UFS%d superblock failed: %s (" #fmt ") %s %s ("	\
		    #fmt ")\n", fs->fs_magic == FS_UFS1_MAGIC ? 1 : 2,	\
		    #lhs, (intmax_t)lhs, #op, #rhs, (intmax_t)rhs);	\

but that implies that fs->fs_magic did not hold the
value FS_UFS1_MAGIC .

Looking back I see that I incorrectly attributed the:

(fs)->fs_dblkno

value to:

(fs)->fs_old_cgoffset * ((c) & ~((fs)->fs_old_cgmask))

End result: The file system is UFS2 with (fs)->fs_magic == FS_UFS2_MAGIC .

Ultimately: fs->fs_csaddr != (fs)->fs_dblkno seems to be the
failure condition for this UFS2 context [given the 0 in
cgdmin(fs, 0)].

I've no clue about the implications.

>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Mounting from ufs:/dev/da0s2a failed with error 22; retrying for 3 more seconds
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Mounting from ufs:/dev/da0s2a failed with error 22: Invalid fstype.
>> 
>> Loader variables:
>> vfs.root.mountfrom=ufs:/dev/da0s2a
>> vfs.root.mountfrom.options=rw
>> 
>> Manual root filesystem specification:
>> <fstype>:<device> [options]
>>     Mount <device> using filesystem <fstype>
>>     and with the specified (optional) option list.
>> 
>>   eg. ufs:/dev/da0s1a
>>       zfs:zroot/ROOT/default
>>       cd9660:/dev/cd0 ro
>>         (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)
>> 
>> ?               List valid disk boot devices
>> .               Yield 1 second (for background tasks)
>> <empty line>    Abort manual input
>> 
>> mountroot> 
>> 
>> Rebooting using a kernel of:
>> 
>> FreeBSD 14.0-CURRENT #74 main-n255816-e26ef41f799: Wed May 25 15:05:14 PDT 2022
>>   bob@www.zefox.org:/usr/obj/usr/src/arm64.aarch64/sys/GENERIC arm64
>> 
>> stops in single user with:
>> Root mount waiting for: CAM
>> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
>> da0: <ASMT ASM105x 0> Fixed Direct Access SPC-4 SCSI device
>> da0: Serial Number 12345678D558
>> da0: 40.000MB/s transfers
>> da0: 953869MB (1953525168 512 byte sectors)
>> da0: quirks=0x2<NO_6_BYTE>
>> Warning: no time-of-day clock registered, system time will not be set accurately
>> Dual Console: Serial Primary, Video Secondary
>> Setting hostuuid: 30303030-3030-3030-3064-626136386435.
>> Setting hostid: 0x5cd40a6a.
>> Starting file system checks:
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Cannot find file system superblock
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Cannot find file system superblock
>> Warning! Some of the devices might not be available; retrying
>> Restarting file system checks:
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Cannot find file system superblock
>> UFS2 superblock failed: fs->fs_csaddr (806456) != cgdmin(fs, 0) (5056)
>> Cannot find file system superblock
>> Unknown error 3; help!
>> ERROR: ABORTING BOOT (sending SIGTERM to parent)!
>> 2022-06-25T14:23:46.792050-07:00 - init 1 - - /bin/sh on /etc/rc terminated abnormally, going to single user mode
>> Enter full pathname of shell or RETURN for /bin/sh: 
>> root@:/ # 
>> 
>> However, simply exiting the single-user shell seems to bring up
>> normal multi-user operation. 
>> 
>> Network connectivity remains sporadic, but is much helped by an outgoing
>> ping process. 
>> 
>> Could it be significant that this filesystem was created on June 4, 2020?
> 
> June 4 is after:
> 
> 2022-05-27: Do comprehensive UFS/FFS superblock integrity checks when reading a superblock.
> 2022-06-01: Two bug fixes to UFS/FFS superblock integrity checks when reading a superblock.
> 
> but before:
> 
> 2022-06-11: Bug fix to UFS/FFS superblock integrity checks when reading a superblock.
> 
> (and before the later additions of messages about superblock failure details).
> 
> But I can not tell what the status of the system was that created
> the apparently problematical file system. It could be much older
> for all I know.
> 
> None of these messages suggest code changes to creating UFS file
> systems, just to validation.
> 
> I will note that the 2022-05-27 check-in does report:
> 
> QUOTE
> . . . Although
> it appears in only one place, the new code will apply to the kernel
> modules and (through libufs) user applications that read in superblocks.
> END QUOTE
> 
> This gets into why the older kernel behaves differently when used
> with the newer world and why there are messages from the newer
> world code even with the older kernel.
> 
> 
> Of course, none of these notes provide any solutions, just
> background information.
> 


===
Mark Millard
marklmi at yahoo.com