Re: Trying to implement BFS, page fault at vfs_domount_first, how to debug?

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Fri, 30 Dec 2022 22:06:38 UTC
On Fri, Dec 30, 2022 at 11:48 AM Rob Wing <rob.fx907@gmail.com> wrote:

> You might try `addr2line -e $path_to_kernel 0xffffffff80cf0651`
>
Just a note. I find I need the kernel called "kernel.debug" for
addr2line to work. It normally lives in the kernel build directory
under /usr/obj.

rick


>
> Aside from that, it looks like errors aren't being handled correctly after
> failing to find the BFS superblock in bfs_mountfs(). Since no error is
> returned after failing to find the superblock, I'm guessing that the NULL
> pointer `bfsmp` is being dereferenced in bfs_statfs().
>
> On Fri, Dec 30, 2022 at 10:35 AM John F Carr <jfc@mit.edu> wrote:
>
>>
>>
>> > On Dec 30, 2022, at 14:13, Hikmat Jafarli <jafarlihi@gmail.com> wrote:
>> >
>> > I'm trying to implement the BeOS filesystem (BFS) for FreeBSD.
>> > The repository is here: https://github.com/jafarlihi/freebsd-bfs
>> > (Please don't mind the bad styling and all the copy-paste work;
>> > I'll polish it later. I'm just trying to get to a PoC that works.)
>> >
>> > Now when I try to mount a valid BFS partition (reported as BFS by
>> > `fstyp`), it executes all the way to the printf that logs "Either not a
>> > BFS volume or corrupted" and then crashes with "page fault while in
>> > kernel mode" in vfs_domount_first+0x271. Here's the log:
>> > ```
>> > Either not a BFS volume or corrupted
>> >
>> > Fatal trap 12: page fault while in kernel mode
>> > cpuid = 0; apic id = 00
>> > fault virtual address = 0x18
>> > fault code = supervisor read data, page not present
>> > instruction pointer = 0x20:0xffffffff82b2427b
>> > stack pointer        = 0x28:0xfffffe00df399ac0
>> > frame pointer        = 0x28:0xfffffe00df399ac0
>> > code segment = base 0x0, limit 0xfffff, type 0x1b
>> > = DPL 0, pres 1, long 1, def32 0, gran 1
>> > processor eflags = interrupt enabled, resume, IOPL = 0
>> > current process = 1208 (mount)
>> > trap number = 12
>> > panic: page fault
>> > cpuid = 0
>> > time = 1672414952
>> > KDB: stack backtrace:
>> > #0 0xffffffff80c694a5 at kdb_backtrace+0x65
>> > #1 0xffffffff80c1bb5f at vpanic+0x17f
>> > #2 0xffffffff80c1b9d3 at panic+0x43
>> > #3 0xffffffff810afdf5 at trap_fatal+0x385
>> > #4 0xffffffff810afe4f at trap_pfault+0x4f
>> > #5 0xffffffff810875b8 at calltrap+0x8
>> > #6 0xffffffff80cf0651 at vfs_domount_first+0x271
>> > #7 0xffffffff80cece9d at vfs_domount+0x2ad
>> > #8 0xffffffff80cec2d8 at vfs_donmount+0x8f8
>> > #9 0xffffffff80ceb9a9 at sys_nmount+0x69
>> > #10 0xffffffff810b06ec at amd64_syscall+0x10c
>> > #11 0xffffffff81087ecb at fast_syscall_common+0xf8
>> > ```
>> >
>> > Now I'm trying to understand what exactly goes wrong here
>> > and how to map 0x271 to the exact source line.
>> >
>> > I'd appreciate it if someone could tell me how to debug this.
>> >
>> > (Sorry for the noob question; I already tried IRC and was directed here.)
>>
>> Your BFS module tried to dereference a null pointer to a structure.
>>
>> It's a null pointer dereference because of "fault virtual address =
>> 0x18".  That normally means you tried to access the fourth 8-byte word
>> (offset 0x18 = 24 bytes on amd64) of a structure, but the pointer to the
>> structure was null.  It could be something else, but play the odds.
>>
>> It's in your module because the instruction pointer address is far beyond
>> the other kernel functions in the stack trace.  Stack traces in crash
>> reports are misleading: they tend to omit the function that triggered the
>> crash.  The address of vfs_domount_first is 0xffffffff80cf03e0
>> (0xffffffff80cf0651 - 0x271).  That's the function that called your
>> module.  The address of the faulting instruction is 0xffffffff82b2427b.
>> That's in your module.