Re: latest current fails to boot.

From: Tomoaki AOKI <junchoon_at_dec.sakura.ne.jp>
Date: Sat, 25 Sep 2021 02:00:50 UTC

On Fri, 24 Sep 2021 01:33:33 +0300
Konstantin Belousov <kostikbel@gmail.com> wrote:

> On Thu, Sep 23, 2021 at 09:20:51PM +0200, Johan Hendriks wrote:
> > 
> > On 23/09/2021 19:52, Konstantin Belousov wrote:
> > > On Fri, Sep 24, 2021 at 12:43:01AM +0900, Tomoaki AOKI wrote:
> > > > On Wed, 22 Sep 2021 23:09:05 +0900
> > > > Tomoaki AOKI <junchoon@dec.sakura.ne.jp> wrote:
> > > > 
> > > > > On Wed, 22 Sep 2021 05:47:46 -0700
> > > > > David Wolfskill <david@catwhisker.org> wrote:
> > > > > 
> > > > > > On Wed, Sep 22, 2021 at 02:39:37PM +0200, Johan Hendriks wrote:
> > > > > > > I did a git pull this morning and it fails to boot.
> > > > > > > I hangs at Setting hostid : 0x917bf354
> > > > > > > 
> > > > > > > This is a vm running on vmware.
> > > > > > > If i boot the old kernel from yesterday it boots normally.
> > > > > > > 
> > > > > > > uname -a
> > > > > > > FreeBSD varnish-cdn-node03 14.0-CURRENT FreeBSD 14.0-CURRENT #0
> > > > > > > main-n249518-5572fda3a2f: Tue Sep 21 14:40:22 CEST 2021
> > > > > > > root@varnish-cdn-node03:/usr/obj/usr/src/amd64.amd64/sys/KRNL amd64
> > > > > > > ....
> > > > > > I had no issues with my build machine or either of two laptops, either
> > > > > > from yesterday:
> > > > > > 
> > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #358 main-n249518-5572fda3a2f3: Tue Sep 21 05:15:22 PDT 2021     root@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1400033 1400033
> > > > > > 
> > > > > > or today:
> > > > > > 
> > > > > > FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #359 main-n249556-c96da1994587: Wed Sep 22 04:24:17 PDT 2021     root@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1400033 1400033
> > > > > > 
> > > > > > [uname strings from my main laptop shown, but I keep the machines
> > > > > > in sync rather aggressively.]
> > > > > > 
> > > > > > Perhaps the issue you are encountering involves things not in my
> > > > > > environment (such as VMs or ZFS)?
> > > > > > 
> > > > > > Peace,
> > > > > > david
> > > > > > -- 
> > > > > > David H. Wolfskill                              david@catwhisker.org
> > > > > > Life is not intended to be a zero-sum game.
> > > > > > 
> > > > > > See https://www.catwhisker.org/~david/publickey.gpg for my public key.
> > > > > For me, on bare metal (non-vm) amd64 with root-on-ZFS,
> > > > > 
> > > > >    Fails to boot to multiuser at git: 8db1669959ce
> > > > >    Boot fine at git: 0b79a76f8487
> > > > > 
> > > > > Boot to singleuser is fine even with failed revision.
> > > > > 
> > > > > Failure mode:
> > > > >   Hard hangup or spinning and non-operable. Hard power-off needed.
> > > > >   Seems to happen after starting rc.conf processing and before setting
> > > > >   hostid.
> > > > > 
> > > > > -- 
> > > > > Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>
> > > > > 
> > > > Additional info and correction.
> > > >   *Hung up before setting hostuuid, not hostid.
> > > > 
> > > >   *^T doesn't respond at all, only hard power off worked.
> > > > 
> > > >   *`kldload nvidia-modeset.ko` on single user mode sanely work.
> > > > 
> > > > 
> > > > Why I could know rc.conf is started to be processed:
> > > > 
> > > >   I have lines below at the end of /etc/rc.conf and its output is always
> > > >   the first line related to /etc/rc.conf, at least for non-verbose boot.
> > > >   The next line is normally "Setting hostuuid: " line, which was not
> > > >   displayed when boot hung up.
> > > > 
> > > > 
> > > > kldstat -q -n nvidia.ko
> > > > if [ 0 -ne $? ] ; then
> > > >    echo "Loading nvidia-driver modules via rc.conf."
> > > >    if [ -e /boot/modules/nvidia-modeset.ko ] ; then
> > > >      kld_list="${kld_list} nvidia-modeset.ko"
> > > >    else
> > > >      kld_list="${kld_list} nvidia.ko"
> > > >    fi
> > > > fi
> > > If you do not load nvidia-modeset.ko at all, does the boot proceed?
> > > 
> > > When the boot hangs, can you enter into ddb?
> > > 
> > > 
> > I do not load a nvidia-modeset.ko kernel module and it will not boot. It
> > hangs with Setting hostid : as the last message. Then only a powercycle gets
> > me back. If i boot in single user mode all is fine, but as soon as i exit
> > single user mode it hangs at the same spot.
> 
> Can you enter ddb at the hang point?

It depends. In most cases nothing other than a power cycle works, but I
managed to get into ddb with ctrl-alt-esc just once. The `bt` output was
as below. It was converted from a photo using Google Lens and hand-fixed
as much as possible, but some mis-conversions may remain.


===== `bt` output =====

 KDB: enter: manual escape to debugger
[ thread pid 12 tid 100041 ]
Stopped at      kdb_enter+0x37: movq $0,0x103aa1e(%rip)
db> bt
Tracing pid 12 tid 100041 td 0xfffffe00e32c0000
kdb_enter() at kdb_enter+0x37/frame 0xfffffe00e2e80d40
vt_kbdevent() at vt_kbdevent+0x22f/frame 0xfffffe00e2e80da0
kbdmux_intr() at kbdmux_intr+0x45/frame 0xfffffe00e2e80dc0
taskqueue_run_locked() at taskqueue_run_locked+0x197/frame 0xfffffe00e2e80e40
taskqueue_run() at taskqueue_run+0x68/frame 0xfffffe00e2e80e60
ithread_loop() at ithread_loop+0x25f/frame 0xfffffe00e2e80ef0
fork_exit() at fork_exit+0x8e/frame 0xfffffe00e2e80f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e2e80f30
--- trap 0, rip = 0x301700000000000, rsp = 0, rbp = 0xffffffff81d047d0 ---
??() at 0x301700000000000/frame 0xffffffff81d047d0
??() at 0xfffff80001a59b80/frame 0xfffff80001a59c00
taskqueue_swi_run() at taskqueue_swi_run

===== End `bt` output ===== 
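
OCR slips of the kind mentioned above, such as a capital letter O where
a hex address should start with 0x, can be caught mechanically before
hand-checking the rest. A minimal sketch, assuming the OCR text is fed
on stdin; the function name fix_ocr_hex and the file names in the usage
line are made up for illustration, and it only handles this one class of
mis-conversion:

```sh
#!/bin/sh
# Rewrite "Ox" (capital letter O) to "0x" when it is followed by at
# least four hex digits, which is how addresses appear in ddb output.
# Everything else is passed through unchanged.
fix_ocr_hex() {
    sed 's/Ox\([0-9a-fA-F]\{4,\}\)/0x\1/g'
}

# Hypothetical usage:
#   fix_ocr_hex < bt-ocr.txt > bt-fixed.txt
```

Other slips (l vs 1, a space instead of an underscore) are too ambiguous
for a blind substitution and still need to be checked by eye.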


> Do you load any other modules besides nvidia, from rc.conf?

Yes, but they don't seem to have been loaded yet when it hung up.

ddb says...


===== `kldstat` output by ddb =====

db> kldstat
Id Refs Address            Size     Name 
 1   56 0xffffffff80200000 1f31a70  kernel
 2    1 0xffffffff82132000 2b88     acpi_call.ko
 3    1 0xffffffff82135000 290940   iwm9000fw.ko
 4    1 0xffffffff823c6000 8248     acpi_ibm.ko
 5    1 0xffffffff823cf000 6260     filemon.ko
 6    1 0xffffffff823d7000 2cf8     udf_iconv.ko
 7    4 0xffffffff823da000 9388     libiconv.ko
 8    2 0xffffffff823e4000 9f48     udf.ko
 9    1 0xffffffff823ee000 26228    if_iwm.ko
10    1 0xffffffff82415000 2d58     msdosfs_iconv.ko
11    1 0xffffffff82418000 2d40     cd9660_iconv.ko
12    1 0xffffffff828c7000 63e0     usbhid.ko
13    2 0xffffffff828ce000 6db0     hidbus.ko
14    1 0xffffffff828d5000 5b4ec8   zfs.ko
15    1 0xffffffff82e8a000 2e48     nvram.ko
16    1 0xffffffff82e8d000 3ca0     smb.ko
17    2 0xffffffff82e91000 3d50     smbus.ko
18    1 0xffffffff82e95000 4358     cpuctl.ko
db>

===== End `kldstat` output by ddb =====

All of these are loaded via /boot/loader.conf.
fdescfs.ko seems to be missing compared to `kldstat` from the single user
shell, but it could have been loaded when remounting the fs (for rw access).

Modules loaded via rc.conf on a sane boot are as follows.
The list should include modules loaded automatically by devd.
Addresses should be different, as the revision is different.
Manually kldload'ing No. 20 through 28 (including those auto-loaded as
dependencies: 21, 25 and 27) didn't cause a hang in the single user sh
of the affected revision.

===== Additional modules =====

20    1 0xffffffff83614000    174c8 smbfs.ko
21    2 0xffffffff8362c000     3090 libmchain.ko
22    1 0xffffffff83630000     5658 tpm.ko
23    1 0xffffffff83636000    11ea0 fusefs.ko
24    1 0xffffffff83648000   106310 nvidia-modeset.ko
25    1 0xffffffff83800000  1fa1a48 nvidia.ko
26    2 0xffffffff8374f000    2cce0 linux.ko
27    6 0xffffffff8377c000     9ea8 linux_common.ko
28    1 0xffffffff83786000     4350 acpi_video.ko
29    1 0xffffffff8378b000     3378 acpi_wmi.ko
30    2 0xffffffff8378f000     21d8 hconf.ko
31    1 0xffffffff83792000     21e8 hcons.ko
32    3 0xffffffff83795000     30a8 hidmap.ko
33    1 0xffffffff83799000     21e8 hms.ko
34    1 0xffffffff8379c000     32c0 hmt.ko
35    1 0xffffffff837a0000     21e8 hpen.ko
36    1 0xffffffff837a3000     3250 ichsmb.ko
37    1 0xffffffff837a7000     6c9c ig4.ko
38    1 0xffffffff837ae000     433c iicbus.ko
39    1 0xffffffff837b3000     2110 pchtherm.ko
40    1 0xffffffff837b6000    28f40 linux64.ko
41    1 0xffffffff837df000     2260 pty.ko
42    1 0xffffffff837e2000     639c linprocfs.ko
43    1 0xffffffff837e9000     3284 linsysfs.ko
44    1 0xffffffff837ed000     4c20 ng_ubt.ko
45    6 0xffffffff837f2000     aac8 netgraph.ko
46    2 0xffffffff857a2000     9238 ng_hci.ko
47    3 0xffffffff837fd000     25a8 ng_bluetooth.ko
48    1 0xffffffff857ac000     d250 ng_l2cap.ko
49    1 0xffffffff857ba000    1bef8 ng_btsocket.ko
50    1 0xffffffff857d6000     39d0 ng_socket.ko
51    1 0xffffffff857da000    27040 ipfw.ko

===== End additional modules =====
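
The comparison between the ddb listing and the sane-boot listing can be
sketched as a small script. This is only an illustration: it assumes
both listings were saved to text files in the usual kldstat column
layout (Id Refs Address Size Name), and the function names and the file
names in the usage line are made up here.

```sh
#!/bin/sh
# Print the module names found in a kldstat listing, one per line.
# Only rows whose first field is a numeric Id are considered, so the
# header line and any "db>" prompt lines are skipped.
kld_names() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 5 { print $5 }' "$1" | LC_ALL=C sort -u
}

# Print modules present in the sane-boot listing ($2) but absent from
# the listing captured at the hang ($1).
kld_missing() {
    hung=$(mktemp) || return 1
    sane=$(mktemp) || { rm -f "$hung"; return 1; }
    kld_names "$1" > "$hung"
    kld_names "$2" > "$sane"
    # comm -13 keeps only lines unique to the second file.
    LC_ALL=C comm -13 "$hung" "$sane"
    rm -f "$hung" "$sane"
}

# Hypothetical usage:
#   kld_missing kldstat-hung.txt kldstat-sane.txt
```

Comparing by name rather than by Id or address matters here, since both
change between revisions and with load order.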


-- 
Tomoaki AOKI    <junchoon@dec.sakura.ne.jp>