Re: Troubleshooting help for net/isboot-kmod

From: John Nielsen <lists_at_jnielsen.net>
Date: Wed, 06 Sep 2023 00:49:07 UTC
> On Sep 5, 2023, at 3:32 PM, John Nielsen <lists@jnielsen.net> wrote:
> 
> Since the original author has presumably moved on to better things, I’m the maintainer of the net/isboot-kmod port for enabling booting from iSCSI via an iSCSI Boot Firmware Table. I would not call myself a kernel developer but I get by in C.
> 
> Note that the port uses its own iSCSI implementation and none of the in-tree iSCSI code (e.g. under sys/dev/iscsi).
> 
> I’m trying to solve a problem where a system will boot up, negotiate an iSCSI session correctly and log in to the target, but then never continue or attach a peripheral after attempting a CAM rescan. (See also https://github.com/jnielsendotnet/isboot/issues/11 ) I can reproduce this issue about 30% of the time on my test system as long as I have isboot's debug messages turned off. With them turned on it fails 100% of the time, which means I cannot capture kernel messages from a successful boot with debug enabled.
> 
> When the problem occurs, isboot completes its cam_attach function (handling an XPT_PATH_INQ action along the way) and enters its cam_rescan function, but never exits it. See the code here: https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L2300
> 
> I can see from the debug output (below) that it again does XPT_PATH_INQ (4 times), then gets an XPT_ASYNC followed by an XPT_ABORT action. Both of those are handled by the fall-through default of setting ccb_h.status to CAM_REQ_INVALID (see https://github.com/jnielsendotnet/isboot/blob/master/src/iscsi.c#L2192 )
> 
> So what could be going on? I don’t have trace output from a successful boot to compare with, sadly. I feel like perhaps the code should do something with at least one of the XPT_ASYNC or XPT_ABORT actions but I don’t know what exactly (or why the need only arises some of the time).
> 
> -JN
> 
> ===== failed boot debug output =====
> cam attach
> isboot action 4
> isboot action 4 done
> cam attach end
> cam rescan
> isboot action 4
> isboot action 4 done
> isboot action 4
> isboot action 4 done
> isboot action 4
> isboot action 4 done
> isboot action 4
> isboot action 4 done
> isboot action 15
> isboot action 15 done
> isboot action 16
> isboot action 16 done

I just noticed that these action codes are printed as hex (%x), not decimal (%d), so the last two actions are really 0x15 (XPT_GET_TRAN_SETTINGS) and 0x16 (XPT_SET_TRAN_SETTINGS), not XPT_ASYNC (0x0f) and XPT_ABORT (0x10) as I first read them. Both of those do appear to be implemented. Still no closer to finding out why it doesn’t always work, though.
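
For anyone else reading the log, here is a rough sketch of how the misreading happens and how a 0x prefix in the debug line would avoid it. The values are the base function codes as I understand them from sys/cam/cam_ccb.h (XPT_FC_* flag bits left out), so check them against your own tree. The helper names xpt_code_name and format_action are made up for illustration and are not part of isboot.

```c
#include <stdio.h>
#include <string.h>

/*
 * Base CAM function codes, as I read them in sys/cam/cam_ccb.h.
 * Assumption: flag bits (XPT_FC_*) are not included here.
 */
struct xpt_name {
	unsigned int code;
	const char *name;
};

static const struct xpt_name xpt_names[] = {
	{ 0x04, "XPT_PATH_INQ" },
	{ 0x0f, "XPT_ASYNC" },			/* decimal 15 */
	{ 0x10, "XPT_ABORT" },			/* decimal 16 */
	{ 0x15, "XPT_GET_TRAN_SETTINGS" },	/* prints as "15" with %x */
	{ 0x16, "XPT_SET_TRAN_SETTINGS" },	/* prints as "16" with %x */
};

/* Hypothetical helper: map a base function code to its name. */
static const char *
xpt_code_name(unsigned int code)
{
	size_t i;

	for (i = 0; i < sizeof(xpt_names) / sizeof(xpt_names[0]); i++) {
		if (xpt_names[i].code == code)
			return (xpt_names[i].name);
	}
	return ("(not in this table)");
}

/*
 * Hypothetical helper: format the debug line with an explicit 0x prefix
 * so the radix cannot be mistaken when reading the console output.
 */
static int
format_action(char *buf, size_t len, unsigned int code)
{
	return (snprintf(buf, len, "isboot action 0x%02x (%s)",
	    code, xpt_code_name(code)));
}
```

Reading "15" as decimal leads to xpt_code_name(15), which is XPT_ASYNC, while the value actually logged was 0x15, which is XPT_GET_TRAN_SETTINGS. Formatting with format_action(buf, sizeof(buf), 0x15) produces a line that carries both the prefixed code and the name.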

> Trying to mount root from ufs:/dev/gpt/[REDACTED] [rw,noatime]...
> Root mount waiting for: CAM (Repeats indefinitely)
>