Re: iSCSI target: Handling in-flight requests during ctld shutdown
Date: Thu, 30 Dec 2021 19:29:20 UTC
On 12/29/21 1:57 PM, Alexander Motin wrote:
> On 29.12.2021 16:39, John Baldwin wrote:
>> One of the tests Chelsio QA has been running against our iSCSI stack
>> with cxgbei offload enabled is to run a bunch of iozone's on an
>> initiator while running a script on the target that keeps stopping
>> ctld (for a minute or so), then starting it again and letting it run
>> for about 5 minutes until stopping it again.
>>
>> One of the errors found last night is that the target reported the
>> following error to the initiator:
>>
>> (da7:iscsi10:0:0:0): CAM status: SCSI Status Error
>> (da7:iscsi10:0:0:0): SCSI status: Check Condition
>> (da7:iscsi10:0:0:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
>> (da7:iscsi10:0:0:0): Actual Retry Count: 44
>> (da7:iscsi10:0:0:0): Error 5, Unretryable error
>> g_vfs_done():da7[WRITE(offset=9797632, length=32768)]error = 6
>> UFS: forcibly unmounting /dev/da7 from /ISCSI8
>
>> So my question I think is what is the expected behavior? Is the
>> internal error really expected to make it on the wire to be sent to
>> the other side? Since the connection is shutting down should we just
>> discard the reply altogether rather than reporting an internal error?
>> If we discarded the reply then the initiator in this particular test
>> would have retried the original request once ctld was restarted and
>> continued running without an error.
>
> The HARDWARE ERROR is obviously not expected by the initiator. It
> should better not be leaked after we decided to kill the connection.
> Initiator may retry it and still work happily after reconnect, but
> cleaner would be to not rely on that. cfiscsi_session_terminate_tasks()
> aborts all running commands by CTL_TASK_I_T_NEXUS_RESET, that make them
> not return statuses to initiator, but I suppose this is the other side
> of the race now.

Hmm, I wonder if we should be setting CTL_FLAG_ABORT instead of setting
the port_status when aborting an I/O?
The comment in ctl_frontend_iscsi.c claims the backends check the
port_status, but I don't see any checks for port_status at all in the
backends. I do see checks for CTL_FLAG_ABORT, and the handler for
CTL_TASK_I_T_NEXUS_RESET does set CTL_FLAG_ABORT on pending requests.
The tasks in cfiscsi_session_terminate_tasks() should already have
CTL_FLAG_ABORT set anyway, but it wouldn't hurt if
cfiscsi_data_wait_abort() set it again. For the
cfiscsi_task_management_done case I'm less certain, but I suspect that
there, too, returning an internal error status to the initiator is not
expected, and that it would be better to just set CTL_FLAG_ABORT and
drop any response?

-- 
John Baldwin