From nobody Fri Dec 31 18:41:23 2021 X-Original-To: scsi@mlmmj.nyi.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mlmmj.nyi.freebsd.org (Postfix) with ESMTP id ABB1119296C2 for ; Fri, 31 Dec 2021 18:41:25 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from smtp.freebsd.org (smtp.freebsd.org [96.47.72.83]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256 client-signature RSA-PSS (4096 bits) client-digest SHA256) (Client CN "smtp.freebsd.org", Issuer "R3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 4JQYrP3mhLz3n6D; Fri, 31 Dec 2021 18:41:25 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Received: from [IPV6:2601:648:8601:8b20:dc79:bf12:122a:616b] (unknown [IPv6:2601:648:8601:8b20:dc79:bf12:122a:616b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) (Authenticated sender: jhb) by smtp.freebsd.org (Postfix) with ESMTPSA id DA057DECB; Fri, 31 Dec 2021 18:41:24 +0000 (UTC) (envelope-from jhb@FreeBSD.org) Message-ID: <10687910-7500-008c-aed5-61f76ae90b3d@FreeBSD.org> Date: Fri, 31 Dec 2021 10:41:23 -0800 List-Id: SCSI subsystem List-Archive: https://lists.freebsd.org/archives/freebsd-scsi List-Help: List-Post: List-Subscribe: List-Unsubscribe: Sender: owner-freebsd-scsi@freebsd.org X-BeenThere: freebsd-scsi@freebsd.org MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:91.0) Gecko/20100101 Thunderbird/91.4.1 Content-Language: en-US To: Alexander Motin Cc: =?UTF-8?Q?Edward_Tomasz_Napiera=c5=82a?= , scsi@FreeBSD.org, Ken Merry References: <42e175d9-1693-29e2-0b5b-3fa513aa2a2d@FreeBSD.org> <7f1a1a65-199d-858a-792c-42871d1df13e@FreeBSD.org> <68ad0ffd-79fb-973d-e5ba-92deee352d44@FreeBSD.org> <2ce80c64-6954-21d0-74eb-eeb88e289350@FreeBSD.org> From: John Baldwin Subject: Re: iSCSI target: Handling in-flight requests during ctld shutdown In-Reply-To: <2ce80c64-6954-21d0-74eb-eeb88e289350@FreeBSD.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=freebsd.org; s=dkim; t=1640976085; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=L/hsbe01vpFxBUdsc8bfoB7FMJ8LSdbRecV3CdPC/Tk=; b=QY5zhba721JaBepqIFjTPt1DIhJQiX6OxG7c5uL4jLo5yFNYCAQdL31eK582epZyjkn7wS 3hcqT9Bc5lSOSmbid+1Ejsejx2QjuPTgnwty1UQTBbgNSEl3B2uDltAmtA2E+lMTK7p03I XYxdgabuavVUjOVrpLlS1VHXTk1oW7W3IQTTFqlkx+0T8A67U4fNbAmX89PujGUo9AqGMR kUEopuWP3FLadfjDmCegmZW8CT8yDDJOR7iKxX9RQrXO8Xf8fZjU6usNnlU0OQvGzqt9tF zibXhxVX+zYxvRS1NVnDS4vIuzjw3AF/LmbV0RPDAlNmOQ4kJj/acN4q3ceaiw== ARC-Seal: i=1; s=dkim; d=freebsd.org; t=1640976085; a=rsa-sha256; cv=none; b=AdQnH4ixjnZGeXUtkvQTKsgCNLAuoy0Q2YeyO2lYXFZrNyp2A0RhvAYvRCDord5VCMTdgW tAITUsd/WBZy34nLf5Y9kgvz8hbuRN/QkqcJcm6r5sRyP6ha4SSLvYQgJ2RJKw1v5MTHYm 9IxIhrtRzidmGDj0zUSI1pOUGW5if3qfCBFX2cmQ395a+7IHXz5gt8VEh7+l6zkTPW/0sC BwsHmevrgEXy35lwECqlHOyLO+qIMulVGTRENMjsEX34p5QuKmIYLJt5UjuqeOlhs1P97+ CjEdQneJyLcQ73eXmEOm5CKaY42rcDtqxOOzgZWNQLvUXTcKq00lxTCaw5LCfA== ARC-Authentication-Results: i=1; mx1.freebsd.org; none X-ThisMailContainsUnwantedMimeParts: N On 12/30/21 3:06 PM, Alexander Motin wrote: > On 30.12.2021 17:41, John Baldwin wrote: >> On 12/30/21 12:10 PM, Alexander Motin wrote: >>> The checks were in all backends, until in 2c7dc6bae9fd I've moved them >>> to ctl_datamove_done_process().  Backends don't have any special >>> knowledge now to process specific data move error codes.  It was >>> complete code duplication between backends and I've tried to organize >>> that a bit.  Unfortunately we do not have formalized data move error >>> statuses in CTL to do more.  It would be much more interesting (and >>> complicated ;)) if iSCSI (for example) could report data transfer error >>> without killing connection, but that area in general is not very >>> specified in SCSI, since usually each transport has own means of >>> recovery. >> >> Hmmm, but this commit seems to have actually broken this case.  That is, >> now the possibly uninitialized data is written to the backing store: >> >>         /* >> -        * We set status at this point for read commands, and write >> -        * commands with errors. >> +        * We set status at this point for read and compare commands. >>          */ >> -       if (io->io_hdr.flags & CTL_FLAG_ABORT) { >> -               ; >> -       } else if ((io->io_hdr.port_status != 0) && >> -           ((io->io_hdr.status & CTL_STATUS_MASK) == CTL_STATUS_NONE || >> -            (io->io_hdr.status & CTL_STATUS_MASK) == CTL_SUCCESS)) { >> -               ctl_set_internal_failure(&io->scsiio, /*sks_valid*/ 1, >> -                   /*retry_count*/ io->io_hdr.port_status); >> -       } else if (io->scsiio.kern_data_resid != 0 && >> -           (io->io_hdr.flags & CTL_FLAG_DATA_MASK) == CTL_FLAG_DATA_OUT && >> -           ((io->io_hdr.status & CTL_STATUS_MASK) == CTL_STATUS_NONE || >> -            (io->io_hdr.status & CTL_STATUS_MASK) == CTL_SUCCESS)) { >> -               ctl_set_invalid_field_ciu(&io->scsiio); >> -       } else if ((io->io_hdr.port_status == 0) && >> -           ((io->io_hdr.status & CTL_STATUS_MASK) == CTL_STATUS_NONE)) { >> +       if ((io->io_hdr.flags & CTL_FLAG_ABORT) == 0 && >> +           (io->io_hdr.status & CTL_STATUS_MASK) == CTL_STATUS_NONE) { >>                 lbalen = ARGS(beio->io); >>                 if (lbalen->flags & CTL_LLF_READ) { >>                         ctl_set_success(&io->scsiio); >> >> As ctl_datamove_done_process() will call ctl_set_internal_failure(), but >> ctl_datamove_done() will still call be_move_done which runs the code above. >> The code above doesn't check port_status anywhere, so it will still happily >> write the stale data to the backend unless CTL_FLAG_ABORT is set.  Do the >> backends need to be checking port_status == 0 again in addition to >> CTL_FLAG_ABORT then? > > No. Backends check for ((io->io_hdr.status & CTL_STATUS_MASK) == > CTL_STATUS_NONE) . Notice that you've received that HARDWARE ERROR > status, not garbage. Ah, via the ctl_set_sense(). >>>>    I do see checks for CTL_FLAG_ABORT, and the handler >>>> for >>>> the CTL_TASK_I_T_NEXUS_RESET does set CTL_FLAG_ABORT on pending >>>> requests. >>>> >>>> For the tasks in sciscsi_session_terminate_tasks(), those should already >>>> have >>>> CTL_FLAG_ABORT set anyway, but it wouldn't hurt if it were set again by >>>> cfiscsi_data_wait_abort().  For the the cfiscsi_task_management_done >>>> case I'm >>>> less certain, but I suspect there too that returning an internal error >>>> status >>>> back to the initiator is not expected and that it would be better to >>>> just set >>>> CTL_FLAG_ABORT and drop any response? >>> >>> cfiscsi_data_wait_abort() does only one specific thing -- it aborts >>> waiting for data and completes the data move operation, allowing backend >>> to continue (or at least free resources, etc).  It is only >>> cfiscsi_session_terminate_tasks() knows that it should abort data moves >>> while it aborts tasks, since it knows that the connection is dead. >>> cfiscsi_task_management_done() is another special case, since iSCSI >>> explicitly mentions that there will be no further exchanges related to >>> that task, so the data move should explicitly return.  But in general >>> case I think it would be nice to not combine those two facts of aborting >>> task and aborting data transfer, since the last may not necessary imply >>> the first, as the first does not imply the last. >>> >>> It is tempting to just set the flag, but I suspect it may cause more >>> problems (I am worrying about HA peer update).  I'd prefer the race >>> inside the transport to be handled inside the transport.  May be we >>> could just add another flag to be set under the session lock inside >>> cfiscsi_session_terminate_tasks() after it aborted all the tasks, but >>> before it start to abort transfers itself and use it inside >>> cfiscsi_datamove_out() instead of cs_terminating. >> >> I think though that even that flag wouldn't quite help as you would still >> have the problem that the internal_error response would still be sent on >> the wire if we don't abort the entire task vs aborting just the move. > > No. cfiscsi_datamove_out() called before the new flag is set would > still try to send R2T over the dying connection to be aborted by the > cfiscsi_session_terminate_tasks() few milliseconds later. > cfiscsi_data_wait_abort() would only be needed if > cfiscsi_session_terminate_tasks() has already passed through the data > waiters list and waiting for the last tasks completion. So I think what I was missing is that I had assumed in the race case that the task was not visible when the NEXUS_I_T_RESET ran, but I think from re-reading now that the task has to have been in the lun's OOA queue as we don't permit queueing more tasks due to LUN_RESERVED being cleared, so I think that means that even in the race case the task has been aborted. Perhaps then the code in cfiscsi_datamove_out can just check CTL_FLAG_ABORT instead of cs_terminating? That would function similar to your proposed new flag I think assuming it is correct that the task for any I/O being passed to cfiscsi_datamove_out concurrent with cfiscsi_session_terminate_tasks must have been "visible" on the OAA queue and thus aborted by the handler? -- John Baldwin