iSCSI disconnects dilemma

Fri Jan 12 11:31:06 PST 2007

> On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
> > Hi,
> > While I think I have almost solved the problem of network disconnects,
> > a major problem dawned on me:
> > When a 'local' disk crashes, the kernel will probably hang/panic/crash.
> > If I don't try to recover, then there is no change from the above scenario.
> > If I try to recover, then the client does not know that it should
> > umount/fsck/mount.
> > While all this seems familiar (removing a floppy/disk-on-key while it's
> > mounted, where we could always say "you shouldn't have done that!"), with
> > a network connection it can happen very often: rebooting the target, a
> > network hiccup, etc.
> >
> > So, any ideas?
> 
> In my opinion it should be done this way:
> 
> You have a queue of I/O requests. You send them to the other end and wait
> for confirmation. Until confirmation is received, you keep the requests
> queued. If the other end dies, you try to reconnect (until some timeout
> expires, the processes which sent those requests will just wait). If you
> reconnect successfully, you resend the unconfirmed requests; if you can't
> reconnect, you just pass the errors up.
> 
> This is what I did in ggate and it seems to work.

That is basically what I'm doing: unacked requests get requeued.
The problem I fear (and maybe I'm paranoid :-):

Assume the following scenario: the client (initiator) sends a write command
and the target acks it, then crashes. If the write was never completed,
the initiator goes on as if nothing ever happened.
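
To make the scenario concrete, here is a toy sketch in Python. It is not
the initiator code and not ggate; the Target and Initiator classes and all
of their method names are made up for illustration, and the target is
assumed to ack a write as soon as it sits in a volatile cache. It shows
that requeueing unacked requests covers a disconnect, but not a write
that was acked and then lost.

```python
from collections import OrderedDict


class Target:
    """Hypothetical target that acks a write once it is in volatile cache."""

    def __init__(self):
        self.cache = {}  # acked, but not yet on stable storage
        self.disk = {}   # contents that survive a crash

    def write(self, lba, data):
        self.cache[lba] = data
        return True  # assumption: ack goes out before the data is durable

    def flush(self):
        self.disk.update(self.cache)
        self.cache.clear()

    def crash(self):
        self.cache.clear()  # everything acked but unflushed is lost


class Initiator:
    """Keeps every request queued until the target confirms it."""

    def __init__(self, target):
        self.target = target
        self.link_up = True
        self.next_tag = 0
        self.pending = OrderedDict()  # tag -> (lba, data)

    def submit(self, lba, data):
        tag = self.next_tag
        self.next_tag += 1
        self.pending[tag] = (lba, data)
        return tag

    def send_pending(self):
        """Send (or resend after a reconnect) everything still unacked."""
        if not self.link_up:
            return  # requests simply stay queued while disconnected
        for tag, (lba, data) in list(self.pending.items()):
            if self.target.write(lba, data):
                del self.pending[tag]  # acked: the initiator forgets it


if __name__ == "__main__":
    target = Target()
    initiator = Initiator(target)

    # Disconnect: the request stays queued and is resent on reconnect.
    initiator.link_up = False
    initiator.submit(0, b"block 0")
    initiator.send_pending()
    print("pending while disconnected:", len(initiator.pending))

    initiator.link_up = True
    initiator.send_pending()
    print("pending after reconnect:", len(initiator.pending))

    # The problem case: acked, then the target crashes before flushing.
    target.crash()
    print("block 0 on disk after crash:", 0 in target.disk)
```

Once the ack arrives, the initiator has nothing left to resend, so in this
model the lost write goes unnoticed, which is exactly the case I'm worried
about.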

danny