CAM problem
Alexander Motin
mav at FreeBSD.org
Tue Oct 20 08:36:24 UTC 2009
Andrew Thompson wrote:
> I have a cam problem that is noticeable with usb devices. It relates to
> the ordering of xpt_release_device() and the CAM_DEV_UNCONFIGURED flag
> when yanking a device that has stalled. This then causes a problem with
> the usb explore thread which will end up waiting on simfree forever,
> blocking any further usb attach/detach on the controller.
>
> Hopefully my printfs can show the problem. I have replaced the pointers
> returned from xpt_alloc_device() with pretty names, <dev3> is the one in
> question.
>
> <...unplug...>
>
> ugen1.3: <KINGSTON> at usbus1 (disconnected)
> umass0: at uhub2, port 1, addr 3 (disconnected)
> umass_detach:
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> usb_cam_action, device GONE
> xpt_find_bus: ref=6 -> 7
> usb_cam_action, device GONE
> usb_cam_action, device GONE
As far as I can see, you are returning the CAM_TID_INVALID error here. Unlike
CAM_SEL_TIMEOUT, this error gets no special handling. If you return
CAM_SEL_TIMEOUT there instead, the device will be killed immediately, which
will probably work around this specific problem.
> xpt_release_device dev3 failed, ref=3 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=7 -> 6
> (da0:umass-sim0:0:0:0): got CAM status 0x39
> (da0:umass-sim0:0:0:0): fatal error, failed to attach to device
> (da0:umass-sim0:0:0:0): lost device
> (da0:umass-sim0:0:0:0): removing device entry
>
> ^^^ USB disk had stalled on attach
This drops a reference because the periph driver has detached itself, but
XPT still treats the device as valid.
> xpt_release_device dev3 failed, ref=1 unconf=0
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=6 -> 5
> xpt_release_device dev3 failed, ref=0 unconf=0
>
> ^^^ last reference to dev3 dropped
From the deallocation point of view, configured status is handled the same as
one more reference...
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=5 -> 4
> xpt_release_device dev2 OK
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> umass_cam_detach_sim: calling xpt_bus_deregister
> xpt_find_bus: ref=2 -> 3
> xpt_alloc_target: ref=3 -> 4
> xpt_alloc_device: device = dev4
> scsi_dev_async: set dev dev3 unconfigured
>
> ^^^ dev3 gets the CAM_DEV_UNCONFIGURED flag set here
... but removing configured status does not trigger deallocation the way
dropping a reference does.
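
To make the ordering problem concrete, here is a minimal user-space model of
it. This is only a sketch: struct model_dev and the model_* functions are
hypothetical stand-ins, not the real XPT code, and the free condition is
assumed from the "ref=0 unconf=0" line in the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPT device, not the real kernel structure. */
struct model_dev {
    int  refcount;
    bool unconfigured;   /* models CAM_DEV_UNCONFIGURED */
    bool freed;
};

/* Assumed rule: free only when the count hits zero AND the flag is set. */
void model_release(struct model_dev *d)
{
    assert(d->refcount > 0);
    d->refcount--;
    if (d->refcount == 0 && d->unconfigured)
        d->freed = true;
}

/* Setting the flag only flips it; nothing rechecks the reference count. */
void model_set_unconfigured(struct model_dev *d)
{
    d->unconfigured = true;
}

/* Replays the order from the log: last reference first, flag afterwards. */
bool model_replay_log_order(void)
{
    struct model_dev d = { .refcount = 1, .unconfigured = false, .freed = false };

    model_release(&d);           /* ref=0 unconf=0: nothing is freed */
    model_set_unconfigured(&d);  /* flag set, but no second chance to free */
    return d.freed;
}
```

In this model model_replay_log_order() returns false: the device is never
freed, which matches the bus reference that stays held in the log.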
> xpt_bus_deregister: xpt_release_bus
> xpt_release_bus: ref=4 -> 3
> xpt_release_device dev4 OK
> xpt_release_target: xpt_release_bus
> xpt_release_bus: ref=3 -> 2
> xpt_release_path: xpt_release_bus
> xpt_release_bus: ref=2 -> 1
> umass_cam_detach_sim:
> umass-sim0: waiting... ref = 1
>
> ^^^ wait on "simfree" forever.
I think the correct solution is to take an additional reference before
clearing CAM_DEV_UNCONFIGURED and to drop it again after setting
CAM_DEV_UNCONFIGURED back. The check for CAM_DEV_UNCONFIGURED inside
xpt_release_device() could then be removed or turned into an assertion.
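
A rough user-space sketch of that idea, under the assumption that configured
status is represented by one extra reference. struct fixed_dev and the fixed_*
functions are hypothetical names for illustration only; this is not a patch
against the real xpt code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XPT device, not the real kernel structure. */
struct fixed_dev {
    int  refcount;
    bool unconfigured;   /* models CAM_DEV_UNCONFIGURED */
    bool freed;
};

/* The count alone decides; the flag check survives only as an assertion. */
void fixed_release(struct fixed_dev *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0) {
        assert(d->unconfigured);
        d->freed = true;
    }
}

/* Take the extra reference first, then clear the flag. */
void fixed_set_configured(struct fixed_dev *d)
{
    d->refcount++;
    d->unconfigured = false;
}

/* Set the flag back first, then drop the extra reference. */
void fixed_set_unconfigured(struct fixed_dev *d)
{
    d->unconfigured = true;
    fixed_release(d);
}

/* Same event order as in the log, now ending with the device freed. */
bool fixed_log_order(void)
{
    struct fixed_dev d = { .refcount = 1, .unconfigured = true, .freed = false };
    bool freed_early;

    fixed_set_configured(&d);    /* refcount 1 -> 2 */
    fixed_release(&d);           /* other holder lets go: 2 -> 1, still alive */
    freed_early = d.freed;
    fixed_set_unconfigured(&d);  /* 1 -> 0, device freed here */
    return !freed_early && d.freed;
}
```

With the extra reference held while the device is configured, the order of
the last ordinary release and the unconfigure event no longer matters in this
model: whichever comes last drops the count to zero and frees the device.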
--
Alexander Motin