kern/88823: [modules] [atapicam] atapicam - kernel trap 12 on loading and unloading

Jaakko Heinonen jh at saunalahti.fi
Wed Dec 3 07:51:53 PST 2008


Hi,

There is a CAM(4)/pass(4) bug which causes passcleanup() (in
sys/cam/scsi/scsi_pass.c) to call destroy_dev(9) with the device mutex
held. It's not allowed to call destroy_dev() with sleepable locks held.

Here's the call trace:

destroy_dev(c7b28400,0,c569f754,c7b15080,f46c6a38,...) at destroy_dev+0x10
passcleanup(c7b15080,c0b8f83b,c0bdf975,c585d058,c0e5afe0,...) at passcleanup+0x2e
camperiphfree(c7b15080,0,f46c6a58,c0477b7d,c7b15080,...) at camperiphfree+0xc2
cam_periph_invalidate(c7b15080,c59328d0,f46c6a8c,c0492b4a,c7b15080,...) at cam_periph_invalidate+0x3e
cam_periph_async(c7b15080,100,f46c6b18,0,0,...) at cam_periph_async+0x2d
passasync(c7b15080,100,f46c6b18,0,c7ae0000,...) at passasync+0xca
xpt_async_bcast(0,4,c0b6dbbf,11a5,c7b428c0,...) at xpt_async_bcast+0x32
xpt_async(100,f46c6b18,0,10,c575ccb8,...) at xpt_async+0x194
xpt_bus_deregister(0,0,c7b75b30,378,c577fc00,...) at xpt_bus_deregister+0x4e
free_softc(c577fe64,0,c7b75b30,103,c7b18100,...) at free_softc+0xe1
atapi_cam_detach(c7b18100,c7b30858,c0caa340,9a4,1,...) at atapi_cam_detach+0x7f
device_detach(c7b18100,c081bf09,c7691840,1,c7b760f8,...) at device_detach+0x8c
devclass_delete_driver(c554b6c0,c7b7610c,c0bd0dfd,2d,0,...) at devclass_delete_driver+0x91
driver_module_handler(c7692040,1,c7b760f8,ef,c7692040,...) at driver_module_handler+0xdf
module_unload(c7692040,0,253,250,f46c6c40,...) at module_unload+0x75
linker_file_unload(c7ab9d00,0,c0bcf326,400,c7b73000,...) at linker_file_unload+0xc9
kern_kldunload(c5957460,6,0,f46c6d2c,c0b11ff3,...) at kern_kldunload+0xd5
kldunloadf(c5957460,f46c6cf8,8,c0bd96d0,c0cad660,...) at kldunloadf+0x2b

Calling xpt_bus_deregister() in atapicam results this code path.
xpt_bus_deregister() must be called with the device mutex held.
Following change fixes the atapicam problem; however the patch may be
incorrect because I am not sure if passcleanup() is always called with
the lock held. I have tried the patch with atapicam(4) and umass(4)
(both use pass(4)).

%%%
Index: sys/cam/scsi/scsi_pass.c
===================================================================
--- sys/cam/scsi/scsi_pass.c	(revision 185331)
+++ sys/cam/scsi/scsi_pass.c	(working copy)
@@ -167,7 +167,9 @@ passcleanup(struct cam_periph *periph)
 
 	devstat_remove_entry(softc->device_stats);
 
+	mtx_unlock(periph->sim->mtx);
 	destroy_dev(softc->dev);
+	mtx_lock(periph->sim->mtx);
 
 	if (bootverbose) {
 		xpt_print(periph->path, "removing device entry\n");
%%%

There are also other bugs involved in unloading the atapicam module.

* If there are pending hcbs kernel will panic on unload. There's an
  obvious bug in free_softc(): it uses TAILQ_FOREACH() instead of
  TAILQ_FOREACH_SAFE(). However fixing that is not enough. There are
  additional problem(s) and I don't have a fix for them.  Here's a patch
  that changes it to refuse to detach if there are pending hcbs:

%%%
Index: sys/dev/ata/atapi-cam.c
===================================================================
--- sys/dev/ata/atapi-cam.c	(revision 185519)
+++ sys/dev/ata/atapi-cam.c	(working copy)
@@ -254,6 +254,13 @@ atapi_cam_detach(device_t dev)
     struct atapi_xpt_softc *scp = device_get_softc(dev);
 
     mtx_lock(&scp->state_lock);
+    /*
+     * XXX: Detaching when pending hcbs exist is broken.
+     */
+    if (!TAILQ_EMPTY(&scp->pending_hcbs)) {
+	mtx_unlock(&scp->state_lock);
+	return (EBUSY);
+    }
     xpt_freeze_simq(scp->sim, 1 /*count*/);
     scp->flags |= DETACHING;
     mtx_unlock(&scp->state_lock);
@@ -882,11 +889,11 @@ free_hcb(struct atapi_hcb *hcb)
 static void
 free_softc(struct atapi_xpt_softc *scp)
 {
-    struct atapi_hcb *hcb;
+    struct atapi_hcb *hcb, *tmp_hcb;
 
     if (scp != NULL) {
 	mtx_lock(&scp->state_lock);
-	TAILQ_FOREACH(hcb, &scp->pending_hcbs, chain) {
+	TAILQ_FOREACH_SAFE(hcb, &scp->pending_hcbs, chain, tmp_hcb) {
 	    free_hcb_and_ccb_done(hcb, CAM_UNREC_HBA_ERROR);
 	}
 	if (scp->path != NULL) {
%%%

* cd(4) doesn't tolerate well disappearing devices. There's code in
  cdinvalidate() to invalidate further I/O operations but calling for
  example d_close causes a crash. Thus you can't unmount a file system
  after the device has disappeared. This patch makes it to survive
  unmounting.

%%%
Index: sys/cam/scsi/scsi_cd.c
===================================================================
--- sys/cam/scsi/scsi_cd.c	(revision 185331)
+++ sys/cam/scsi/scsi_cd.c	(working copy)
@@ -382,6 +382,9 @@ cdoninvalidate(struct cam_periph *periph
 		camq_remove(&softc->changer->devq, softc->pinfo.index);
 
 	disk_gone(softc->disk);
+	softc->disk->d_drv1 = NULL;
+	softc->disk->d_close = NULL; /* allow closing the disk */
+
 	xpt_print(periph->path, "lost device\n");
 }
%%%

-- 
Jaakko


More information about the freebsd-scsi mailing list