Panic when removing a SCSI device entry
Kostik Belousov
kostikbel at gmail.com
Sun May 8 10:02:48 UTC 2011
On Sun, May 08, 2011 at 10:53:14AM +0200, Joerg Wunsch wrote:
> I've got a setup where a tape library is attached with a
> computer-controllable power switch, so it is only turned on during the
> time when backups (or restores) are done. This is mainly to reduce
> the noise level, but also to reduce the overall power consumption
> while the library is not needed.
>
> Every now and then, the kernel panics with a page fault during the
> (unattended, as it happens at night) power cycling and surrounding
> actions. The current process when the page fault happens is always
> mt(1), which is used inside the powerup/down script to ensure the
> drive is being properly rewound. The page fault happens in
> destroy_devl(), at this location:
>
>     /* If we are a child, remove us from the parents list */
>     if (dev->si_flags & SI_CHILD) {
> here --->>> LIST_REMOVE(dev, si_siblings);
>         dev->si_flags &= ~SI_CHILD;
>     }
>
> The preprocessed code of that looks like:
>
>     if (dev->si_flags & 0x0010) {
>         if ((((dev))->si_siblings.le_next) != ((void *)0))
>             (((dev))->si_siblings.le_next)->si_siblings.le_prev =
>                 (dev)->si_siblings.le_prev;
>         *(dev)->si_siblings.le_prev = (((dev))->si_siblings.le_next);
>         dev->si_flags &= ~0x0010;
>     }
>
> and it's the indirection of *(dev)->si_siblings.le_prev that hits a
> NULL pointer. Obviously, LIST_REMOVE doesn't anticipate that
Is it a NULL pointer dereference? See below.
> dev->si_siblings.le_prev might be a NULL pointer, so this is a usage
> error, somehow. Could it be that destroy_devl() is called twice for
> the same device?
>
> This used to happen on an earlier system (some version of 7.x-stable),
> and I eventually managed to tweak the library's powerup/down scripts
> so as to avoid the critical sequence of actions triggering this
> situation. Now that I have finally upgraded the machine to 8.2-STABLE,
> it is triggered very frequently again, though.
>
> Any ideas how to fix it, or at least apply a workaround, other than
> turning
>
> *(elm)->field.le_prev = LIST_NEXT((elm), field); \
>
> in the LIST_REMOVE macro into
>
> if ((elm)->field.le_prev != NULL) \
> *(elm)->field.le_prev = LIST_NEXT((elm), field); \
>
> which affects the entire system, not just the SCSI subsystem part?
Please provide the full printout from the panic. Also, it would
be useful to get the dump and do "p *dev" from the frame of
destroy_devl(). I might need further information after the requested
data is provided.
One thing you may try in the meantime is the following patch.
diff --git a/sys/kern/kern_conf.c b/sys/kern/kern_conf.c
index b2be5cc..59b876c 100644
--- a/sys/kern/kern_conf.c
+++ b/sys/kern/kern_conf.c
@@ -981,6 +981,8 @@ destroy_devl(struct cdev *dev)
 	/* Remove name marking */
 	dev->si_flags &= ~SI_NAMED;
 
+	dev->si_refcount++;	/* Avoid race with dev_rel() */
+
 	/* If we are a child, remove us from the parents list */
 	if (dev->si_flags & SI_CHILD) {
 		LIST_REMOVE(dev, si_siblings);
@@ -997,7 +999,6 @@ destroy_devl(struct cdev *dev)
 		dev->si_flags &= ~SI_CLONELIST;
 	}
 
-	dev->si_refcount++;	/* Avoid race with dev_rel() */
 	csw = dev->si_devsw;
 	dev->si_devsw = NULL;	/* already NULL for SI_ALIAS */
 	while (csw != NULL && csw->d_purge != NULL && dev->si_threadcount) {
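
As for the guard proposed in the quoted message: a check of that kind does
not have to go into the LIST_REMOVE macro itself, it can stay local to the
caller. Below is a minimal userland sketch of the idea, not kernel code.
"struct node", node_unlink_once(), node_count() and demo_double_unlink() are
hypothetical names made up for the illustration; only the <sys/queue.h>
macros are real. It assumes the linkage is zeroed while the element is off
the list. Note that such a guard may only hide a double removal, it does not
explain why one happens.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct cdev; only the list linkage matters. */
struct node {
	int id;
	LIST_ENTRY(node) siblings;
};

LIST_HEAD(node_list, node);

/*
 * Unlink n at most once.  After removal the linkage is cleared, so a
 * second call sees le_prev == NULL and returns 0 instead of writing
 * through it.  Assumes the linkage is zeroed whenever n is off the list.
 */
static int
node_unlink_once(struct node *n)
{
	if (n->siblings.le_prev == NULL)
		return (0);
	LIST_REMOVE(n, siblings);
	n->siblings.le_next = NULL;
	n->siblings.le_prev = NULL;
	return (1);
}

/* Count the elements currently on the list. */
static int
node_count(struct node_list *head)
{
	struct node *n;
	int count = 0;

	LIST_FOREACH(n, head, siblings)
		count++;
	return (count);
}

/*
 * Put two nodes on a list and unlink one of them twice.  Returns the
 * number of nodes left on the list, or -1 if the two unlink calls did
 * not behave as "removed once, then no-op".
 */
static int
demo_double_unlink(void)
{
	struct node_list head;
	struct node a = { .id = 1 }, b = { .id = 2 };
	int first, second;

	LIST_INIT(&head);
	LIST_INSERT_HEAD(&head, &a, siblings);
	LIST_INSERT_HEAD(&head, &b, siblings);

	first = node_unlink_once(&a);	/* really unlinks */
	second = node_unlink_once(&a);	/* le_prev is NULL now: no-op */

	if (first != 1 || second != 0)
		return (-1);
	return (node_count(&head));
}
```

Calling demo_double_unlink() from a small main() leaves one node on the
list. With a plain LIST_REMOVE in place of node_unlink_once(), the second
call would write through a stale or NULL le_prev.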