svn commit: r240822 - head/sys/geom
Pawel Jakub Dawidek
pjd at FreeBSD.org
Wed Sep 26 07:19:44 UTC 2012
On Tue, Sep 25, 2012 at 05:37:12PM -0600, Kenneth D. Merry wrote:
> On Sat, Sep 22, 2012 at 12:41:49 +0000, Pawel Jakub Dawidek wrote:
> > Log:
> > Use the topology lock to protect list of providers while withering them.
> > It is possible that provider is destroyed while we are iterating over the
> > list.
[...]
> > --- head/sys/geom/geom_disk.c Sat Sep 22 12:40:52 2012 (r240821)
> > +++ head/sys/geom/geom_disk.c Sat Sep 22 12:41:49 2012 (r240822)
> > @@ -635,10 +635,13 @@ disk_gone(struct disk *dp)
> > struct g_geom *gp;
> > struct g_provider *pp;
> >
> > + g_topology_lock();
> > gp = dp->d_geom;
> > - if (gp != NULL)
> > + if (gp != NULL) {
> > LIST_FOREACH(pp, &gp->provider, provider)
> > g_wither_provider(pp, ENXIO);
> > + }
> > + g_topology_unlock();
> > }
[...]
> This breaks devices going away in CAM.
>
> When the da(4) driver calls disk_gone(), it is necessarily holding the SIM
> lock, which is a regular MTX_DEF mutex. The GEOM topology lock is an sx
> lock, and of WITNESS blows up because of that:
[...]
> disk_gone() needs to be callable from an interrupt context. So it cannot
> acquire the topology lock.
I can reword what you said above a bit: disk_gone() modifies GEOM
topology, so it has to hold the topology lock, thus it cannot be called
from an interrupt context.
We might be able to change the topology lock to LIST_FOREACH_SAFE(), as
g_wither_provider() can only destroy current provider. This is because
CAM own the geom and nobody should be able to mess with its provider's
list, apart from CAM itself. So I'd need to know how CAM ensures that
two disk_gone() cannot be called for one geom. The answer might be that
those geoms have always only one provider, but then I cannot explain why
we have loop there. Maybe jdp@ will remember why.
I also read original commit message and I don't fully understand.
It was 7 years ago so I'll quote entire commit message:
Fix a bug that caused some /dev entries to continue to exist after
the underlying drive had been hot-unplugged from the system. Here
is a specific example. Filesystem code had opened /dev/da1s1e.
Subsequently, the drive was hot-unplugged. This (correctly) caused
all of the associated /dev/da1* entries to be deleted. When the
filesystem later realized that the drive was gone it closed the
device, reducing the write-access counts to 0 on the geom providers
for da1s1e, da1s1, and da1. This caused geom to re-taste the
providers, resulting in the devices being created again. When the
drive was hot-plugged back in, it resulted in duplicate /dev entries
for da1s1e, da1s1, and da1.
This fix adds a new disk_gone() function which is called by CAM when a
drive goes away. It orphans all of the providers associated with the
drive, setting an error condition of ENXIO in each one. In addition,
we prevent a re-taste on last close for writing if an error condition
has been set in the provider.
If disk is gone it should orphan its provider. This should cause
orphaning depended providers, including da1s1 and then da1s1e.
Orphaning sets provider's error and providers with the error set are not
retasted. This is from g_access():
else if (pp->acw != 0 && pp->acw == -dcw && pp->error == 0 &&
!(pp->geom->flags & G_GEOM_WITHER))
g_post_event(g_new_provider_event, pp, M_WAITOK,
pp, NULL);
My question is: is disk_geom() really needed? Maybe there is some small
bug somewhere, but I think GEOM should handle all this without disk_gone().
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 196 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/svn-src-head/attachments/20120926/8308f429/attachment-0001.pgp
More information about the svn-src-head
mailing list