GELI + Zpool Scrub Results in GELI Device Destruction (and Later a Corrupt Pool)

Michael B. Eichorn ike at michaeleichorn.com
Mon Apr 25 22:49:48 UTC 2016


On Mon, 2016-04-25 at 16:06 +0200, Fabian Keil wrote:
> "Michael B. Eichorn" <ike at michaeleichorn.com> wrote:
> 
> > 
> > On Mon, 2016-04-25 at 10:11 +0200, Fabian Keil wrote:
> > > 
> > > "Michael B. Eichorn" <ike at michaeleichorn.com> wrote:
> > >   
> > > > 
> > > > I just ran into something rather unexpected. I have a pool
> > > > consisting of a mirrored pair of geli encrypted partitions on
> > > > WD Red 3TB disks.
> > > > 
> > > > The machine is running 10.3-RELEASE, the root zpool was set up
> > > > with GELI encryption from the installer, and the pool that is
> > > > acting up was set up per the handbook.
> > > [...]  
> > > > 
> > > > I had just noticed that I had failed to enable the zpool scrub
> > > > periodic on this machine. So I began to run zpool scrub by hand.
> > > > It succeeded for the root pool, which is also geli encrypted,
> > > > but when I ran it against my primary data pool I encountered:
> > > > 
> > > > Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada3p1.eli destroyed.
> > > > Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada3p1.eli on last close.
> > > > Apr 24 23:18:23 terra kernel: GEOM_ELI: Device ada2p1.eli destroyed.
> > > > Apr 24 23:18:23 terra kernel: GEOM_ELI: Detached ada2p1.eli on last close.
> > > Did you attach the devices using geli's -d (auto-detach) flag?
> > I am using whatever defaults come from the rc.d scripts.
> > My rc.conf was:
> > 
> > geli_devices="ada2p1 ada3p1"
> > geli_default_flags="-k /root/encryption.key"
> > zfs_enable="YES"
> > 
> > I will try adding geli_autodetach="NO" and scrubbing again in about
> > 9 hours.
> On FreeBSD the default (set in /etc/defaults/rc.conf) is YES.

Ah, I forgot about that file.

For the record, geli_autodetach="NO" did work properly.
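
For anyone who hits the same thing, the relevant part of my rc.conf now
looks roughly like this (device names and key path are specific to my
machine):

geli_devices="ada2p1 ada3p1"
geli_default_flags="-k /root/encryption.key"
geli_autodetach="NO"
zfs_enable="YES"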

Interestingly, when I rebooted to test that rc.conf modification, the
pool resilvered a second time as it came up and now reports no errors
at all.
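
In case it helps anyone reproduce this, the re-test sequence was roughly
the following (the pool name here is just a placeholder; geli list is
only there to confirm the provider configuration before scrubbing):

# geli list ada2p1.eli
# zpool scrub datapool
# zpool status datapool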

> > > 
> > > If yes, this is a known issue:
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=117158
> > >   
> > Reading that bug in detail, it appears to be *specifically* about
> > the kernel panic: zfs closing and reopening providers is expected
> > behavior, and if geli has autodetach configured it will detach when
> > that happens.
> > 
> > It strikes me that even though this is expected behavior, it should
> > not be. Is there a way we could prevent the detach when zfs closes
> > and reopens providers? I cannot think of a case where the desired
> > behavior is for the geli device to detach when zfs wants to reopen
> > it.
> I suppose geli could delay the detachment a bit to give the consumer
> a chance to reopen it.

That would probably work, but my inner engineer is dissatisfied with
the idea of slowing down all detachments to solve a single case.

What about a new feature whereby zfs (or any other consumer) can
inhibit the autodetach for a brief period? I don't really know if this
is feasible, but I thought I would ask.

Anything like this is probably too major to implement without some
thought and consensus building. So in the meantime I will file a bug
against the handbook to make sure geli_autodetach and zfs are
mentioned.
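
For completeness, since this started because I had forgotten to enable
the periodic scrub: the knobs in question live in
/etc/defaults/periodic.conf and can be overridden in /etc/periodic.conf.
To the best of my recollection they look like this:

daily_scrub_zfs_enable="YES"
daily_scrub_zfs_default_threshold="35"  # days between scrubs; 35 is the default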