FreeBSD 11 i386 disk deadlock (I think) (now with reproduction steps!)
David Cross
dcrosstech at gmail.com
Mon Nov 28 12:47:27 UTC 2016
This is certainly new behavior, or a new manifestation.
FreeBSD has supported encrypted swap as part of the base system since _2004_
on gbde devices (r125388) and on geli since 2005 (r148765, prior to 8.0);
I've been using it that way all this time.
More recently, in 2013 (10.0), the functionality was regarded as so core to
swapping that it was moved into the C code of 'swapon' itself (r252310).
On Mon, Nov 28, 2016 at 7:00 AM, Konstantin Belousov <kostikbel at gmail.com> wrote:
> On Sun, Nov 27, 2016 at 08:18:47PM -0800, Mark Johnston wrote:
> > On Sun, Nov 27, 2016 at 03:17:13PM -0500, David Cross wrote:
> > > So, narrowing this down, I think it has something to do with geli swap
> > > (since I can easily reproduce it with geli swap, but have yet to
> > > reproduce it without), and I have a somewhat convoluted way almost
> > > anyone can reproduce it with bhyve. (Note: I haven't been able to get a
> > > crash dump, apparently because the VM system being locked up prevents
> > > that, but with watchdogd I have been able to get into DDB.)
> > >
> > > Anyway, my reproduction steps: I used the 11.0 Retail DVD, but I fully
> > > expect the 11.0-RELEASE image will work fine to install an i386 image
> > > into bhyve. I install to vtbd disks (even though my 'real' case is an
> > > ada device; that this can be reproduced across such different
> > > "hardware" really reduces the likelihood of a device driver issue).
> > >
> > > After it's installed, I start my VM with the following. I drop memory
> > > to the floor, well below my "real" machine, because the emulated machine
> > > is much faster and I suspect this is a race condition somewhere. Note
> > > the options to the virtio-blk device that pin it to "real" disk I/O so
> > > it does not hit the host vmcache; again, speed seems to be key here, and
> > > slowing things down makes it more likely to happen.
> > >
> > >   bhyveload -m 64M -d /usr/bhyve/11.0.1-i386.img fbsd11-i386
> > >   bhyve -u -A -c 1 -H -m 64M -C -s 0,hostbridge -s 1,lpc \
> > >       -s 2,virtio-net,tap0 \
> > >       -s 3,virtio-blk,/usr/bhyve/11.0.1-i386.img,nocache,direct \
> > >       -l com1,/dev/nmdm0A fbsd11-i386
> > >
> > > At this point:
> > > Log into the VM
> > > cd /usr/src
> > > /usr/bin/make buildkernel
> > > <wait>
> > >
> > > For me this has hung 99% of the time at:
> > > objcopy --strip-debug kernel
> > >
> > > Once you've gotten here once, I have been able to skip the rest of the
> > > compile: cd /usr/obj/usr/src/sys/GENERIC, run that command directly, and
> > > trigger the condition.
> > >
> > > What I have at this point is the following DDB ps list:
> > >
> > > db> ps
> > > pid ppid pgrp uid state wmesg wchan cmd
> > > ...
> > > 50 0 0 0 DL vmwait 0xc1c4f6d8 [g_eli[0] vtbd0p3]
> > > ...
> > > 100043 D wswbuf0 0xc1bf30d4 [pagedaemon]
> > > ...
> > >
> > > I note that the swapper and geli are both in vmwait, a bunch of other
> > > processes are in pfault, and the "crypto" drivers are in disk wait??
> >
> > This is a low-memory deadlock: the pagedaemon is attempting to reclaim
> > memory by freeing pages from the inactive queue, and here it is waiting
> > for the swap pager to finish writing out a page. However, the GELI thread
> > is blocked waiting for the pagedaemon to free up some pages.
> >
> > Some recent work that's gone into HEAD ought to address this scenario.
> > In particular, with r308474 swapping is performed by a separate thread,
> > so even if that thread blocks waiting for the GELI thread, the
> > pagedaemon is able to continue freeing clean pages or at least kill
> > memory-hogging processes. Could you try your scenario in a VM running a
> > HEAD kernel?
>
> Neither geli nor zfs vols can be used as swap, precisely because they
> allocate memory on the write path. In fact, zfs has trouble with the
> normal pageout of files as well, for the same reason.
>
> It is very easy to trigger a situation where everything is dirty and, even
> worse, it is possible for all dirty pages to belong to one vnode. The
> laundry work is great, but it cannot completely solve the situation where
> the free or clean page producer itself allocates memory.
>
More information about the freebsd-hackers mailing list