FreeBSD 11 i386 disk deadlock (I think) (now with reproduction steps!)

Mon Nov 28 23:19:30 UTC 2016

On Mon, Nov 28, 2016 at 10:50 AM, David Cross <dcrosstech at gmail.com> wrote:
> I wouldn't call this a 'workaround', but the right answer.  Nothing in
> the disk I/O path should be allocating memory out of a pool that can
> cause paging (since any of that could be IN the path for paging).  That is
> what I assumed Fabian's proposed patch did.
>
> From looking at the process list on my machine, it seems that geli
> allocates a process per core per provider, is there a reason to not have
> each of these on startup allocate themselves a single buffer of
> sector-size, and just put all operations through that?  You're not
> (realistically) going to get more concurrency than that.  I guess another
> approach would be to pre-allocate a ring buffer of the desired operational
> depth, but that seems like overkill.

I have some code that helps fix this in the GEOM layer. For the
swapper, it allocates out of a pool of memory that's set aside for
that purpose. While it is still a pool, the only time anything is
allocated out of it is when the swapper is swapping stuff out. So if
you hit a resource shortage and have to wait, you know the wait will
be bounded unless the disk I/O never completes. This is already weakly
done with UMA, but the guarantees aren't strong enough to ensure that
we'll always make progress.
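
To make the dedicated-pool idea concrete, here is a rough userspace
sketch. It is not the actual patch: every name and size in it
(reserve_pool, RESERVE_NBUFS, RESERVE_BUFSIZE) is made up for
illustration, and it uses pthreads where kernel code would use its own
locking and sleep primitives. The point is only that the buffers are
carved out once, up front, so the get path never calls a general
allocator, and a caller that finds the pool empty sleeps until an
in-flight I/O hands a buffer back.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RESERVE_NBUFS   4       /* hypothetical pool depth */
#define RESERVE_BUFSIZE 4096    /* hypothetical buffer size */

/* All storage lives inside the pool itself; nothing is allocated later. */
struct reserve_pool {
    pthread_mutex_t lock;
    pthread_cond_t  avail;
    int             nfree;
    unsigned char  *freelist[RESERVE_NBUFS];
    unsigned char   storage[RESERVE_NBUFS][RESERVE_BUFSIZE];
};

static void
reserve_pool_init(struct reserve_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->avail, NULL);
    for (int i = 0; i < RESERVE_NBUFS; i++)
        p->freelist[i] = p->storage[i];
    p->nfree = RESERVE_NBUFS;
}

/* Non-blocking variant: returns NULL when the reserve is exhausted. */
static void *
reserve_pool_try_get(struct reserve_pool *p)
{
    void *buf = NULL;

    pthread_mutex_lock(&p->lock);
    if (p->nfree > 0)
        buf = p->freelist[--p->nfree];
    pthread_mutex_unlock(&p->lock);
    return buf;
}

/*
 * Blocking variant: sleeps until a buffer is returned.  The wait ends
 * as soon as any outstanding I/O completes and calls reserve_pool_put().
 */
static void *
reserve_pool_get(struct reserve_pool *p)
{
    void *buf;

    pthread_mutex_lock(&p->lock);
    while (p->nfree == 0)
        pthread_cond_wait(&p->avail, &p->lock);
    buf = p->freelist[--p->nfree];
    pthread_mutex_unlock(&p->lock);
    return buf;
}

/* Called from the (hypothetical) I/O completion path. */
static void
reserve_pool_put(struct reserve_pool *p, void *buf)
{
    pthread_mutex_lock(&p->lock);
    assert(p->nfree < RESERVE_NBUFS);   /* catch double frees */
    p->freelist[p->nfree++] = buf;
    pthread_cond_signal(&p->avail);
    pthread_mutex_unlock(&p->lock);
}
```

Because only one consumer ever draws from the pool, every buffer that
is missing from it is attached to an I/O that is already in flight,
which is what bounds the wait in reserve_pool_get().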

There are other places in the stack that allocate shared resources,
but this one bit us at Netflix. I haven't yet cleaned up the patches
for upstreaming. I also want to let the recent VM changes settle
before tackling this again.

Warner

> On Mon, Nov 28, 2016 at 11:22 AM, Slawa Olhovchenkov <slw at zxy.spb.ru> wrote:
>
>> On Mon, Nov 28, 2016 at 06:03:11PM +0200, Konstantin Belousov wrote:
>>
>> > On Mon, Nov 28, 2016 at 02:43:30PM +0100, Fabian Keil wrote:
>> > > David Cross <dcrosstech at gmail.com> wrote:
>> > >
>> > > > This is certainly new behavior, or a new manifestation.
>> > >
>> > > Recently a couple of uma consumers were changed to share uma zones
>> > > instead of using a dedicated zone. As a result geli competes with
>> > > more uma consumers and is more likely to deadlock. The bug isn't
>> > > new, it's just triggered more often now.
>> > The problem happens on a layer much lower than UMA: it is the whole
>> > reusable page pool which is depleted and cannot be refilled without
>> > allocating more memory.  If you think about it, the deadlock is trivial:
>> > the pagedaemon is the main source of free pages, but if producing a free
>> > page requires allocating one, a low-memory condition amounts to deadlock.
>> >
>> > It was always there, in the sense that for all versions of FreeBSD, if
>> > the file/disk write path requires memory allocation, there is trouble.
>> >
>> > For geom, some special unique measures were taken so that bio allocations
>> > do not cause the issue in typical situations.
>>
>> The typical workaround for this is to pre-allocate some memory for this
>> operation.
>>