Grant Table Userspace Device - Status Update
Roger Pau Monné
royger at freebsd.org
Sun Aug 21 15:59:33 UTC 2016
On Fri, Aug 19, 2016 at 12:50:32AM +0530, Akshay Jaggi wrote:
> Carrying over discussion from IRC.
>
> 20:11 royger: ghost_rider: hello! I've been doing some testing with the
> > device today, and it seems there's a memory leak somewhere, after shutting
> > down all my domains I still see 1KB of memory used by the device, which
> > AFAICT is not expected (you can check with `vmstat -m |grep gntdev`)
> >
>
> Nope. That's not a leak.
>
> I ran `vmstat -m | grep gntdev` just after booting up Dom0, without any of
> the DomU's running, and I still saw 1KB of memory being used by the device.
>
> root@freebsd:~ # vmstat -m | grep gntdev
> gntdev 2 1K - 2 64
>
> That is, 2 requests have been made, out of which both are currently active,
> without any DomU's active.
>
> After this I fired up a DomU with qdisk backends, and vmstat returned:
>
> root@freebsd:~/xen_test # vmstat -m | grep gntdev
> gntdev 2129 134K - 2137 32,64,128
>
> Well in line with expectations. Now, powering off the DomU and running
> vmstat again, we get:
>
> root@freebsd:~/xen_test # vmstat -m | grep gntdev
> gntdev 2 1K - 2845 32,64,128
>
> The initial 2 requests are still active, and this has nothing to do with
> the DomU's. The first malloc() that happens in the device is in the device
> open function at [1]. That means that someone has the device open. `fstat`
> confirmed my suspicions.
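
The comparison above can be sketched in a few lines of Python. This is only an illustration, assuming the `vmstat -m` columns are Type, InUse, MemUse, HighUse, Requests, Size(s) and that the type name contains no spaces (true for gntdev, not for every malloc type). The names `MallocStat`, `parse_vmstat_m_line` and `looks_leaky` are hypothetical helpers, not part of any tool.

```python
from typing import List, NamedTuple


class MallocStat(NamedTuple):
    type_name: str
    in_use: int        # allocations currently live
    mem_use: str       # kept as text, e.g. "1K"
    requests: int      # total allocations ever made
    sizes: List[int]   # bucket sizes used by this type


def parse_vmstat_m_line(line: str) -> MallocStat:
    # Assumed column order: Type InUse MemUse HighUse Requests Size(s)
    fields = line.split()
    sizes = [int(s) for s in fields[5].split(",")] if len(fields) > 5 else []
    return MallocStat(fields[0], int(fields[1]), fields[2], int(fields[4]), sizes)


def looks_leaky(baseline: MallocStat, after: MallocStat) -> bool:
    # Requests only ever grows; a leak would show up as InUse
    # staying above the idle baseline after the DomU is gone.
    return after.in_use > baseline.in_use


baseline = parse_vmstat_m_line("gntdev 2 1K - 2 64")
running = parse_vmstat_m_line("gntdev 2129 134K - 2137 32,64,128")
after = parse_vmstat_m_line("gntdev 2 1K - 2845 32,64,128")

print(looks_leaky(baseline, after))
```

With the three samples from this mail, InUse goes 2, 2129, 2 while Requests goes 2, 2137, 2845, so the check reports no growth over the idle baseline.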
>
> `fstat` with DomU active:
>
> root@freebsd:~/xen_test # fstat | grep xen/gntdev
> root qemu-system-i386 1266 29 /dev 62 crw------- xen/gntdev rw
> root qemu-system-i386 1266 32 /dev 62 crw------- xen/gntdev rw
> root qemu-system-i386 1266 34 /dev 62 crw------- xen/gntdev rw
> root xenconsoled 751 6 /dev 62 crw------- xen/gntdev rw
> root xenstored 746 11 /dev 62 crw------- xen/gntdev rw
>
> `fstat` with DomU powered off:
>
> root@freebsd:~ # fstat | grep xen/gntdev
> root xenconsoled 751 6 /dev 62 crw------- xen/gntdev rw
> root xenstored 746 11 /dev 62 crw------- xen/gntdev rw
>
> So yep! It's no leak. Just that xenconsoled and xenstored keep the gntdev
> device open. I guess this would be expected behaviour. Let me know if it is
> not.
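
As a rough sketch of the same check done programmatically, the snippet below extracts which processes hold xen/gntdev open from `fstat` output. It assumes the leading columns are USER, CMD, PID, FD as in the listings above; `gntdev_holders` is a hypothetical helper name.

```python
def gntdev_holders(fstat_output: str) -> set:
    """Return the set of (command, pid) pairs that have xen/gntdev open."""
    holders = set()
    for line in fstat_output.splitlines():
        fields = line.split()
        # Assumed leading columns: USER CMD PID FD ...
        if len(fields) >= 4 and "xen/gntdev" in fields:
            holders.add((fields[1], int(fields[2])))
    return holders


with_domu = """\
root qemu-system-i386 1266 29 /dev 62 crw------- xen/gntdev rw
root qemu-system-i386 1266 32 /dev 62 crw------- xen/gntdev rw
root qemu-system-i386 1266 34 /dev 62 crw------- xen/gntdev rw
root xenconsoled 751 6 /dev 62 crw------- xen/gntdev rw
root xenstored 746 11 /dev 62 crw------- xen/gntdev rw
"""

without_domu = """\
root xenconsoled 751 6 /dev 62 crw------- xen/gntdev rw
root xenstored 746 11 /dev 62 crw------- xen/gntdev rw
"""

# Holders that disappear once the DomU is powered off.
print(gntdev_holders(with_domu) - gntdev_holders(without_domu))
```

The two daemons that remain after the DomU is powered off are the ones that account for the two allocations still in use.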
>
> 20:14 royger: ghost_rider: and I've also seen a "Can't find requested
> > grant-map." after attaching 4 Qdisks to a domain and doing heavy IO to
> > them.
> > 20:16 royger: although this last one I haven't been able to reproduce
> >
>
> That's pretty strange. I have never noticed this in any of my manual or
> stress tests.
>
> At this point I would also like to mention, that the xen-gnttab code is
> kind of buggy (putting it mildly, no offence).
> Like I pointed out in the xen-devel patch thread, there is a place in code
> where "-1" is being used to specify there is no CLEAR_BYTE notify. But this
> is not being checked for inside the function, which would have caused a
> clear-byte notification on a different page, causing data corruption. The
> only reason this bug is not doing so, is because of another bug, where this
> -1 is being passed on to an unsigned int32, which would keep it out of
> bounds for most requests.
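
To make the masking effect concrete, here is a minimal sketch of what happens when a -1 sentinel is stored in an unsigned 32-bit field. It is not the xen-gnttab code: `NO_CLEAR_BYTE`, `PAGE_SIZE`, `as_uint32` and `offset_in_bounds` are hypothetical names, and the 4096-byte page size is an assumption.

```python
NO_CLEAR_BYTE = -1   # sentinel meaning "no clear-byte notify" (hypothetical name)
PAGE_SIZE = 4096     # assumed page size


def as_uint32(value: int) -> int:
    # Emulates assigning a signed value to an unsigned 32-bit integer.
    return value & 0xFFFFFFFF


def offset_in_bounds(offset: int, mapped_pages: int) -> bool:
    # A bounds check of the kind that would reject an offset
    # lying beyond the end of the mapped region.
    return offset < mapped_pages * PAGE_SIZE


offset = as_uint32(NO_CLEAR_BYTE)
print(hex(offset))                                   # the sentinel after conversion
print(offset_in_bounds(offset, mapped_pages=4))      # small mapping
print(offset_in_bounds(offset, mapped_pages=2**20))  # 4 GiB mapping
```

The converted sentinel lands just below 4 GiB, so it falls outside any mapping smaller than that, which is why the missing check goes unnoticed for most requests.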
>
> I don't think this has anything to do with our device. If we lost some
> unmap request (which is where this message is generated) we would have
> surely leaked the memory for the gmap structure associated with that
> request (because, 1. ref-counting, 2. transferred to global clean list only
> on an unmap request), and that would have been visible in `vmstat`.
>
> Let me know if this repeats.
>
>
> > 20:40 royger: and I'm not sure if you tested it, but if you attach a
> > ramdisk to a VM (one created with `mdconfig -t malloc ...`) and try to run
> > newfs against it, it doesn't work, a bunch of read errors appear on both
> > the DomU console and Qemu log. Although it works with a plain file, so I
> > guess this is probably some bad interaction between Qemu and FreeBSD block
> > devices...
>
>
> Mhm. Sounds like that. I'll try it out on my setup and post the results.
OK, no problem. As I said, this looks like some kind of bad interaction
between the grant table device and md devices. It's worth looking into,
but it's not a blocking issue in any case.
I've already reviewed all the remaining FreeBSD code, and I plan to commit
it once 11.0 is released, so you still have a couple of weeks to look into
the md issue if you want.
Regarding the Xen code, I'm not a maintainer of the library that you have
modified, so you will have to wait for the Ack of one of the maintainers
(next week is XenSummit, so everyone is probably going to be mostly
offline).
Thanks, Roger.