p_vmspace in syscall

Robert Watson rwatson at FreeBSD.org
Wed Jul 4 11:17:29 UTC 2007


On Wed, 4 Jul 2007, Nicolas Cormier wrote:

> On 7/4/07, Robert Watson <rwatson at freebsd.org> wrote:
>> 
>> On Mon, 2 Jul 2007, Nicolas Cormier wrote:
>> 
>>> I am trying to map some data allocated in the kernel into a user process 
>>> (via a syscall).  I need the proc's vmspace, but the value of p_vmspace in 
>>> the input proc argument is NULL... How can I get a valid vmspace?
>> 
>> When operating in a system call, the 'td' argument to the system call 
>> function is the current thread pointer.  You can follow td->td_proc to get 
>> to the current process (and therefore, its address space).  In general, I 
>> prefer mapping user pages into kernel instead of kernel pages into user 
>> space, as it reduces the chances of leakage of kernel data to user space, 
>> and there are some useful primitives for making this easier.  For example, 
>> take a look at the sf_buf infrastructure used for things like socket 
>> zero-copy send, which manages a temporary kernel mapping for a page.
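
To make this concrete, the sf_buf pattern looks roughly like the sketch 
below.  The helper name is made up for illustration, and it assumes the user 
page has already been wired (e.g. with vm_map_wire()) and that the copy stays 
within a single page:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <sys/sf_buf.h>
#include <vm/vm.h>
#include <vm/pmap.h>
#include <vm/vm_map.h>
#include <vm/vm_page.h>

/*
 * From syscall context: follow td->td_proc to the current process's
 * address space, look up the physical page behind a user address, and
 * borrow it through a temporary sf_buf kernel mapping.
 */
static void
copy_to_user_page(struct thread *td, vm_offset_t uaddr, void *src, size_t len)
{
	vm_map_t map = &td->td_proc->p_vmspace->vm_map;
	vm_page_t m;
	struct sf_buf *sf;

	/* Valid only because the page is wired and resident. */
	m = PHYS_TO_VM_PAGE(pmap_extract(map->pmap, uaddr));

	sf = sf_buf_alloc(m, 0);	/* temporary kernel mapping */
	bcopy(src, (void *)(sf_buf_kva(sf) + (uaddr & PAGE_MASK)), len);
	sf_buf_free(sf);
}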
>
> Yes, Roman told me in private that I was wrong about the first argument; I 
> thought it was a struct proc *...
>
> For my module I am trying to create a simple interface for a network 
> allocator.  User code should look like this:
>
> unsigned id;
> void* data = netmalloc(host, size, &id);
> memcpy(data, "toto", sizeof("toto"));
> netdetach(data);
>
> and later in another process:
> void* data = netattach(host, id);
> ...
> netfree(data);
>
> netmalloc syscall does something like this:
> - ask the remote host to allocate size bytes
> - receive an id from the remote host
> - malloc size bytes in the kernel
> - map the buffer into the user process (*)
>
> netdetach syscall:
> - send the data to the remote host
>
> netattach syscall:
> - get the data from the remote host
> - malloc size bytes in the kernel
> - map the buffer into the user process (*)
>
> * I have already looked at the function vm_pgmoveco
> (http://fxr.watson.org/fxr/source/kern/kern_subr.c?v=RELENG62#L78)
>
> I used vm_pgmoveco as follows:
>
> vm_map_t mapa = &proc->p_vmspace->vm_map;
> size = round_page(size);
> void *data = malloc(size, M_NETMALLOC, M_WAITOK);
> vm_offset_t addr = vm_map_min(mapa);
> vm_map_find(mapa, NULL, 0, &addr, size, TRUE, VM_PROT_ALL,
>     VM_PROT_ALL, MAP_NOFAULT);
> vm_pgmoveco(mapa, (vm_offset_t)data, addr);
>
>
> With this I get a panic in vm_page_insert, and I am not sure I understand 
> the reason for the panic.  Can't I have multiple virtual mappings of the 
> same physical page?

I think part of what you're running into here is a conceptual issue.  The 
pages allocated by malloc(9) belong to the kernel memory allocator and are 
generally managed by the slab allocator; they already live in a kernel VM 
object, which is most likely why vm_page_insert() panics when vm_pgmoveco() 
tries to insert them into the process's object.  While in principle you can 
map such pages into user space, you would have to set up a lot of 
book-keeping to free them properly again later.  There are really two 
approaches you could be looking at:

(1) The user app allocates memory pages, perhaps using mmap() to map anonymous
     memory or a file.  You then borrow those pages to use in-kernel, mapping
     as required.

(2) Your kernel code allocates pages directly from the VM system, possibly
     anonymous swap-backed pages from the page allocator, and maps them into
     the kernel as required (a rough sketch of this follows below).
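
For (2), a minimal sketch against the 6.x VM interfaces of what I mean -- the 
function name, the error mapping, and the choice to wire the pages are all 
illustrative rather than a finished design:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/proc.h>
#include <vm/vm.h>
#include <vm/vm_param.h>
#include <vm/vm_object.h>
#include <vm/vm_map.h>
#include <vm/vm_extern.h>

/*
 * Back the shared buffer with an anonymous, swap-backed VM object and
 * map that object into the calling process, instead of exposing
 * malloc(9) pages to user space.
 */
static int
netmalloc_map(struct thread *td, vm_size_t size, vm_offset_t *uaddrp)
{
	vm_map_t map = &td->td_proc->p_vmspace->vm_map;
	vm_object_t obj;
	vm_offset_t uaddr;

	size = round_page(size);

	/* Anonymous object; its pages are allocated on first use. */
	obj = vm_object_allocate(OBJT_DEFAULT, OFF_TO_IDX(size));

	uaddr = vm_map_min(map);
	/* On success the mapping consumes our object reference. */
	if (vm_map_find(map, obj, 0, &uaddr, size, TRUE,
	    VM_PROT_READ | VM_PROT_WRITE, VM_PROT_ALL, 0) != KERN_SUCCESS) {
		vm_object_deallocate(obj);
		return (ENOMEM);
	}

	/* Wire the range if the kernel will touch the pages directly. */
	if (vm_map_wire(map, uaddr, uaddr + size,
	    VM_MAP_WIRE_USER | VM_MAP_WIRE_NOHOLES) != KERN_SUCCESS) {
		(void)vm_map_remove(map, uaddr, uaddr + size);
		return (ENOMEM);
	}

	*uaddrp = uaddr;
	return (0);
}

The kernel side can then take a second reference on the same object and map 
pages from it (with sf_buf, or into the kernel map) when it needs to fill 
them.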

In either case, you'll need to think about address space limits, especially if 
the buffer is large -- the kernel address space on 32-bit systems is limited 
in size, since it shares the address space with a user application.  On 64-bit 
systems, this is not an issue.  You'll also need to make sure that the pages 
are both paged in and pinned in memory.  So before we talk about the details 
of the calls, we should think about how you plan to use the memory.
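
On the pinning point: if the kernel only needs the buffer for the duration of 
a single call, the simplest primitives are the ones sysctl uses to wire user 
memory.  A hypothetical helper, just to show the shape:

#include <sys/param.h>
#include <sys/systm.h>

/*
 * Fault in and wire a user buffer around an operation so it cannot be
 * paged out while the kernel works on it.
 */
static int
with_pinned_user_buf(void *udata, size_t ulen, int (*op)(void *, size_t))
{
	int error;

	error = vslock(udata, ulen);	/* wires the pages; fails on a bad range */
	if (error != 0)
		return (error);
	error = (*op)(udata, ulen);
	vsunlock(udata, ulen);
	return (error);
}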

How much memory are we talking about -- enough to potentially run into kernel 
address space problems on 32-bit systems?  How long will the mappings persist 
-- do you map them into kernel for a brief period to fill them, and then leave 
them mapped into user space, or is this going to be a persistent shared 
mapping over a very long period of time?  Is the memory going to be pageable? 
How will it interact with things like mprotect(), msync(), etc?  What should 
happen if the pages are released by the process using munmap() or by mapping 
over the region with mmap()?  What should happen in a child process if a 
process forks after netattach() and the parent calls netdetach()?  What 
happens if the process calls send() using a source address in the memory 
region, and zero-copy sockets are enabled, which would normally lead the page 
to be "borrowed" from the user process?

The underlying point here is that there is a model by which VM is managed -- 
pages, pagers, memory objects, mappings, address spaces, etc.  We can't just 
talk about pages being shared or mapped; we need to think about what is to be 
accomplished, and how to map that onto the abstractions that already exist. 
Memory comes in different flavours, and generally speaking, you don't want to 
use pages that come from malloc(9) for sharing with userspace, so we need to 
think about what kind of memory you do need.

Robert N M Watson
Computer Laboratory
University of Cambridge

