Large array in KVM

Sat Dec 8 23:32:07 PST 2007

On Thu, 6 Dec 2007, Sonja Milicic wrote:

> I'm working on a kernel module that needs to maintain a large structure in 
> memory. As this structure could grow too big to be stored in memory, it 
> would be good to offload parts of it to the disk. What would be the best way 
> to do this? Could using a memory-mapped file help?

Sonja,

I think the answer depends a bit on just how large the data is.  The two most 
critical limits are consumption of physical memory and consumption of address 
space.

There are several parts of the kernel that deal with these sorts of scenarios 
for various reasons.  You might take a look at the pipe code, which maps 
pageable buffers into kernel address space, and the md(4) code, which can 
provide swap-backed virtual disk storage.  And, of course, the file system is 
the quintessential kernel subsystem that brings data in and out of memory from 
disk :-).

On 64-bit systems, address space limits won't be much of a concern in most 
scenarios, but on 32-bit systems, the kernel address space is quite small 
(512 MB or 1 GB in most configurations), and as such is both significantly smaller 
than physical memory, and also potentially quite full on busy systems.  On 
32-bit systems, it is therefore critical to manage address space use and not 
just memory use, so it may not be possible to simply map and use large amounts 
of memory without careful planning.
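
As a rough illustration of bounding address space use, here is a user-space 
sketch that keeps only one fixed-size window of a large file mapped at a time 
and remaps on demand.  It is an analogy only: it uses mmap(2), whereas kernel 
code would use the VM interfaces instead.  The names file_window, 
window_open, window_at and window_close are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* One sliding window onto a backing file (hypothetical structure). */
struct file_window {
    int            fd;        /* backing file */
    size_t         win_size;  /* bytes mapped at once; page multiple */
    off_t          file_size; /* total size of the backing file */
    off_t          base;      /* file offset of the current mapping */
    unsigned char *addr;      /* current mapping, NULL if none */
};

/* Create/truncate the file and size it to nwindows whole windows. */
static int window_open(struct file_window *w, const char *path,
                       size_t pages_per_window, size_t nwindows)
{
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0 && pages_per_window > 0 && nwindows > 0);
    w->win_size = (size_t)page * pages_per_window;
    w->file_size = (off_t)w->win_size * (off_t)nwindows;
    w->base = 0;
    w->addr = NULL;
    w->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (w->fd < 0)
        return -1;
    if (ftruncate(w->fd, w->file_size) != 0) {
        close(w->fd);
        return -1;
    }
    return 0;
}

/*
 * Return a pointer to the byte at file offset 'off', remapping the window
 * if needed.  The pointer is only valid until the next window_at() call.
 * Returns NULL if 'off' is out of range or the mapping fails.
 */
static unsigned char *window_at(struct file_window *w, off_t off)
{
    if (off < 0 || off >= w->file_size)
        return NULL;

    off_t base = off - (off % (off_t)w->win_size);
    if (w->addr == NULL || base != w->base) {
        if (w->addr != NULL)
            munmap(w->addr, w->win_size);  /* give the address space back */
        void *p = mmap(NULL, w->win_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, w->fd, base);
        if (p == MAP_FAILED) {
            w->addr = NULL;
            return NULL;
        }
        w->addr = p;
        w->base = base;
    }
    return w->addr + (off - w->base);
}

/* Drop the mapping and close the file. */
static void window_close(struct file_window *w)
{
    if (w->addr != NULL)
        munmap(w->addr, w->win_size);
    w->addr = NULL;
    close(w->fd);
}
```

However large the file grows, the address space consumed stays at win_size 
bytes; the cost is a remap whenever an access falls outside the current 
window.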

If you're talking about a relatively small amount of memory -- e.g., a few 
megabytes -- that you want to be pageable, the pipe code is a good reference. 
Remember that page faults may sleep for an extended period, so you must avoid 
touching potentially paged-out memory while holding mutexes or rwlocks, while 
inside critical sections, or from non-sleepable contexts such as interrupt 
threads.  Using VM, you can explicitly manage the paging, or you can just make 
sure to touch the memory only in safe contexts, such as from the kernel 
portions of user threads when either no locks are held, or only sleepable 
locks (such as lockmgr(9) or sx(9)) are held.

For larger amounts of memory, address space limits mean you will probably 
want to maintain your own cache of data that is either loaded explicitly or 
mapped and faulted explicitly.  You may find that you want to interact 
directly with the buffer cache/VM system, and might find that your code ends 
up looking a bit like a file system itself.
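
To make the "own cache" idea concrete, here is a minimal user-space sketch of 
a fixed-size block cache with explicit loading and LRU write-back.  It is an 
analogy rather than kernel code: it uses pread(2)/pwrite(2) on an ordinary 
file, where a kernel module would go through the buffer cache or VM system. 
All names (blk_cache, cache_open, cache_get, cache_sync) and the sizes are 
hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE    4096  /* bytes per block (assumed) */
#define CACHE_SLOTS 4     /* resident blocks; this bounds memory use */

struct slot {
    int64_t       blkno;     /* block held, or -1 if the slot is empty */
    int           dirty;     /* must be written back before reuse */
    uint64_t      last_use;  /* logical timestamp for LRU */
    unsigned char data[BLK_SIZE];
};

struct blk_cache {
    int         fd;          /* backing file */
    uint64_t    clock;       /* incremented on every access */
    struct slot slots[CACHE_SLOTS];
};

/* Create/truncate the backing file and mark every slot empty. */
static int cache_open(struct blk_cache *c, const char *path)
{
    c->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (c->fd < 0)
        return -1;
    c->clock = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        c->slots[i].blkno = -1;
        c->slots[i].dirty = 0;
        c->slots[i].last_use = 0;
    }
    return 0;
}

/* Write one slot back if it is dirty.  Returns 0 on success. */
static int slot_flush(struct blk_cache *c, struct slot *s)
{
    if (s->blkno < 0 || !s->dirty)
        return 0;
    ssize_t n = pwrite(c->fd, s->data, BLK_SIZE, (off_t)s->blkno * BLK_SIZE);
    if (n != BLK_SIZE)
        return -1;
    s->dirty = 0;
    return 0;
}

/*
 * Return the data for block 'blkno', loading it from disk on a miss.
 * This may block on I/O, so the caller must be in a context that can
 * sleep.  The pointer is only valid until the next cache_get() call.
 */
static unsigned char *cache_get(struct blk_cache *c, int64_t blkno,
                                int for_write)
{
    struct slot *victim = &c->slots[0];

    assert(blkno >= 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        struct slot *s = &c->slots[i];
        if (s->blkno == blkno) {            /* hit */
            s->last_use = ++c->clock;
            s->dirty |= (for_write != 0);
            return s->data;
        }
        if (s->last_use < victim->last_use) /* track least recently used */
            victim = s;
    }

    /* Miss: evict the LRU slot, then load the requested block. */
    if (slot_flush(c, victim) != 0)
        return NULL;
    victim->blkno = -1;
    ssize_t n = pread(c->fd, victim->data, BLK_SIZE, (off_t)blkno * BLK_SIZE);
    if (n < 0)
        return NULL;
    /* Blocks at or past end of file read back as zeros. */
    memset(victim->data + n, 0, BLK_SIZE - (size_t)n);
    victim->blkno = blkno;
    victim->dirty = (for_write != 0);
    victim->last_use = ++c->clock;
    return victim->data;
}

/* Write every dirty slot back to the file. */
static int cache_sync(struct blk_cache *c)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (slot_flush(c, &c->slots[i]) != 0)
            return -1;
    return 0;
}
```

Memory use is capped at CACHE_SLOTS * BLK_SIZE bytes regardless of how much 
data lives on disk.  A real implementation would also need locking and would 
have to keep cache_get() out of non-sleepable contexts.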

So, in brief summary: consider both physical memory and address space 
limitations, and to what extent you'll need to manage their use to prevent 
exhaustion of either resource.  You also need to be careful about the locks 
and contexts in which you might need to fault in data.  The file system, 
pipe, and md code are all useful reference material.

Robert N M Watson
Computer Laboratory
University of Cambridge