32GB limit per swap device?
Matthew Dillon
dillon at apollo.backplane.com
Tue Aug 23 07:21:01 UTC 2011
Two additional pieces of information.
The original limitation was more related to DEV_BSIZE calculations for
the buf/bio, which is now 64 bits and thus no longer applicable, though you
probably need some preemptive casts to ensure the multiplication is
done in 64 bits. There was also another intermediate calculation
overflow in the swap radix-tree code which had to be fixed to be able
to use the full range... I think w/simple casts. I haven't looked it
up but there should be a few commits in the DFly codebase that can
be referenced.
Second item: The main physical memory use is not the radix tree bitmap
for swap, but instead the auxiliary data structure used
to store the swapblk information which is associated with the vm_object
structure. This structure contains a short array of swap block
assignments (as a memory optimization to reduce header overhead) and
it is these fields which you really want to keep 32-bits (unless you
want the ~1MB per ~1GB of swap to become ~2MB per ~1GB of swap in
physical memory overhead). The block number is in page-sized chunks
so the practical limit is still ~4TB, with a further caveat below.
The further caveat is that the actual limitation for the radix tree
is 0x40000000 blocks, which is 1/4 the full range or ~1TB, so the
actual limitation for the (fixed) original radix tree code is ~1TB
rather than ~4TB. This restricted range is due to some shift << >>
operators used in the radix tree code that I didn't want to make more
complicated.
So, my recommendation is to fix the intermediate calculations and keep
the swapblk related blockno fields 32 bits.
The preallocation for the vm_object's auxiliary structure must be large
enough to actually be able to fill up swap and assign all the swap blocks.
This is what eats the physical memory (4 bytes per 4K = 1024x storage
factor). The radix tree bitmap itself winds up eating only around
2 bits per swap block in total overhead. So the auxiliary structure is
the main culprit. You definitely want to keep those block number fields
in the aux structure 32 bits.
The practical limit of ~1TB of swap requires ~1GB of preallocated
physical memory with a 32 bit block number field. That would become
~2GB of preallocated memory if 64 bit block numbers were used instead,
for no gain other than wasting physical memory. OK, nobody is likely
to actually need that much swap, but people might be surprised: there are
a lot of modern-day uses for swap space that don't involve heavy paging
of anonymous memory.
-Matt