svn commit: r251282 - head/sys/kern
Alfred Perlstein
bright at mu.org
Wed Jun 5 12:11:36 UTC 2013
Konstantin, is there any way to take some of Bruce's feedback into
account to get rid of the hard coded max?
-Alfred
On 6/4/13 1:14 AM, Bruce Evans wrote:
> On Tue, 4 Jun 2013, Konstantin Belousov wrote:
>
>> On Mon, Jun 03, 2013 at 02:24:26AM -0700, Alfred Perlstein wrote:
>>> On 6/3/13 12:55 AM, Konstantin Belousov wrote:
>>>> On Sun, Jun 02, 2013 at 09:27:53PM -0700, Alfred Perlstein wrote:
>>>>> Hey Konstaintin, shouldn't this be scaled against the actual
>>>>> amount of
>>>>> KVA we have instead of an arbitrary limit?
>>>> The commit changes the buffer cache to scale according to the
>>>> available
>>>> KVA, making the scaling less dumb.
>>>>
>>>> I do not understand what exactly do you want to do, please describe
>>>> the
>>>> algorithm you propose to implement instead of my change.
>>>
>>> Sure, how about deriving the hardcoded "32" from the maxkva a machine
>>> can have?
>>>
>>> Is that possible?
>> I do not see why this would be useful. Initially I thought about simply
>> capping nbuf at 100000 without referencing any "memory". Then I realized
>> that this would somewhat conflict with (unlikely) changes to the value
>> of BKVASIZE due to "factor".
>
> The presence of BKVASIZE in 'factor' is a bug. My version never had this
> bug (see below for a patch). The scaling should be to maximize nbuf,
> subject to non-arbitrary limits on physical memory and kva, and now an
> arbirary limit of about 100000 / (BKVASIZE / 16384) on nbuf. Your new
> limit is arbitrary so it shouldn't affect nbuf depending on BKVASIZE.
>
> Expanding BKVASIZE should expand kva use, but on i386 this will soon
> hit a non-arbitary kva limit so nbuf will not be as high as preferred.
> nbuf needs to be very large mainly to support file systems with small
> buffers. Even 100000 only gives 50MB of buffering if the fs block
> size is 512. This would shrink to only 12.5MB if BKVASIZE is expanded
> by a factor of 4 and the bug is not fixed. If 25000 buffers after
> expanding BKVASIZE is enough, then that should be the arbitary limit
> (independent of BKVASIZE) so as to save physical memory.
>
> On i386 systems with 1GB RAM, nbuf defaults to about 7000. With an
> fs block size of 512, that can buffer 3.5MB. Expanding BKVASIZE by a
> factor of 4 shrinks this to 0.875MB in -current. That is ridiculously
> small. VMIO limits the lossage from this.
>
> BKVASIZE was originally 8KB. I forget if nbuf was halved by not
> modifying
> the scale factor when it was expanded to 16KB. Probably not. I used to
> modify the scale factor to get twice as many as the default nbuf, but
> once the default nbuf expanded to a few thousand it became large enough
> for most purposes so I no longer do this.
>
> Except on arches with extremely limit kva like i386, KVASIZE should be
> MAXBSIZE, and all of the complications for expanding a buffer's kva
> beyond BKVASIZE and handling the fragmentation problems caused by this
> shouldn't exist. Limits like vfs.maxbufspace should be scaled by
> NOMBSIZE = current BKVASIZE instead of BKVASIZE. Oops, my NOMBSIZE
> has similar logical problems to BKVASIZE. It needs to be precisely
> 16KB to preserve the defaults for nbuf and maxbufspace, but the nominal
> (ffs) buffer size is now 32KB. So the heuristic scale factors should
> use the historical magic number 16K. I changed sequential_heuristic()
> to use a hard-coded 16K instead of BKVASIZE. The correct number here
> depends on disk technology.
>
> The patch has many style fixes:
>
> @ Index: vfs_bio.c
> @ ===================================================================
> @ RCS file: /home/ncvs/src/sys/kern/vfs_bio.c,v
> @ retrieving revision 1.436
> @ diff -u -2 -r1.436 vfs_bio.c
> @ --- vfs_bio.c 17 Jun 2004 17:16:49 -0000 1.436
> @ +++ vfs_bio.c 3 Jun 2013 16:04:34 -0000
> @ @@ -419,64 +493,54 @@
> @ @ /*
> @ - * Calculating buffer cache scaling values and reserve space for
> buffer
> @ + * Calculate buffer cache scaling values and reserve space for buffer
> @ * headers. This is called during low level kernel initialization and
> @ * may be called more then once. We CANNOT write to the memory area
> @ * being reserved at this time.
> @ */
> @ -caddr_t
> @ -kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est)
> @ +void *
> @ +vfs_bio_alloc(void *va)
>
> The API name was verbose and bogus. The prefix for this subsystem is
> vfs_bio_, not kern_vfs_bio_. This is a generic allocation routine.
> It always allocated swbufs and has expanded to do more allocations.
>
> @ {
> @ - /*
> @ - * physmem_est is in pages. Convert it to kilobytes (assumes
> @ - * PAGE_SIZE is >= 1K)
> @ - */
> @ - physmem_est = physmem_est * (PAGE_SIZE / 1024);
>
> I use the global physmem. This change may be too i386-specific. In
> with 8-16 RAM, a significant amount of RAM may be eaten by the kernel
> image or perhaps just unavailable to the kernel. Now memories are
> larger and the amount eaten is relatively insignificant (since we are
> only using a small fraction of physmem, only the relative error matters).
>
> @ + int factor, memsize;
> @ @ /*
> @ - * The nominal buffer size (and minimum KVA allocation) is
> BKVASIZE.
> @ - * For the first 64MB of ram nominally allocate sufficient
> buffers to
> @ - * cover 1/4 of our ram. Beyond the first 64MB allocate
> additional
> @ - * buffers to cover 1/20 of our ram over 64MB. When auto-sizing
> @ - * the buffer cache we limit the eventual kva reservation to
> @ + * The nominal buffer size is NOMBSIZE.
> @ + * For the first 4MB of RAM, allocate 50 buffers.
> @ + * For the next 60MB of RAM, nominally allocate sufficient
> buffers to
> @ + * cover 1/4 of the RAM. Beyond the first 64MB allocate
> additional
> @ + * buffers to cover 1/20 of the RAM. When auto-sizing
> @ + * the buffer cache, limit the eventual kva reservation to
> @ * maxbcache bytes.
>
> Fix old bitrot in this comment, and update for my changes.
> - the first 4MB wasn't mentioned
> - the minimum clamp wasn't mentioned
> - replace BKVASIZE by NOMBSIZE so that I can change BKVASIZE without
> changing the default nbuf (subject to maxbcache).
>
> The bitrot of the magic number 1/20 is not fixed. The code uses 1/10.
> This is fixed in -current. I should have noticed this before, since
> I used to have to change the code to get 1/10 instead of 1/20.
>
> I hope you updated the comment for recent changes, but I don't like
> having magic numbers in comments.
>
> @ *
> @ - * factor represents the 1/4 x ram conversion.
> @ + * `factor' represents the 1/4 of RAM conversion.
> @ */
> @ +#define NOMBSIZE imax(PAGE_SIZE, 16384) /* XXX */
> @ if (nbuf == 0) {
> @ - int factor = 4 * BKVASIZE / 1024;
> @ + /*
> @ + * physmem is in pages. Convert it to a size in kilobytes.
> @ + * XXX no ptob().
> @ + */
> @ + memsize = ctob(physmem) / 1024;
> @ @ + factor = 4 * NOMBSIZE / 1024;
> @ nbuf = 50;
> @ - if (physmem_est > 4096)
> @ - nbuf += min((physmem_est - 4096) / factor,
> @ - 65536 / factor);
> @ - if (physmem_est > 65536)
> @ - nbuf += (physmem_est - 65536) * 2 / (factor * 5);
> @ -
> @ - if (maxbcache && nbuf > maxbcache / BKVASIZE)
> @ - nbuf = maxbcache / BKVASIZE;
>
> The maxbcache limit must be applied to the non-default nbuf too.
>
> @ + if (memsize > 4096)
> @ + nbuf += imin((memsize - 4096) / factor,
> @ + (65536 - 4096) / factor);
> @ + if (memsize > 65536)
> @ + nbuf += (memsize - 65536) * 2 / (factor * 5);
> @ }
> @ -
> @ -#if 0
> @ - /*
> @ - * Do not allow the buffer_map to be more then 1/2 the size of the
> @ - * kernel_map.
> @ - */
> @ - if (nbuf > (kernel_map->max_offset - kernel_map->min_offset) /
> @ - (BKVASIZE * 2)) {
> @ - nbuf = (kernel_map->max_offset - kernel_map->min_offset) /
> @ - (BKVASIZE * 2);
> @ - printf("Warning: nbufs capped at %d\n", nbuf);
> @ - }
> @ -#endif
>
> Already removed in -current.
>
> @ + if (nbuf > maxbcache / BKVASIZE)
> @ + nbuf = maxbcache / BKVASIZE;
> @ @ /*
> @ * swbufs are used as temporary holders for I/O, such as paging
> I/O.
> @ - * We have no less then 16 and no more then 256.
> @ + * There must be between 16 and 256 of them.
> @ */
> @ - nswbuf = max(min(nbuf/4, 256), 16);
> @ + nswbuf = imax(imin(nbuf / 4, 256), 16);
> @ #ifdef NSWBUF_MIN
> @ if (nswbuf < NSWBUF_MIN)
> @ nswbuf = NSWBUF_MIN;
> @ #endif
> @ +
> @ #ifdef DIRECTIO
> @ ffs_rawread_setup();
> @ @@ -484,12 +548,12 @@
> @ @ /*
> @ - * Reserve space for the buffer cache buffers
> @ + * Reserve space for swbuf headers and buffer cache buffers.
> @ */
> @ - swbuf = (void *)v;
> @ - v = (caddr_t)(swbuf + nswbuf);
> @ - buf = (void *)v;
> @ - v = (caddr_t)(buf + nbuf);
> @ + swbuf = va;
> @ + va = swbuf + nswbuf;
> @ + buf = va;
> @ + va = buf + nbuf;
> @ @ - return(v);
> @ + return (va);
> @ }
> @
>
> Don't use the caddr_t mistake. The API was also changed to not use it.
>
> Bruce
>
More information about the svn-src-all
mailing list