GPT vs MBR for swap devices
Warner Losh
imp at bsdimp.com
Fri Jun 15 15:57:19 UTC 2018
On Fri, Jun 15, 2018 at 9:43 AM, bob prohaska <fbsd at www.zefox.net> wrote:
> On Thu, Jun 14, 2018 at 11:37:48PM -0700, Mark Millard wrote:
> >
> > When I look at:
> >
> > # vmstat -c -w 5
> >  procs     memory       page                        disks     faults        cpu
> >  r b w    avm   fre   flt  re  pi  po    fr   sr  da0 ad0    in    sy    cs  us sy  id
> >  1 0 0   416M  224M  1647   1   0   0  1856  142    0   0   144  1791  1024   4  2  94
> >  0 0 0   416M  224M     9   0   0   0     0    1    0   0     4    85   116   0  0 100
> >  0 0 0   416M  224M    12   0   0   0     0    1    0   0     2    93   113   0  0 100
> >  0 0 0   416M  224M     9   0   0   0     2    1    1   0     4    64   121   0  0 100
> > . . .
> >
> > and "man vmstat" I do not see any column that is the swap space
> > usage (nor any combination of columns to do such a calculation
> > from).
> >
> > I do not expect that vmstat reports what you are likely/primarily
> > looking for.
> >
> > An example is "avm", for which the man page reports:
> >
> >      . . . Note that the entire memory object's size is considered
> >      mapped even if only a subset of the object's pages are currently
> >      mapped.  This statistic is not related to the active page queue
> >      which is used to track real memory.
> >
> > The free list size ("fre") is not sufficient either.
> >
>
> That seems astonishing. I imagined that among those columns there _had_
> to be reads from and writes to the swap partitions.
>
> It looks as if
> top -d 1000 | grep Swap
> produces a running list of swap usage, but one must guess how many
> times to iterate:
>
> bob at www:/usr/src % top -d 1000 | grep Swap
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> Swap: 3072M Total, 30M Used, 3041M Free
> .......
>
> Replacing the "1000" with "0" or "infinite" triggers
> a syntax error. Is there a special parameter that makes top run till
> it's killed, as in interactive mode? I didn't recognize any hint in the
> man page.
>
> Thanks for reading!
>
Right, this is why I was suggesting gstat. It's a direct measure of the
read/write performance of the device, with some latency numbers, and it
will give the kind of data I'm looking for; vmstat won't, and top won't.
I don't care about used/free swap totals. I care about performance to the
swap partition, because that's where I suspect the USB thumb drive's FTL
is the problem. Total swap usage is likely irrelevant to the issue at
hand: the OOM isn't triggering because we're filling swap, but because
the system can't get pages to the swap device fast enough to satisfy the
memory shortage, and that is what triggers the OOM.
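
Side note on the top question: as far as I know there's no flag that
makes top's batch mode run until killed, but a shell loop around
swapinfo gives you the same running log without guessing an iteration
count. A quick sketch (the 60-second interval is arbitrary):

# print a timestamped swap-usage line once a minute until interrupted
while true; do date; swapinfo -h; sleep 60; done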
As for why it would affect the USB drive and not SD cards, I can only say
that USB drives tend to be first to market with bigger capacities. That
has traditionally made them less well tuned for anything other than long
sequential reads or writes, and even SD or uSD cards tend to handle a
mixed workload better than USB drives do. It's the FTL that's the issue,
not the NAND itself. The FTL is the software that translates the
log-style writes that flash requires into the LBA-style block device that
people attach to systems. If it can't cope with a mixed workload, or if
poor quality or tuning forces it into too much garbage collection or too
many read/modify/write operations, that will show up as long delays. USB
flash also tends to suck more with BIO_DELETE than other flash does, but
the swapper doesn't issue those, so that's one fewer wildcard we need to
look at.
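
If you want a cheap first look at the stick before collecting anything,
diskinfo has a small built-in benchmark. It's read-only, so it won't
exercise the FTL's write path where I expect the real trouble, but a
device that's already slow here is telling you something. The device
name below is just a placeholder:

# naive read benchmark of the raw device (replace da0 with your stick)
diskinfo -t /dev/da0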
gstat -Bd -I 10 -f <regexp for your swap partition> > gstat-swap-data.dat
would be how I'd recommend collecting it. This file may get kinda big
depending on how long it takes to trigger the weird state. I'm hoping
that if you put the output file on a known good device, we'll power
through the issues. We might not get a perfect correlation from this,
but if I'm right the data should show all kinds of crazy before the
system drives off the cliff, so we don't need perfect data.
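
If it would help to line samples up against console messages, gstat also
has a -C flag for CSV output (it implies the endless batch mode and, if
I remember right, timestamps each line), so an alternative would be
something like the following, with the partition regexp again just a
placeholder:

# same collection, but CSV with timestamps for easier post-processing
gstat -dC -I 10 -f 'da0s2b' > gstat-swap-data.csv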
There are some higher-fidelity numbers we can get from the I/O scheduler
with dynamic scheduling compiled in, but I don't think we'll need those.
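
For the record, that means rebuilding the kernel with the
CAM_IOSCHED_DYNAMIC option; once it's in, the extra per-device knobs and
counters should turn up somewhere under the kern.cam sysctl tree (exact
names from memory):

# look for the dynamic I/O scheduler's stats after the rebuild
sysctl kern.cam | grep -i sched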
Warner