OK, so the buffer allocation problem with ZFS is fixed, but now I got this.... (VM management issues)

Karl Denninger karl at denninger.net
Sun Mar 16 15:56:45 UTC 2014


 From this morning.....

     3 users    Load  2.08  2.40  2.33                  Mar 16 10:41

Mem:KB    REAL            VIRTUAL                      VN PAGER   SWAP PAGER
          Tot   Share      Tot    Share    Free         in   out    in   out
Act 2341416   20052  9279140    56272  436064  count    44    32
All 3144676   25544 10111364   255624          pages   112    43

Proc: r 3  p 40  s 200  w 45   Csw 80k  Trp 40k  Sys 186k  Int 17k  Sof 438  Flt 14k

 8.0%Sys   2.3%Intr  5.6%User  0.0%Nice 84.1%Idle

VM stats:  43 ioflt   1348 cow    1890 zfod      48 ozfod (2%)
         2123 prcfr   8196 totfr    22 react  1663632 pdpgs   49 intrn

Namei    Calls   hits   %
         14254  14202 100

dtbuf 55    desvn 485946    numvn 164802    frevn 121388

Pages:  wire  4829120    act   2117724    inact 17078072
        cache  431968    free     7832    buf   1694896

Disks    da0    da1    da2    da3    da4    da5    da6
KB/t    8.20  63.89  11.91  29.66  34.00  17.36   0.00
tps       77   1497      9     19     15      9      0
MB/s    0.61  93.43   0.11   0.54   0.51   0.15   0.00
%busy     26     26      1      3      2      1      0

Interrupts: 29689 total
   11 uart0 4         2085 uhci0 16        520 uhci3
 1045 arcmsr0 30        77 mps0 256       1085 cpu0:timer
 7020 em0:rx 0        6864 em0:tx 0
   86 em1:rx 0          87 em1:tx 0
  655 cpu1:timer       627 cpu2:timer      579 cpu3:timer
  636 cpu4:timer       938 cpu5:timer      646 cpu6:timer
  670 cpu7:timer       702 cpu8:timer      573 cpu9:timer
  784 cpu10:timer      898 cpu11:timer     476 cpu12:timer
 1054 cpu13:timer     1056 cpu14:timer     515 cpu15:timer

This is a rather busy (read: extreme demands on the system) time during 
which I have managed to provoke some really awful behavior, including 
filesystem stalls.  The system in question has both UFS and ZFS 
filesystems (but won't for much longer) and runs both SMB service 
(Samba) and Postgres.

Of note is that nasty "inact" page count.  It has driven the adaptive 
ARC code patch (which is on this box) to trim the ARC cache down to the 
minimum, where it remains pinned.
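
(For anyone who wants to check the same thing on their box: a quick 
sketch that compares ARC size against its floor via sysctl.  I'm 
assuming the stock sysctl names kstat.zfs.misc.arcstats.size and 
vfs.zfs.arc_min here; a patched kernel may expose different knobs.)

/* arc_floor.c: report ARC size relative to its configured minimum.
 * Sketch only; assumes the stock sysctl names, which a patched
 * kernel may rename.  Build: cc -o arc_floor arc_floor.c */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t
get64(const char *name)
{
	uint64_t v;
	size_t len = sizeof(v);

	if (sysctlbyname(name, &v, &len, NULL, 0) == -1) {
		perror(name);
		exit(1);
	}
	return (v);
}

int
main(void)
{
	uint64_t size = get64("kstat.zfs.misc.arcstats.size");
	uint64_t min = get64("vfs.zfs.arc_min");

	printf("ARC size %ju MB, arc_min %ju MB (%s)\n",
	    (uintmax_t)(size >> 20), (uintmax_t)(min >> 20),
	    size <= min ? "pinned at floor" : "above floor");
	return (0);
}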

My reading of the "inact" page count is that pages shouldn't stay in 
that state indefinitely -- they should either be reactivated (if 
they're re-used) or invalidated and moved to the "cache" bucket, where 
the VM code can free them.
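
(That reading is easy to test empirically by sampling the queue 
counters over time; a minimal sketch, assuming the 10.x vm.stats.vm 
counter names, which are all page counts:)

/* inact_watch.c: sample the VM page queues once a second to see
 * whether inactive pages ever drain to cache/free.  Sketch only;
 * assumes the 10.x vm.stats.vm counter names. */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <unistd.h>

static u_int
pages(const char *name)
{
	u_int v = 0;
	size_t len = sizeof(v);

	(void)sysctlbyname(name, &v, &len, NULL, 0);
	return (v);
}

int
main(void)
{
	for (;;) {
		printf("act %u  inact %u  cache %u  free %u\n",
		    pages("vm.stats.vm.v_active_count"),
		    pages("vm.stats.vm.v_inactive_count"),
		    pages("vm.stats.vm.v_cache_count"),
		    pages("vm.stats.vm.v_free_count"));
		sleep(1);
	}
}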

Buuuuut.... neither is happening over the space of several hours, and a 
look at the RSS of the working processes doesn't show anything 
interesting -- or different from normal activity in that regard.  17 
_*gigabytes*_ of inactive pages (out of 24 GB of RAM in total), and 
they're not being reclaimed?
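
(For completeness, this is the kind of cross-check I mean: a sketch 
that sums resident pages across all processes with libkvm, so the 
total can be compared against the inact figure.  Build with -lkvm.)

/* rss_sum.c: total resident pages across all processes, for
 * comparison with the inactive-queue size.
 * Build: cc -o rss_sum rss_sum.c -lkvm */
#include <sys/param.h>
#include <sys/sysctl.h>
#include <sys/user.h>
#include <fcntl.h>
#include <kvm.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	kvm_t *kd;
	struct kinfo_proc *kp;
	long total = 0;
	int i, cnt;

	/* "/dev/null" as the corefile makes libkvm use sysctl, so
	 * no kmem privileges are needed on the running system. */
	kd = kvm_open(NULL, "/dev/null", NULL, O_RDONLY, "kvm_open");
	if (kd == NULL)
		return (1);
	kp = kvm_getprocs(kd, KERN_PROC_PROC, 0, &cnt);
	for (i = 0; i < cnt; i++)
		total += kp[i].ki_rssize;	/* resident pages */
	printf("total RSS: %ld MB across %d processes\n",
	    total * getpagesize() / (1024 * 1024), cnt);
	kvm_close(kd);
	return (0);
}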

Time for me to dig into the vm code?

FreeBSD 10.0-STABLE #13 r263037M: Fri Mar 14 14:58:11 CDT 2014 
karl at NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP

-- 
-- Karl
karl at denninger.net
