iSCSI failing, MLX rx_ring errors ?
Ben RUBSON
ben.rubson at gmail.com
Mon Jan 2 11:09:23 UTC 2017
Hi Meny,
Thank you very much for your feedback.
I think you are right, this could be an mbuf issue.
Here are some more numbers:
# vmstat -z | grep -v "0, 0$"
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
4 Bucket: 32, 0, 2673, 28327, 88449799, 17317, 0
8 Bucket: 64, 0, 449, 15609, 13926386, 4871, 0
12 Bucket: 96, 0, 335, 5323, 10293892, 142872, 0
16 Bucket: 128, 0, 533, 6070, 7618615, 472647, 0
32 Bucket: 256, 0, 8317, 22133, 36020376, 563479, 0
64 Bucket: 512, 0, 1238, 3298, 20138111, 11430742, 0
128 Bucket: 1024, 0, 1865, 2963, 21162182, 158752, 0
256 Bucket: 2048, 0, 1626, 450, 80253784, 4890164, 0
mbuf_jumbo_9k: 9216, 603712, 16400, 8744, 4128521064, 2661, 0
# netstat -m
32801/18814/51615 mbufs in use (current/cache/total)
16400/9810/26210/4075058 mbuf clusters in use (current/cache/total/max)
16400/9659 mbuf+clusters out of packet secondary zone in use (current/cache)
0/8647/8647/2037529 4k (page size) jumbo clusters in use (current/cache/total/max)
16400/8744/25144/603712 9k jumbo clusters in use (current/cache/total/max)
0/0/0/339588 16k jumbo clusters in use (current/cache/total/max)
188600K/137607K/326207K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
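For what it's worth, the 2661 denied 9k jumbo cluster requests above match the FAIL counter of the mbuf_jumbo_9k zone, so the two outputs agree. A quick cross-check of the configured limits against this usage could look like the following (just a sketch; kern.ipc.* are the stock FreeBSD zone-limit sysctls):

# sysctl kern.ipc.nmbjumbo9 kern.ipc.nmbclusters hw.physmem
# vmstat -z | grep mbuf_jumbo_9k
# netstat -m | grep denied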
I did not perform any mbuf tuning; the numbers above are FreeBSD's defaults.
This server has 64GB of memory.
It has a ZFS pool, whose ARC memory footprint I limit with:
vfs.zfs.arc_max=64424509440 #60G
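For reference, the live ARC size versus this limit can be checked with (a sketch; kstat.zfs.misc.arcstats.size is the running ARC size in bytes):

# sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size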
The only other thing I changed was some TCP tuning, to improve throughput over high-latency, long-distance private links:
kern.ipc.maxsockbuf=7372800
net.inet.tcp.sendbuf_max=6553600
net.inet.tcp.recvbuf_max=6553600
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.cc.algorithm=htcp
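In case it matters, here is how I would double-check that htcp is actually available and selected (a sketch; cc_htcp is the module name I would expect to see if it is not compiled into the kernel):

# sysctl net.inet.tcp.cc.available net.inet.tcp.cc.algorithm
# kldstat | grep cc_htcp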
Here are some graphs of memory & ARC usage when the issue occurs.
Crosshair (vertical red line) is at the timestamp where I get iSCSI disconnections.
https://postimg.org/gallery/1kkekrc4e/
What is strange is that each time the issue occurs, there is still around 1 GB of free memory.
So FreeBSD should still be able to allocate some more mbufs, shouldn't it?
Unfortunately I do not have graphs about mbufs.
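In the meantime I could start collecting them with something like the script below (a rough sketch: it samples once a minute into an arbitrary log file, together with the hw.mlxen1.stat counters you mention below):

#!/bin/sh
# Rough sketch: sample mbuf and mlxen counters every minute so a graph
# can be rebuilt later. Log path and interval are arbitrary choices.
LOG=/var/log/mbuf-stats.log
while true; do
        date -u "+%Y-%m-%dT%H:%M:%SZ" >> "$LOG"
        netstat -m | grep -E "in use|denied|delayed" >> "$LOG"
        vmstat -z | grep mbuf_jumbo_9k >> "$LOG"
        sysctl hw.mlxen1.stat.rx_errors hw.mlxen1.stat.rx_dropped >> "$LOG"
        sleep 60
done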
What should I ideally do?
Thank you again,
Best regards,
Ben
> On 01 Jan 2017, at 09:16, Meny Yossefi <menyy at mellanox.com> wrote:
>
> Hi Ben,
>
> Those are not HW errors, note that:
>
> hw.mlxen1.stat.rx_dropped: 0
> hw.mlxen1.stat.rx_errors: 0
>
> It seems to be triggered when you are failing to allocate a replacement buffer.
> Any chance you ran out of mbufs in the system?
>
> en_rx.c:
>
> mlx4_en_process_rx_cq():
>
> mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
> if (!mb) {
>         ring->errors++;
>         goto next;
> }
>
> mlx4_en_rx_mb() -> mlx4_en_complete_rx_desc():
>
> /* Allocate a replacement page */
> if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
>         goto fail;
>
> -Meny