iSCSI failing, MLX rx_ring errors ?
Ben RUBSON
ben.rubson at gmail.com
Mon Jan 2 11:09:23 UTC 2017
Hi Meny,
Thank you very much for your feedback.
I think you are right, this could be an mbuf issue.
Here are some more numbers:
# vmstat -z | grep -v "0, 0$"
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
4 Bucket: 32, 0, 2673, 28327, 88449799, 17317, 0
8 Bucket: 64, 0, 449, 15609, 13926386, 4871, 0
12 Bucket: 96, 0, 335, 5323, 10293892, 142872, 0
16 Bucket: 128, 0, 533, 6070, 7618615, 472647, 0
32 Bucket: 256, 0, 8317, 22133, 36020376, 563479, 0
64 Bucket: 512, 0, 1238, 3298, 20138111, 11430742, 0
128 Bucket: 1024, 0, 1865, 2963, 21162182, 158752, 0
256 Bucket: 2048, 0, 1626, 450, 80253784, 4890164, 0
mbuf_jumbo_9k: 9216, 603712, 16400, 8744, 4128521064, 2661, 0
# netstat -m
32801/18814/51615 mbufs in use (current/cache/total)
16400/9810/26210/4075058 mbuf clusters in use (current/cache/total/max)
16400/9659 mbuf+clusters out of packet secondary zone in use (current/cache)
0/8647/8647/2037529 4k (page size) jumbo clusters in use (current/cache/total/max)
16400/8744/25144/603712 9k jumbo clusters in use (current/cache/total/max)
0/0/0/339588 16k jumbo clusters in use (current/cache/total/max)
188600K/137607K/326207K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
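For what it's worth, the 2661 denied 9k jumbo cluster requests above match the FAIL counter of the mbuf_jumbo_9k zone, so the two outputs agree. A quick cross-check of the configured limits against this usage could look like the following (just a sketch; kern.ipc.* are the stock FreeBSD zone-limit sysctls):

# sysctl kern.ipc.nmbjumbo9 kern.ipc.nmbclusters hw.physmem
# vmstat -z | grep mbuf_jumbo_9k
# netstat -m | grep denied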
I did not perform any mbuf tuning; the numbers above are FreeBSD's defaults.
This server has 64GB of memory.
It has a ZFS pool, whose ARC memory footprint I limit with:
vfs.zfs.arc_max=64424509440 #60G
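For reference, the live ARC size versus this limit can be checked with (a sketch; kstat.zfs.misc.arcstats.size is the running ARC size in bytes):

# sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size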
The only other thing I changed was some TCP tuning, to improve throughput over high-latency, long-distance private links:
kern.ipc.maxsockbuf=7372800
net.inet.tcp.sendbuf_max=6553600
net.inet.tcp.recvbuf_max=6553600
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.inet.tcp.sendbuf_inc=65536
net.inet.tcp.recvbuf_inc=65536
net.inet.tcp.cc.algorithm=htcp
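In case it matters, here is how I would double-check that htcp is actually available and selected (a sketch; cc_htcp is the module name I would expect to see if it is not compiled into the kernel):

# sysctl net.inet.tcp.cc.available net.inet.tcp.cc.algorithm
# kldstat | grep cc_htcp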
Here are some graphs of memory & ARC usage when the issue occurs.
Crosshair (vertical red line) is at the timestamp where I get iSCSI disconnections.
https://postimg.org/gallery/1kkekrc4e/
What is strange is that each time the issue occurs, there is still around 1 GB of free memory.
So FreeBSD should still be able to allocate some more mbufs, shouldn't it?
Unfortunately I do not have graphs about mbufs.
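In the meantime I could start collecting them with something like the script below (a rough sketch: it samples once a minute into an arbitrary log file, together with the hw.mlxen1.stat counters you mention below):

#!/bin/sh
# Rough sketch: sample mbuf and mlxen counters every minute so a graph
# can be rebuilt later. Log path and interval are arbitrary choices.
LOG=/var/log/mbuf-stats.log
while true; do
        date -u "+%Y-%m-%dT%H:%M:%SZ" >> "$LOG"
        netstat -m | grep -E "in use|denied|delayed" >> "$LOG"
        vmstat -z | grep mbuf_jumbo_9k >> "$LOG"
        sysctl hw.mlxen1.stat.rx_errors hw.mlxen1.stat.rx_dropped >> "$LOG"
        sleep 60
done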
What should I ideally do?
Thank you again,
Best regards,
Ben
> On 01 Jan 2017, at 09:16, Meny Yossefi <menyy at mellanox.com> wrote:
>
> Hi Ben,
>
> Those are not HW errors, note that:
>
> hw.mlxen1.stat.rx_dropped: 0
> hw.mlxen1.stat.rx_errors: 0
>
> It seems to be triggered when you are failing to allocate a replacement buffer.
> Any chance you ran out of mbufs in the system?
>
> en_rx.c:
>
> mlx4_en_process_rx_cq():
>
> mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
> if (!mb) {
>         ring->errors++;
>         goto next;
> }
>
> mlx4_en_rx_mb() -> mlx4_en_complete_rx_desc():
>
> /* Allocate a replacement page */
> if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
>         goto fail;
>
> -Meny