FW: iSCSI failing, MLX rx_ring errors ?
Julien Cigar
julien at perdition.city
Tue Jan 3 16:21:26 UTC 2017
Is it not the same issue as PR 211990? Can you try turning off jumbo
frames?
On Tue, Jan 03, 2017 at 06:27:15AM +0000, Meny Yossefi wrote:
>
> ________________________________________
> From: owner-freebsd-net at freebsd.org On Behalf Of Ben RUBSON
> Sent: Monday, January 2, 2017 11:09:15 AM (UTC+00:00) Monrovia, Reykjavik
> To: freebsd-net at freebsd.org
> Cc: Meny Yossefi; Yuval Bason; Hans Petter Selasky
> Subject: Re: iSCSI failing, MLX rx_ring errors ?
>
> Hi Meny,
>
> Thank you very much for your feedback.
>
> I think you are right, this could be an mbuf issue.
> Here are some more numbers:
>
> # vmstat -z | grep -v "0, 0$"
> ITEM               SIZE   LIMIT    USED   FREE         REQ      FAIL SLEEP
> 4 Bucket:            32,      0,   2673, 28327,   88449799,    17317,    0
> 8 Bucket:            64,      0,    449, 15609,   13926386,     4871,    0
> 12 Bucket:           96,      0,    335,  5323,   10293892,   142872,    0
> 16 Bucket:          128,      0,    533,  6070,    7618615,   472647,    0
> 32 Bucket:          256,      0,   8317, 22133,   36020376,   563479,    0
> 64 Bucket:          512,      0,   1238,  3298,   20138111, 11430742,    0
> 128 Bucket:        1024,      0,   1865,  2963,   21162182,   158752,    0
> 256 Bucket:        2048,      0,   1626,   450,   80253784,  4890164,    0
> mbuf_jumbo_9k:     9216, 603712,  16400,  8744, 4128521064,     2661,    0
>
> # netstat -m
> 32801/18814/51615 mbufs in use (current/cache/total)
> 16400/9810/26210/4075058 mbuf clusters in use (current/cache/total/max)
> 16400/9659 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/8647/8647/2037529 4k (page size) jumbo clusters in use (current/cache/total/max)
> 16400/8744/25144/603712 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/339588 16k jumbo clusters in use (current/cache/total/max)
> 188600K/137607K/326207K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 sendfile syscalls
> 0 sendfile syscalls completed without I/O request
> 0 requests for I/O initiated by sendfile
> 0 pages read by sendfile as part of a request
> 0 pages were valid at time of a sendfile request
> 0 pages were requested for read ahead by applications
> 0 pages were read ahead by sendfile
> 0 times sendfile encountered an already busy page
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
>
> I did not perform any mbuf tuning; the numbers above are FreeBSD defaults.
>
> This server has 64GB of memory.
> It has a ZFS pool, for which I limit the ARC memory impact with:
> vfs.zfs.arc_max=64424509440 #60G
>
> The only other thing I did is some TCP tuning to improve throughput over high-latency, long-distance private links:
> kern.ipc.maxsockbuf=7372800
> net.inet.tcp.sendbuf_max=6553600
> net.inet.tcp.recvbuf_max=6553600
> net.inet.tcp.sendspace=65536
> net.inet.tcp.recvspace=65536
> net.inet.tcp.sendbuf_inc=65536
> net.inet.tcp.recvbuf_inc=65536
> net.inet.tcp.cc.algorithm=htcp
>
> Here are some graphs of memory & ARC usage when the issue occurs.
> The crosshair (vertical red line) marks the timestamp where I get iSCSI disconnections.
> https://postimg.org/gallery/1kkekrc4e/
> What is strange is that each time the issue occurs there is around 1GB of free memory,
> so FreeBSD should still be able to allocate some more mbufs?
> Unfortunately I do not have graphs of mbuf usage.
>
> What should I ideally do ?
>
> >> Have you tried increasing the mbufs limit?
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)
>
>
> Thank you again,
>
> Best regards,
>
> Ben
>
>
>
> > On 01 Jan 2017, at 09:16, Meny Yossefi <menyy at mellanox.com> wrote:
> >
> > Hi Ben,
> >
> > Those are not HW errors, note that:
> >
> > hw.mlxen1.stat.rx_dropped: 0
> > hw.mlxen1.stat.rx_errors: 0
> >
> > It seems to be triggered when you are failing to allocate a replacement buffer.
> > Any chance you ran out of mbufs in the system?
> >
> > en_rx.c:
> >
> > mlx4_en_process_rx_cq():
> >
> >     mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
> >     if (!mb) {
> >             ring->errors++;
> >             goto next;
> >     }
> >
> > mlx4_en_rx_mb() -> mlx4_en_complete_rx_desc():
> >
> >     /* Allocate a replacement page */
> >     if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
> >             goto fail;
> >
> > -Meny
> _______________________________________________
> freebsd-net at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscribe at freebsd.org"
--
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.