mbuf_jumbo_9k & iSCSI failing
Ben RUBSON
ben.rubson at gmail.com
Sun Jun 25 14:54:31 UTC 2017
> On 30 Dec 2016, at 22:55, Ben RUBSON <ben.rubson at gmail.com> wrote:
>
> Hello,
>
> Two FreeBSD 11.0-p3 servers: one iSCSI initiator, one target.
> Both with Mellanox ConnectX-3 40G.
>
> For a few days now, sometimes, under undetermined circumstances, as soon as there is some (very low) iSCSI traffic, some of the disks get disconnected:
> kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection
>
> At the same moment, the sysctl counters hw.mlxen1.stat.rx_ring*.error grow on the initiator side.
>
> I then tried to reproduce these network errors by saturating the link at 40G full-duplex with iPerf,
> but I could not get these error counters to increase.
>
> It's strange because the issue is sporadic: I can have traffic on the iSCSI disks without any problem, and sometimes they get disconnected while the error counters grow.
> On 01 Jan 2017, at 09:16, Meny Yossefi <menyy at mellanox.com> wrote:
>
> Any chance you ran out of mbufs in the system?
> On 02 Jan 2017, at 12:09, Ben RUBSON <ben.rubson at gmail.com> wrote:
>
> I think you are right; this could be an mbuf issue.
> Here are some more numbers:
>
> # vmstat -z | grep -v "0, 0$"
> ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
> 4 Bucket: 32, 0, 2673, 28327, 88449799, 17317, 0
> 8 Bucket: 64, 0, 449, 15609, 13926386, 4871, 0
> 12 Bucket: 96, 0, 335, 5323, 10293892, 142872, 0
> 16 Bucket: 128, 0, 533, 6070, 7618615, 472647, 0
> 32 Bucket: 256, 0, 8317, 22133, 36020376, 563479, 0
> 64 Bucket: 512, 0, 1238, 3298, 20138111, 11430742, 0
> 128 Bucket: 1024, 0, 1865, 2963, 21162182, 158752, 0
> 256 Bucket: 2048, 0, 1626, 450, 80253784, 4890164, 0
> mbuf_jumbo_9k: 9216, 603712, 16400, 8744, 4128521064, 2661, 0
> On 03 Jan 2017, at 07:27, Meny Yossefi <menyy at mellanox.com> wrote:
>
> Have you tried increasing the mbufs limit?
> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)
> On 04 Jan 2017, at 14:47, Ben RUBSON <ben.rubson at gmail.com> wrote:
>
> No, I have not tried this yet.
> However, from the numbers above (and below), I think I should increase kern.ipc.nmbjumbo9 instead?
> On 30 Jan 2017, at 15:36, Ben RUBSON <ben.rubson at gmail.com> wrote:
>
> So, some news: increasing kern.ipc.nmbjumbo9 helped a lot.
> There has been only a very minor issue (compared to the previous ones) over the last 3 weeks.
Hello,
I'm back today with this issue.
Above is my discussion with Meny from Mellanox at the beginning of 2017
(the topic was "iSCSI failing, MLX rx_ring errors ?" on the freebsd-net list).
This morning the issue came back, and some of my iSCSI disks were disconnected.
Below are some numbers.
# vmstat -z | grep -v "0, 0$"
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
8 Bucket: 64, 0, 654, 8522, 28604967, 11, 0
12 Bucket: 96, 0, 976, 5092, 23758734, 78, 0
32 Bucket: 256, 0, 789, 4491, 43446969, 137, 0
64 Bucket: 512, 0, 666, 2750, 47568959, 1272018, 0
128 Bucket: 1024, 0, 1047, 1249, 28774042, 232504, 0
256 Bucket: 2048, 0, 1611, 369, 139988097, 8931139, 0
vmem btag: 56, 0, 2949738, 15506, 18092235, 20908, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 8776, 8610737115, 297, 0
# uname -rs
FreeBSD 11.0-RELEASE-p8
# uptime
3:34p.m. up 88 days, 15:57, 2 users, load averages: 0.95, 0.67, 0.62
# grep kern.ipc.nmb /boot/loader.conf
kern.ipc.nmbjumbo9=2037529
kern.ipc.nmbjumbo16=1
# sysctl kern.ipc | grep mb
kern.ipc.nmbufs: 26080380
kern.ipc.nmbjumbo16: 4
kern.ipc.nmbjumbo9: 6112587
kern.ipc.nmbjumbop: 2037529
kern.ipc.nmbclusters: 4075060
kern.ipc.maxmbufmem: 33382887424
# ifconfig mlxen1
mlxen1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9020
options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
status: active
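Multiplying the zone's SIZE column by its LIMIT and USED columns shows how far mbuf_jumbo_9k is from its configured ceiling. This is a minimal back-of-the-envelope sketch, assuming a 4 KiB page size (typical for FreeBSD/amd64); the constants are copied from the vmstat -z and sysctl output above, and the helper name pages_per_item is hypothetical.

```python
# Back-of-the-envelope memory math for the mbuf_jumbo_9k zone (a sketch).
# Figures are copied from the vmstat -z and sysctl output above.
# ASSUMPTION: 4 KiB pages (typical for FreeBSD/amd64).
PAGE_SIZE = 4096

ZONE_ITEM_SIZE = 9216          # SIZE column of mbuf_jumbo_9k
ZONE_LIMIT = 2037529           # LIMIT column (same value as nmbjumbo9 in loader.conf)
ZONE_USED = 16400              # USED column in the vmstat -z output above
MAXMBUFMEM = 33382887424       # kern.ipc.maxmbufmem


def pages_per_item(item_size: int, page_size: int = PAGE_SIZE) -> int:
    """Hypothetical helper: number of pages one cluster spans (ceiling division)."""
    return -(-item_size // page_size)


limit_bytes = ZONE_LIMIT * ZONE_ITEM_SIZE   # memory the zone may use at its limit
used_bytes = ZONE_USED * ZONE_ITEM_SIZE     # memory held by 9k clusters in use

print(f"limit: {limit_bytes / 2**30:.1f} GiB "
      f"({limit_bytes / MAXMBUFMEM:.0%} of kern.ipc.maxmbufmem)")
print(f"used:  {used_bytes / 2**20:.0f} MiB ({ZONE_USED / ZONE_LIMIT:.2%} of LIMIT)")
print(f"pages per 9k cluster: {pages_per_item(ZONE_ITEM_SIZE)}")
```

With these figures the zone's ceiling is about 17.5 GiB, while the 9k clusters in use total roughly 144 MiB, under 1% of LIMIT, even though FAIL is already non-zero (297 in the vmstat -z output above). That pattern suggests the allocation failures are not simply the zone reaching its configured limit.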
I just caught the issue in progress; note the FAIL column increasing:
# vmstat -z | grep mbuf_jumbo_9k
ITEM SIZE LIMIT USED FREE REQ FAIL SLEEP
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16411, 7320,8735286748, 665, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735298937, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16438, 7293,8735337634, 667, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8735354339, 668, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735382105, 669, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735392836, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735423910, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735456393, 671, 0
mbuf_jumbo_9k: 9216, 2037529, 16409, 7322,8735472284, 672, 0
mbuf_jumbo_9k: 9216, 2037529, 16420, 7311,8735512237, 673, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735518502, 675, 0
mbuf_jumbo_9k: 9216, 2037529, 16410, 7321,8735543668, 676, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8735555646, 678, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735568986, 679, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735579075, 680, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735603983, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735634273, 681, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735646057, 683, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8735658213, 684, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8735675678, 686, 0
mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735686017, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735707335, 687, 0
mbuf_jumbo_9k: 9216, 2037529, 16414, 7317,8736016546, 708, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736037292, 709, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736053865, 710, 0
mbuf_jumbo_9k: 9216, 2037529, 16402, 7329,8736070103, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16407, 7324,8736086810, 711, 0
mbuf_jumbo_9k: 9216, 2037529, 16430, 7301,8736098568, 713, 0
mbuf_jumbo_9k: 9216, 2037529, 16405, 7326,8736122803, 714, 0
mbuf_jumbo_9k: 9216, 2037529, 16417, 7314,8736134322, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736152338, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16403, 7328,8736167677, 715, 0
mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8736170783, 717, 0
mbuf_jumbo_9k: 9216, 2037529, 16445, 7286,8736546084, 733, 0
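To quantify a capture like the one above, the samples can be parsed and diffed. This is a minimal sketch, assuming the column order shown in the vmstat -z header (NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP) and tolerating the missing space vmstat leaves when a column is wide (e.g. "7316,8735246407"); parse_zone_line and summarize are hypothetical helper names, and only three of the samples above are included.

```python
"""Summarize `vmstat -z` samples for one UMA zone (a sketch).

ASSUMPTION: lines are shaped like
    mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0
i.e. NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP.
"""
from typing import NamedTuple, Sequence


class ZoneSample(NamedTuple):
    name: str
    size: int
    limit: int
    used: int
    free: int
    req: int
    fail: int
    sleep: int


def parse_zone_line(line: str) -> ZoneSample:
    """Hypothetical helper: parse one vmstat -z data line."""
    # Split off the zone name, then split the counters on commas only,
    # so a missing space between two columns does not matter.
    name, rest = line.split(":", 1)
    fields = [int(field) for field in rest.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 counters, got {len(fields)}: {line!r}")
    return ZoneSample(name.strip(), *fields)


def summarize(lines: Sequence[str]) -> dict:
    """Hypothetical helper: compare the first and last samples of a capture."""
    parsed = [parse_zone_line(line) for line in lines if line.strip()]
    first, last = parsed[0], parsed[-1]
    peak_used = max(s.used for s in parsed)
    return {
        "fail_delta": last.fail - first.fail,  # failed allocations during capture
        "req_delta": last.req - first.req,     # allocation requests during capture
        "peak_used": peak_used,
        # Fraction of the zone limit in use at the peak (0.0 if no limit is set).
        "peak_used_ratio": peak_used / first.limit if first.limit else 0.0,
    }


captured = [
    "mbuf_jumbo_9k: 9216, 2037529, 16415, 7316,8735246407, 665, 0",
    "mbuf_jumbo_9k: 9216, 2037529, 16400, 7331,8735382105, 669, 0",
    "mbuf_jumbo_9k: 9216, 2037529, 16445, 7286,8736546084, 733, 0",
]
print(summarize(captured))
```

Across the capture above, FAIL rises from 665 to 733 (68 more failed allocations) over roughly 1.3 million requests, while USED never exceeds 16,445 against a LIMIT of 2,037,529.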
During this, top was reporting the following:
Mem: 4056K Active, 426M Inact, 59G Wired, 2531M Free
And in /var/log/messages:
kernel: WARNING: 192.168.2.2 (iqn......): no ping reply (NOP-Out) after 5 seconds; dropping connection
Any idea why I'm experiencing this?
Thank you very much for your help & support,
Best regards,
Ben
More information about the freebsd-scsi mailing list