Just joined the infiniband club
John Fleming
john at spikefishsolutions.com
Fri Sep 13 17:04:42 UTC 2019
On Sat, Sep 7, 2019 at 9:26 PM Jason Bacon <bacon4000 at gmail.com> wrote:
>
> On 2019-09-07 19:00, John Fleming wrote:
> > Hi all, i've recently joined the club. I have two Dell R720s connected
> > directly to each other. The card is a connectx-4. I was having a lot
> > of problems with network drops. Where i'm at now is i'm running
> > FreeBSD12-Stable as of a week ago and cards have been cross flashed
> > with OEM firmware (these are lenovo i think) and i'm no longer getting
> > network drops. This box is basically my storage server. It's exporting
> > a raid 10 ZFS volume to a linux (compute 19.04 5.0.0-27-generic) box
> > which is running GNS3 for a lab.
> >
> > So many questions.. sorry if this is a bit rambly!
> >
> > From what I understand this card is really 4 x 25 gig lanes. If i
> > understand that correctly then 1 data transfer should be able to do at
> > max 25 gig (best case) correct?
> >
> > I'm not getting what the difference between connected mode and
> > datagram mode is. Does this have anything to do with the card
> > operating in infiniband mode vs ethernet mode? FreeBSD is using the
> > modules compiled in connected mode with shell script (which is really
> > a bash script not a sh script) from freebsd-infiniband page.
>
> Nothing to do with Ethernet...
>
> Google turned up a brief explanation here:
>
> https://wiki.archlinux.org/index.php/InfiniBand
>
I still don't get why I would want to use one or the other, or why
the option is there, but it doesn't matter.
After the firmware upgrade and the move to FreeBSD stable (unsure which
is triggering this) i can no longer
set connected mode on linux. There are a lot of posts that say you
have to disable enhanced IPoIB mode
via a modules.conf setting, but the driver doesn't have any idea what
that is. Echoing "connected" to the mode file
throws a write error. I poked around in the linux source, but i'm not
even a level 1 fighter in C. i'm like the generic NPC
that says hi at the gates.
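
For reference, a minimal sketch of the mode switch being attempted here. It assumes the IPoIB interface is called ib0 (the post does not name it) and that the kernel exposes the usual /sys/class/net/<interface>/mode file. The helper name and its optional second argument are hypothetical, added only for illustration.

```sh
#!/bin/sh
# Hypothetical helper: ask the driver for an IPoIB mode and print the mode
# that is actually in effect afterwards.
# $1 = wanted mode ("connected" or "datagram")
# $2 = mode file (optional; defaults to ib0's, and ib0 is an assumed name)
set_ipoib_mode() {
    want=$1
    mode_file=${2:-/sys/class/net/ib0/mode}

    if [ ! -w "$mode_file" ]; then
        echo "cannot write $mode_file" >&2
        return 1
    fi

    # The write itself fails when the driver refuses the requested mode,
    # which is the "write error" described above.
    if ! echo "$want" > "$mode_file" 2>/dev/null; then
        echo "driver rejected mode '$want'" >&2
        return 1
    fi

    # Read the file back instead of trusting the write.
    cat "$mode_file"
}
```

Run as root, `set_ipoib_mode connected` either prints the resulting mode or reports that the driver rejected it, which makes it easy to tell a refused mode apart from a missing interface.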
> Those are my module building scripts on the wiki. What bash extensions
> did you see?
Isn't this a bashism? When i run it inside sh it throws a fit. No
worries, i just edited loader.conf
auto-append-line
> >
> > Linux box complains if mtu is over 2044 ("expect multicast drops" or
> > something like that), so mtu on both boxes is set to 2044.
> >
> > Everything i'm reading makes it sound like there is no RDMA support in
> > FreeBSD or maybe that was no NFS RDMA support. Is that correct?
> RDMA is inherent in Infiniband AFAIK. Last I checked, there was no
> support in FreeBSD for NFS over RDMA, but news travels slowly in this
> group so a little digging might prove otherwise.
> >
> > So far it seems like these cards struggle to fill a 10 gig pipe. Using
> > iperf (2) the best i'm getting is around 6 Gbit/sec. Interfaces
> > aren't showing drops on either end. Doesn't seem to matter if i do 1,
> > 2 or 4 threads on iperf.
> You'll need both ends in connected mode with a fairly large MTU to get
> good throughput. CentOS defaults to 64k, but FreeBSD is unstable at
> that size last I checked. I got good results with 16k.
>
> My FreeBSD ZFS NFS server performed comparably to the CentOS servers,
> with some buffer space errors causing the interface to shut down (under
> the same loads that caused CentOS servers to lock up completely).
> Someone mentioned that this buffer space bug has been fixed, but I no
> longer have a way to test it.
>
> Best,
>
> Jason
>
> --
> Earth is a beta site.
So .. i ended up switching to ethernet mode via
mlxconfig -d PCID set LINK_TYPE_P1=2 LINK_TYPE_P2=2
Oh i also set MTU to 9000.
After that.. the flood gates opened massively.
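
A rough sketch of the mlxconfig step above, wrapped so that it only prints the commands unless RUN=1 is set. LINK_TYPE value 2 selects Ethernet (1 is InfiniBand). The `run` and `switch_to_ethernet` names and the RUN variable are hypothetical, and the device argument is whatever identifier mlxconfig accepts on the machine in question (the "PCID" placeholder above).

```sh
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Show the current settings, then set both ports to Ethernet (LINK_TYPE 2).
# $1 = device identifier as mlxconfig expects it
switch_to_ethernet() {
    dev=$1
    run mlxconfig -d "$dev" query
    run mlxconfig -d "$dev" set LINK_TYPE_P1=2 LINK_TYPE_P2=2
}
```

`switch_to_ethernet PCID` prints the two commands; `RUN=1 switch_to_ethernet PCID` executes them. mlxconfig asks for confirmation before writing, and the new link type only applies after a reboot or firmware reset. The MTU change is left out because the interface names are not given here.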
root@R720-Storage:~ # iperf -c 10.255.255.55 -P4
------------------------------------------------------------
Client connecting to 10.255.255.55, TCP port 5001
TCP window size: 1.01 MByte (default)
------------------------------------------------------------
[ 6] local 10.255.255.22 port 62256 connected with 10.255.255.55 port 5001
[ 3] local 10.255.255.22 port 51842 connected with 10.255.255.55 port 5001
[ 4] local 10.255.255.22 port 53680 connected with 10.255.255.55 port 5001
[ 5] local 10.255.255.22 port 33455 connected with 10.255.255.55 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 24.6 GBytes 21.1 Gbits/sec
[ 3] 0.0-10.0 sec 23.8 GBytes 20.5 Gbits/sec
[ 4] 0.0-10.0 sec 33.4 GBytes 28.7 Gbits/sec
[ 5] 0.0-10.0 sec 32.9 GBytes 28.3 Gbits/sec
[SUM] 0.0-10.0 sec 115 GBytes 98.5 Gbits/sec
root@R720-Storage:~ #
11:56 AM
root@compute720:~# iperf -c 10.255.255.22 -P4
------------------------------------------------------------
Client connecting to 10.255.255.22, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 10.255.255.55 port 50022 connected with 10.255.255.22 port 5001
[ 3] local 10.255.255.55 port 50026 connected with 10.255.255.22 port 5001
[ 6] local 10.255.255.55 port 50024 connected with 10.255.255.22 port 5001
[ 4] local 10.255.255.55 port 50020 connected with 10.255.255.22 port 5001
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 27.4 GBytes 23.5 Gbits/sec
[ 3] 0.0-10.0 sec 26.2 GBytes 22.5 Gbits/sec
[ 6] 0.0-10.0 sec 26.8 GBytes 23.1 Gbits/sec
[ 4] 0.0-10.0 sec 26.0 GBytes 22.3 Gbits/sec
[SUM] 0.0-10.0 sec 106 GBytes 91.4 Gbits/sec
root@compute720:~#
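
The per-stream numbers in the two runs above can be cross-checked against iperf's own [SUM] line with a small awk filter. This is only a sketch: the `sum_streams` name is made up, and it assumes iperf 2 output with every stream reported in Gbits/sec, as in the runs above.

```sh
#!/bin/sh
# Add up the bandwidth column of the per-stream lines in iperf 2 output
# read from stdin. Lines not ending in "Gbits/sec" (headers, connection
# lines) are ignored, and the [SUM] line is skipped.
sum_streams() {
    awk '$NF == "Gbits/sec" && $1 != "[SUM]" { total += $(NF - 1) }
         END { printf "%.1f\n", total }'
}
```

Usage would be `iperf -c 10.255.255.55 -P4 | sum_streams`. Fed the first run above it prints 98.6, a hair off iperf's 98.5 because each stream figure is already rounded; the second run comes out at 91.4, matching the [SUM] line.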
I should point out that before doing this, while running in IB mode with
datagram mode, i disabled SMT and set the power profile to performance
on both boxes. This moved me up to 10-12 gig/sec, nothing like the
change to ethernet, where i can now fill the pipe from the looks of it.
Also note a single connection doesn't do more than 25ish gig/sec.
Back to SATA being the bottleneck, but at least if it's coming out of
the cache there should be more than enough network IO.
Oh one last thing, i thought i read somewhere that you needed to have
a switch to do ethernet mode. This doesn't seem to be the case. I
haven't shut down opensm yet but i'll try that later, as i'm assuming i
no longer need it.
w00t!