Questions about Infiniband on FreeBSD
Mikhail T.
mi+thun at aldan.algebra.com
Thu Oct 3 01:23:40 UTC 2019
On 02.10.19 20:49, Jason Bacon wrote:
> On 2019-10-02 18:58, Mikhail T. wrote:
>> 1. Why is running opensm mandatory even in a "point-to-point" setup
>> like mine? I would've thought that whatever the two ends need to tell
>> each other could be told /once/, after which the connection would
>> continue to work even if the opensm process goes away.
>> Unfortunately, shutting down opensm freezes the connection... Is
>> that a hardware/firmware requirement, or can this be improved?
> A subnet manager is required for IPoIB. It's often run on the switch,
> but since you don't have one...
That's my question -- is that requirement coming from the hardware (or
the firmware inside it)? What does the manager actually /do/ -- and,
whatever it is, does it really need to be done constantly in a simple
setup like mine, or can opensm come up once (at boot time), do it, and
then go away?
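To illustrate what I mean: I would have expected something like the
following, run once on one of the two hosts, to suffice -- the GUID
below is just a placeholder, the real one would come from ibstat's
output:

    # list the local HCAs/ports and note the "Port GUID" of the
    # cabled port
    ibstat
    # start opensm once, bound to that port, as a background daemon
    opensm -B -g 0x0002c90300000001

...but apparently the daemon has to stay around.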
>> 2. Although pings were working and NFS would mount, data transfers
>> weren't reliable until I /manually/ lowered the MTU -- on both ends
>> -- to 2044 (from the 65520 used by the ib interfaces by default).
>> And it only occurred to me to do that when I saw a kernel message
>> on one of the two consoles complaining about a packet length of 16k
>> being greater than 2044... If that's a known limit, why isn't the
>> MTU set to it by default?
> I saw frequent hangs (self-resolving) with an MTU of 65520. Cutting it
> in half improved reliability by orders of magnitude, but there were
> still occasional issues. Halving it again to 16380 seemed to be the
> sweet spot.
Most interesting -- I thought 2044 was a hardware limit of some
sort... Isn't it a bug that much larger values are allowed but do not
work? I just raised it to 16380 here, and things seem to continue
working (I did a "cvs update" of the entire pkgsrc repo over NFS). But
the kernel said:

    ib1: mtu > 2044 will cause multicast packet drops.

I probably don't care about multicast, as long as NFS works...
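In case it helps someone else hitting this: setting the MTU by hand is
easy enough, and I am assuming the usual ifconfig_<ifname> knob in
rc.conf works for IPoIB interfaces as well (the address below is just a
placeholder):

    # on the running system, on both ends:
    ifconfig ib1 mtu 16380
    # and in /etc/rc.conf, so that it survives a reboot:
    ifconfig_ib1="inet 10.10.10.1/24 mtu 16380"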
>> 3. Currently, I have only one cable connecting the ib1 on one machine
>> to the ib1 of another. Would I get double the throughput if I connect
>> the two other ports together as well and bundle the connections? If
>> so, should I bundle them as network interfaces -- using lagg(4) --
>> or is there something Infiniband-specific?
> Good question. With Mellanox 6036 switches, nothing needs to be
> configured to benefit from multiple links. We ran 6 links from each of
> two top-level switches to each of 6 leaf switches. The switches
> recognize the fabric topology automatically. I don't know if the same
> is true with the HCAs. You could try just adding a cable and comparing
> the results from iperf, etc.
Sorry, I don't understand how that would "just work" -- if both
interfaces (ib1 and ib0) are configured separately, with different
IP addresses, etc.?
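If lagg(4) is, indeed, the way to go, I imagine the setup would be
something like the sketch below -- with the addresses as placeholders,
and I don't even know whether lagg accepts IPoIB ports as members:

    ifconfig ib0 up
    ifconfig ib1 up
    ifconfig lagg0 create
    ifconfig lagg0 laggproto loadbalance laggport ib0 laggport ib1 \
        inet 10.10.10.1/24 mtu 16380

But that would replace the two separately-addressed interfaces with a
single address on lagg0, which is rather different from what I have now.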
-mi