Questions about Infiniband on FreeBSD

Mikhail T. mi+thun at aldan.algebra.com
Thu Oct 3 01:23:40 UTC 2019


On 02.10.19 20:49, Jason Bacon wrote:
> On 2019-10-02 18:58, Mikhail T. wrote:
>> 1. Why is running opensm mandatory even in a "point-to-point" setup
>>    like mine? I would've thought, whatever the two ends need to tell
>>    each other could be told /once/, after which the connection will
>>    continue to work even if the opensm-process goes away.
>>    Unfortunately, shutting down opensm freezes the connection... Is
>>    that a hardware/firmware requirement, or can this be improved?
> A subnet manager is required for IPOIB.  It's often run on the switch, 
> but since you don't have one...

That's my question -- does that requirement come from the hardware (or 
the firmware inside it)? What does the manager actually /do/ -- and, 
whatever it is, does it really need doing constantly in a simple setup 
like mine, or could opensm come up once (at boot time), do it, and then 
go away?
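
For what it's worth, this is roughly how I poke at it from the shell -- 
just a sketch, assuming the OFED diagnostic tools and the opensm rc 
script are present on your build; "mlx4_0" below stands in for whatever 
the HCA is actually called:

    # show port state and which SM LID the port currently sees
    ibstat mlx4_0 1

    # ask the fabric which subnet manager (if any) is answering
    sminfo

    # keep opensm running across reboots instead of starting it by hand
    sysrc opensm_enable=YES
    service opensm start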

>> 2. Although pings were working and NFS would mount, data-transfers
>>    weren't reliable until I /manually/ lowered the MTU -- on both ends
>>    -- to 2044 (from the 65520 used by the ib-interfaces by default).
>>    And it only occurred to me to do that, when I saw a kernel's message
>>    on one of the two consoles complaining about a packet length of 16k
>>    being greater than 2044... If that's a known limit, why is not the
>>    MTU set to it by default?
> I saw frequent hangs (self-resolving) with an MTU of 65520. Cutting it 
> in half improved reliability by orders of magnitude, but there were 
> still occasional issues.  Halving it again to 16380 seemed to be the 
> sweet spot.

Most interesting -- I thought 2044 was a hardware limit of some sort... 
Isn't it a bug that much larger values are allowed but do not work? I 
just raised it to 16380 here, and things seem to continue working (I did 
a "cvs update" of the entire pkgsrc repo over NFS). But the kernel said:

    ib1: mtu > 2044 will cause multicast packet drops.

I probably don't care about multicast, as long as NFS works...
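
In case it helps, making the setting stick across reboots should just be 
the usual rc.conf incantation -- the address below is only a placeholder 
for whatever you actually use:

    # set it immediately
    ifconfig ib1 mtu 16380

    # /etc/rc.conf -- make it persistent (10.11.12.1/24 is a placeholder)
    ifconfig_ib1="inet 10.11.12.1/24 mtu 16380"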

>> 3. Currently, I have only one cable connecting the ib1 on one machine
>>    to ib1 of another. Would I get double the throughput if I connect
>>    the two other ports together as well and bundle the connections? If
>>    yes, should I bundle them as network-interfaces -- using lagg(4) --
>>    or is there something Infiniband-specific?
> Good question.  With Mellanox 6036 switches, nothing needs to be 
> configured to benefit from multiple links.  We ran 6 from each of two 
> top-level switches to each of 6 leaf switches.  The switches recognize 
> the fabric topology automatically.  I don't know if the same is true 
> with the HCAs.  You could try just adding a cable and compare results 
> from iperf, etc.

Sorry, I don't understand how that would "just work" -- if both 
interfaces (ib1 and ib0) are configured separately, with different IP 
addresses, etc.?
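
If lagg(4) turns out to be the way to do it, I imagine the rc.conf would 
look something like the usual Ethernet case -- purely an untested 
sketch, since I don't even know whether IPoIB interfaces are accepted as 
lagg ports, and the address is again a placeholder:

    # /etc/rc.conf -- untested; IPoIB ports may not be supported by lagg(4)
    cloned_interfaces="lagg0"
    ifconfig_ib0="up mtu 16380"
    ifconfig_ib1="up mtu 16380"
    ifconfig_lagg0="laggproto loadbalance laggport ib0 laggport ib1 inet 10.11.12.1/24"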

    -mi


