[Bug 231697] net/openmpi2: MPI_Send to self fails (or receive from self fails?)

bugzilla-noreply at freebsd.org
Mon Sep 24 21:41:29 UTC 2018


https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231697

            Bug ID: 231697
           Summary: net/openmpi2:  MPI_Send to self fails (or receive from
                    self fails?)
           Product: Ports & Packages
           Version: Latest
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: Individual Port(s)
          Assignee: danilo at FreeBSD.org
          Reporter: russo at bogodyn.org
             Flags: maintainer-feedback?(danilo at FreeBSD.org)
 Attachment #197466 mime type: text/plain

Created attachment 197466
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197466&action=edit
Simple test case that just does send/receives, fails with OpenMPI 2 or 3.

Our code fails when built with OpenMPI 2.1 or 3.x on FreeBSD and on one Linux
platform.  On FreeBSD, at least, we have tracked the failure down to a simple
test case: data that a process sends to itself with MPI_Send is never received
by the corresponding MPI_Irecv.  This use case is supposed to be standard
compliant *AND* it DOES work with OpenMPI 1.10 on the same machine.

My uname -a:
FreeBSD yyy.zzz 10.4-STABLE FreeBSD 10.4-STABLE #0 r327510: Tue Jan  2 21:52:13
MST 2018     xxx at yyy.zzz:/usr/obj/usr/src/sys/GENERIC  amd64

The attached test program will print BAD on each line where it is supposed to
report that proc#N received something from proc#M when M==N, if compiled with
OpenMPI 2.1.x or 3.x.  It will pass just fine with OpenMPI 1.10.
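
For readers without access to the attachment, the pattern described above can
be sketched roughly as follows.  This is NOT the attached testBUG967.c; it is
a minimal sketch under assumptions.  The helper names (expected_value,
recv_ok), the SELFSEND_WITH_MPI build flag, the output format, and the choice
to count in MPI_BYTE (so that one int reports a count of 4) are all
hypothetical.  Only the value scheme (1000, 1001, 2000, 2001), the -1000
sentinel, and the BAD marker are taken from the output shown in this report.

```c
#include <stdio.h>

/* Value that rank `src` sends to rank `dst`: 1000, 1001, 2000, 2001, ... */
static int expected_value(int src, int dst)
{
    return (src + 1) * 1000 + dst;
}

/* Returns 1 if a receive delivered sizeof(int) bytes holding the value
 * that `src` was supposed to send to `dst`, 0 otherwise. */
static int recv_ok(int src, int dst, int value, int count_bytes)
{
    return count_bytes == (int)sizeof(int) &&
           value == expected_value(src, dst);
}

#ifdef SELFSEND_WITH_MPI   /* hypothetical flag; the MPI part needs mpicc */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, failures = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *vals = malloc((size_t)size * sizeof *vals);
    MPI_Request *reqs = malloc((size_t)size * sizeof *reqs);
    MPI_Status *stats = malloc((size_t)size * sizeof *stats);
    if (vals == NULL || reqs == NULL || stats == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Pre-post one nonblocking receive per source rank, including this
     * rank itself, so every blocking send below has a matching receive. */
    for (int src = 0; src < size; src++) {
        vals[src] = -1000;   /* sentinel: still -1000 if nothing arrives */
        MPI_Irecv(&vals[src], (int)sizeof(int), MPI_BYTE, src, 0,
                  MPI_COMM_WORLD, &reqs[src]);
    }

    /* Send one value to every rank, including this rank itself. */
    for (int dst = 0; dst < size; dst++) {
        int val = expected_value(rank, dst);
        MPI_Send(&val, (int)sizeof(int), MPI_BYTE, dst, 0, MPI_COMM_WORLD);
    }

    MPI_Waitall(size, reqs, stats);

    /* Report what each receive actually delivered. */
    for (int src = 0; src < size; src++) {
        int count = 0;
        MPI_Get_count(&stats[src], MPI_BYTE, &count);
        int ok = recv_ok(src, rank, vals[src], count);
        printf("%d from %d: count %d value %d%s\n",
               rank, src, count, vals[src], ok ? "" : " BAD");
        failures += !ok;
    }

    free(vals);
    free(reqs);
    free(stats);
    MPI_Finalize();
    return failures ? 1 : 0;
}
#endif
```

Assuming the file is saved as selfsend.c (a made-up name), it could be built
with "mpicc -DSELFSEND_WITH_MPI -o selfsend selfsend.c" and run with
"mpirun -np 2 ./selfsend".  Because the receives are posted before the
blocking sends, the send-to-self should be able to complete on a conforming
implementation; a line ending in BAD would correspond to the symptom reported
here.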

We have run this on a few OSes other than BSD, including RHEL6, RHEL7, and
OS X, and none of those show the issue.  Ubuntu 18.04's OpenMPI 2.1.x, however,
does appear to have the same problem.

It is not at all clear where this problem lies.  The only symptom is that the
receive requests do not in fact receive any data when the sender is the same
process.

To reproduce:
   /usr/local/mpi/openmpi2/bin/mpicc -o testBUG967 testBUG967.c
   /usr/local/mpi/openmpi2/bin/mpirun -np 2 ./testBUG967
On my machine, this gives the output:
0 posting receive 0 0x803fc78b0
0 posting receive 1 0x803fc78b4
0 sending to 0 value 1000
1 posting receive 0 0x803fc78b0
1 posting receive 1 0x803fc78b4
1 sending to 0 value 2000
1 sending to 1 value 2001
0 sending to 1 value 1001
0 wait source 0 count 0 
0 wait source 1 count 4 
0 procs_from 0 vals_from -1000 BAD BAD BAD 
0 procs_from 1 vals_from 2000   
1 wait source 1 count 0 
1 wait source 0 count 4 
1 procs_from 1 vals_from -1000 BAD BAD BAD 
1 procs_from 0 vals_from 1001   

When run with OpenMPI 1.10 instead, it gives the expected output:
> /usr/local/mpi/openmpi/bin/mpicc -o testBUG967 testBUG967.c 
> /usr/local/mpi/openmpi/bin/mpirun -np 2 ./testBUG967 
1 posting receive 0 0x803e23ad8
1 posting receive 1 0x803e23adc
1 sending to 0 value 2000
0 posting receive 0 0x803e23ad8
0 posting receive 1 0x803e23adc
0 sending to 0 value 1000
1 sending to 1 value 2001
0 sending to 1 value 1001
1 wait source 1 count 4 
1 wait source 0 count 4 
1 procs_from 1 vals_from 2001   
0 wait source 0 count 4 
0 wait source 1 count 4 
0 procs_from 0 vals_from 1000   
1 procs_from 0 vals_from 1001   
0 procs_from 1 vals_from 2000   

I have also tried it with varying --mca btl options (tcp,self; sm,self;
vader,self), and the receive from self fails with every one of them unless I
use OpenMPI 1.x.


Additional information:
> pkg info openmpi2
openmpi2-2.1.5
Name           : openmpi2
Version        : 2.1.5
Installed on   : Mon Sep 24 15:31:19 2018 MDT
Origin         : net/openmpi2
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo at FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        DEBUG          : on
        IPV6           : on
        SLURM          : off
        TORQUE         : off
Shared Libs required:
        libhwloc.so.5
        libevent-2.1.so.6
        libevent_pthreads-2.1.so.6
        libquadmath.so.0
        libgcc_s.so.1
        libgfortran.so.4
        libmunge.so.2

> pkg info openmpi
openmpi-1.10.7_3
Name           : openmpi
Version        : 1.10.7_3
Installed on   : Wed Aug 22 23:44:37 2018 MDT
Origin         : net/openmpi
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : net parallel
Licenses       : BSD3CLAUSE
Maintainer     : danilo at FreeBSD.org
WWW            : http://www.open-mpi.org/
Comment        : High Performance Message Passing Library
Options        :
        IPV6           : on
        SLURM          : off
        TORQUE         : off
        VT             : off
Shared Libs required:
        libquadmath.so.0
        libevent_pthreads-2.1.so.6
        libevent-2.1.so.6
        libhwloc.so.5
        libgfortran.so.4
        libgcc_s.so.1

> pkg info hwloc
hwloc-1.11.11
Name           : hwloc
Version        : 1.11.11
Installed on   : Wed Sep 19 08:08:13 2018 MDT
Origin         : devel/hwloc
Architecture   : FreeBSD:10:amd64
Prefix         : /usr/local
Categories     : devel
Licenses       : BSD3CLAUSE
Maintainer     : phd_kimberlite at yahoo.co.jp
WWW            : http://www.open-mpi.org/projects/hwloc/
Comment        : Portable Hardware Locality software package
Options        :
        CAIRO          : off
        DOCS           : on
Shared Libs required:
        libxml2.so.2
        libpciaccess.so.0

-- 
You are receiving this mail because:
You are the assignee for the bug.

