[Bug 231697] net/openmpi2: MPI_Send to self fails (or receive from self fails?)
bugzilla-noreply at freebsd.org
Mon Sep 24 21:41:29 UTC 2018
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231697
Bug ID: 231697
Summary: net/openmpi2: MPI_Send to self fails (or receive from
self fails?)
Product: Ports & Packages
Version: Latest
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: Individual Port(s)
Assignee: danilo at FreeBSD.org
Reporter: russo at bogodyn.org
Flags: maintainer-feedback?(danilo at FreeBSD.org)
Attachment #197466 mime type: text/plain
Created attachment 197466
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=197466&action=edit
Simple test case that just does send/receives, fails with OpenMPI 2 or 3.
We have observed our code failing when built with OpenMPI 2.1 or 3.x on FreeBSD
and on one Linux platform. On FreeBSD, at least, we have tracked it down to a
simple test case: data that a process sends to itself (same rank) via MPI_Send
is never received by the corresponding MPI_Irecv. This use case is supposed to
be standard compliant, *AND* it DOES work with OpenMPI 1.10 on the same machine.
My uname -a:
FreeBSD yyy.zzz 10.4-STABLE FreeBSD 10.4-STABLE #0 r327510: Tue Jan 2 21:52:13
MST 2018 xxx at yyy.zzz:/usr/obj/usr/src/sys/GENERIC amd64
When compiled with OpenMPI 2.1.x or 3.x, the attached test program prints BAD
on every line that should report proc#N receiving something from proc#M with
M==N, i.e. from itself. It passes just fine with OpenMPI 1.10.
We have run this on a few OSen other than BSD, including RHEL6, RHEL7, and
OS X, and none of them shows the issue. It does appear, however, that Ubuntu
18.04's OpenMPI 2.1.x has the same problem.
It is not at all clear where the problem lies. The only symptom is that a
receive request gets no data (count 0) when the sender is the same process.
To reproduce:
> /usr/local/mpi/openmpi2/bin/mpicc -o testBUG967 testBUG967.c
> /usr/local/mpi/openmpi2/bin/mpirun -np 2 ./testBUG967
On my machine, this gives the output:
0 posting receive 0 0x803fc78b0
0 posting receive 1 0x803fc78b4
0 sending to 0 value 1000
1 posting receive 0 0x803fc78b0
1 posting receive 1 0x803fc78b4
1 sending to 0 value 2000
1 sending to 1 value 2001
0 sending to 1 value 1001
0 wait source 0 count 0
0 wait source 1 count 4
0 procs_from 0 vals_from -1000 BAD BAD BAD
0 procs_from 1 vals_from 2000
1 wait source 1 count 0
1 wait source 0 count 4
1 procs_from 1 vals_from -1000 BAD BAD BAD
1 procs_from 0 vals_from 1001
When built and run with OpenMPI 1.10 instead, it gives the expected output:
> /usr/local/mpi/openmpi/bin/mpicc -o testBUG967 testBUG967.c
> /usr/local/mpi/openmpi/bin/mpirun -np 2 ./testBUG967
1 posting receive 0 0x803e23ad8
1 posting receive 1 0x803e23adc
1 sending to 0 value 2000
0 posting receive 0 0x803e23ad8
0 posting receive 1 0x803e23adc
0 sending to 0 value 1000
1 sending to 1 value 2001
0 sending to 1 value 1001
1 wait source 1 count 4
1 wait source 0 count 4
1 procs_from 1 vals_from 2001
0 wait source 0 count 4
0 wait source 1 count 4
0 procs_from 0 vals_from 1000
1 procs_from 0 vals_from 1001
0 procs_from 1 vals_from 2000
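As a reading aid for the two outputs above: the payloads appear to follow
(sender + 1) * 1000 + destination, and -1000 looks like the initial value of a
receive buffer that was never filled. Both points are inferred from the logs,
not taken from the attachment, and the function names below are hypothetical.
A minimal sketch of a check on the "procs_from" lines under those assumptions:

```c
#include <stdio.h>

/* Payload that rank `sender` appears to send to rank `dest`.
 * Assumption inferred from the logs (0->0: 1000, 0->1: 1001,
 * 1->0: 2000, 1->1: 2001); the attachment may compute it differently. */
int expected_payload(int sender, int dest)
{
    return (sender + 1) * 1000 + dest;
}

/* Classify one log line of the form
 *   "<rank> procs_from <src> vals_from <val>"
 * Returns 1 if <val> is not what <src> should have sent to <rank>,
 * 0 if it matches, and -1 if the line is not a procs_from line. */
int line_is_bad(const char *line)
{
    int rank, src, val;

    if (sscanf(line, "%d procs_from %d vals_from %d", &rank, &src, &val) != 3)
        return -1;
    return val != expected_payload(src, rank);
}
```

Applied to the OpenMPI 2 output, only the two self-receive lines (value -1000)
are flagged; all four procs_from lines of the OpenMPI 1.10 output pass.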
I have also tried various --mca btl settings (tcp,self; sm,self; vader,self).
The receive fails with every one of them unless I use OpenMPI 1.x.
Additional information:
> pkg info openmpi2
openmpi2-2.1.5
Name : openmpi2
Version : 2.1.5
Installed on : Mon Sep 24 15:31:19 2018 MDT
Origin : net/openmpi2
Architecture : FreeBSD:10:amd64
Prefix : /usr/local
Categories : net parallel
Licenses : BSD3CLAUSE
Maintainer : danilo at FreeBSD.org
WWW : http://www.open-mpi.org/
Comment : High Performance Message Passing Library
Options :
DEBUG : on
IPV6 : on
SLURM : off
TORQUE : off
Shared Libs required:
libhwloc.so.5
libevent-2.1.so.6
libevent_pthreads-2.1.so.6
libquadmath.so.0
libgcc_s.so.1
libgfortran.so.4
libmunge.so.2
> pkg info openmpi
openmpi-1.10.7_3
Name : openmpi
Version : 1.10.7_3
Installed on : Wed Aug 22 23:44:37 2018 MDT
Origin : net/openmpi
Architecture : FreeBSD:10:amd64
Prefix : /usr/local
Categories : net parallel
Licenses : BSD3CLAUSE
Maintainer : danilo at FreeBSD.org
WWW : http://www.open-mpi.org/
Comment : High Performance Message Passing Library
Options :
IPV6 : on
SLURM : off
TORQUE : off
VT : off
Shared Libs required:
libquadmath.so.0
libevent_pthreads-2.1.so.6
libevent-2.1.so.6
libhwloc.so.5
libgfortran.so.4
libgcc_s.so.1
> pkg info hwloc
hwloc-1.11.11
Name : hwloc
Version : 1.11.11
Installed on : Wed Sep 19 08:08:13 2018 MDT
Origin : devel/hwloc
Architecture : FreeBSD:10:amd64
Prefix : /usr/local
Categories : devel
Licenses : BSD3CLAUSE
Maintainer : phd_kimberlite at yahoo.co.jp
WWW : http://www.open-mpi.org/projects/hwloc/
Comment : Portable Hardware Locality software package
Options :
CAIRO : off
DOCS : on
Shared Libs required:
libxml2.so.2
libpciaccess.so.0