Why does MPI_REDUCE show different numbers at some array locations?

program main_mpi_test

    use mpi
    implicit none
    integer(kind=8) :: n                  
    integer(kind=8) :: max_optical_depth  
    integer(kind=8) :: bin               
    integer(kind=8) :: tmax               
    integer(kind=8) :: sum_scatt          
    integer(kind=8) :: i,j,k
    real(kind=8)    :: rad                

    real(kind=8), dimension(:), allocatable :: send_results, recv_results
    integer :: nrank, nproc,ierr,root

    call MPI_INIT(ierr)
    call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    root = 0

    max_optical_depth = 10
    bin = 10
    tmax = max_optical_depth*bin

    allocate(send_results(tmax))
    allocate(recv_results(tmax))
    
    do i = nrank+1, tmax, nproc
        rad = real(i)/real(bin)
        send_results(i) = rad
        print*,'send', rad, send_results(i)
    end do

    call MPI_BARRIER(MPI_COMM_WORLD, ierr)
    call MPI_REDUCE(send_results, recv_results, tmax, MPI_DOUBLE_PRECISION, &
                    MPI_SUM, root, MPI_COMM_WORLD, ierr)

    if (nrank ==0) then
        do i = 1, tmax
            rad = real(i)/real(bin)
            print*,'recv',rad, recv_results(i)
        end do
    end if

    call MPI_FINALIZE(ierr)
    deallocate(send_results)
    deallocate(recv_results)

end program main_mpi_test

This is my code. I use MPI_REDUCE, and I compile with mpiifort.

The problem is that when I print recv_results, rad and recv_results should be equal, but at some locations in the array recv_results shows NaN or a huge number around 1e186. At the other locations, rad and recv_results have the same value.

How can I solve this problem?

[screenshot: printed values of rad and recv_results]

Accepted answer by Ian Bush:

Your program has quite a lot of problems.

  1. As noted in the comments, use of real(8) and integer(8) is bad practice. Kind numbers are not portable, may not do what you expect, and may not even be supported by a compiler. Learn about Fortran kinds from your favourite Fortran book or tutorial; the links Vladimir provides (Fortran: integer*4 vs integer(4) vs integer(kind=4) and Fortran 90 kind parameter) are useful supporting material. My preferred way is to use the iso_fortran_env standard module, see my program below.
  2. Whatever integer(8) means (if it is supported at all), it is unlikely to be compatible with the MPI library. MPI is defined (for most routines) in terms of default Fortran integers, so I strongly suggest you use default Fortran integers for everything you pass to MPI.
  3. You don't initialize all of the array send_results. With nproc ranks, each rank sets only every nproc-th element, so at each index the MPI_SUM reduction adds one valid contribution to nproc-1 uninitialized ones. Uninitialized memory can take any value (only by luck will it be zero), which is exactly why you see NaN and huge numbers. See the sketch after this list.
  4. The Real intrinsic with only one argument converts that argument to a value of default real kind, which is not what you want here. This is closely related to 1) above: to get the kind you want, you need to supply that kind as a second argument. See the sketch after this list and my program below.
  5. While not a bug, the use of MPI_Barrier in your code is completely unnecessary and just wastes time: MPI_Reduce is already blocking. As a rule of thumb you never need an MPI_Barrier except for measuring timing, debugging, and possibly when using shared memory.
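
To make points 3 and 4 concrete, here is a minimal sketch of the corrected fill loop (wp is the real64 kind parameter from iso_fortran_env, exactly as in my full program below):

  send_results = 0.0_wp              ! point 3: zero the elements this rank will
                                     ! not set, so MPI_SUM adds only real data
  Do i = nrank+1, tmax, nproc
     ! point 4: pass the kind explicitly, otherwise Real() converts to
     ! default real precision before the division is done
     send_results(i) = Real( i, wp ) / Real( bin, wp )
  End Do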

(For the purists: I should probably also call MPI_Type_create_f90_real, but I'll be lazy and assume MPI_DOUBLE_PRECISION maps to real64.)
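
If you do want the purist route, a sketch might look like this (untested; wp_type is a name I'm introducing here, not part of the program below). MPI_Type_create_f90_real returns a datatype matching the requested precision and decimal range, and datatypes obtained this way count as predefined, so no MPI_Type_commit is needed:

  Integer :: wp_type
  ! Ask MPI for a datatype matching wp's precision and decimal exponent range
  Call MPI_Type_create_f90_real( Precision( rad ), Range( rad ), wp_type, ierr )
  Call MPI_REDUCE( send_results, recv_results, tmax, wp_type, &
       MPI_SUM, root, MPI_COMM_WORLD, ierr )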

Anyway, here's my program, working with gfortran, ifort and ifx:

ijb@ijb-Latitude-5410:~/work/stack$ cat red.f90
Program main_mpi_test

  Use iso_fortran_env, Only : wp => real64

  Use mpi
  
  Implicit None

  Real( wp ), Dimension(:), Allocatable :: send_results, recv_results

  Real( wp )    :: rad                

  Integer :: n                  
  Integer :: max_optical_depth  
  Integer :: bin               
  Integer :: tmax               
  Integer :: sum_scatt          
  Integer :: i,j,k
  Integer :: nrank, nproc, ierr, root

  Logical :: worked
  
  Call MPI_INIT(ierr)
  Call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
  Call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)
  Call MPI_BARRIER(MPI_COMM_WORLD, ierr)
  root = 0
  
  max_optical_depth = 10
  bin = 10
  tmax = max_optical_depth*bin
  
  Allocate(send_results(tmax))
  Allocate(recv_results(tmax))

  send_results = 0.0_wp
  Do i = nrank+1, tmax, nproc
     rad = Real( i, Kind( rad ) ) / Real( bin, Kind( rad ) )
     send_results(i) = rad
!     Print*,'send', rad, send_results(i)
  End Do
  
  Call MPI_REDUCE(send_results, recv_results, tmax, MPI_DOUBLE_PRECISION, &
       MPI_SUM, root, MPI_COMM_WORLD, ierr)
  
  If (nrank ==0 ) Then
     worked = .true.
     Do i = 1, tmax
        rad = Real( i, Kind( rad ) ) / Real( bin, Kind( rad ) )
!        Write( *, * ) 'recv', rad, recv_results(i)
        worked = worked .And. ( Abs( rad - recv_results(i) ) < 1e-15_wp  )
     End Do
     Write( *, * ) "Worked? ", worked
  End If
  
  Deallocate(send_results)
  Deallocate(recv_results)
  
  Call MPI_FINALIZE(ierr)

End Program main_mpi_test
  
ijb@ijb-Latitude-5410:~/work/stack$ mpif90 -Wall -Wextra -fcheck=all -O -g red.f90
red.f90:18:16:

   18 |   Integer :: i,j,k
      |                1
Warning: Unused variable ‘j’ declared at (1) [-Wunused-variable]
red.f90:18:18:

   18 |   Integer :: i,j,k
      |                  1
Warning: Unused variable ‘k’ declared at (1) [-Wunused-variable]
red.f90:13:14:

   13 |   Integer :: n
      |              1
Warning: Unused variable ‘n’ declared at (1) [-Wunused-variable]
red.f90:17:22:

   17 |   Integer :: sum_scatt
      |                      1
Warning: Unused variable ‘sum_scatt’ declared at (1) [-Wunused-variable]
ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
 Worked?  T
ijb@ijb-Latitude-5410:~/work/stack$ mpiifort red.f90
ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
 Worked?  T
ijb@ijb-Latitude-5410:~/work/stack$ mpiifort -fc=ifx red.f90
ijb@ijb-Latitude-5410:~/work/stack$ mpirun -np 4 ./a.out
 Worked?  T
ijb@ijb-Latitude-5410:~/work/stack$