Why does this sample code (f90, MPI, derived types) cause invalid reads/writes (valgrind or dmalloc)?


This is the offending code (it is related to another question I asked, here):

program foo

  use mpi

  implicit none

  type double_st
     sequence
     real(kind(0.d0)) :: x,y,z
     integer :: acc
  end type double_st

  integer, parameter :: n=8

  INTEGER :: MPI_CADNA_DST

  integer :: nproc, iprank
  INTEGER :: IERR, STAT(MPI_STATUS_SIZE)
  INTEGER :: MPI_CADNA_DST_TMP
  INTEGER ::&
       COUNT=4,&
       BLOCKLENGTHS(4)=(/1,1,1,1/),&
       TYPES(4)=(/MPI_DOUBLE_PRECISION,&
       MPI_DOUBLE_PRECISION,&
       MPI_DOUBLE_PRECISION,&
       MPI_INTEGER/)
  INTEGER(KIND=MPI_ADDRESS_KIND) :: DISPLS(4), LB, EXTENT
  TYPE(DOUBLE_ST) :: DST
  INTEGER :: I

  type(double_st), allocatable :: bufs(:), bufr(:)

  allocate(bufs(n), bufr(n))

  CALL MPI_INIT(IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROC, IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, IPRANK, IERR)

  CALL MPI_GET_ADDRESS(DST%X,   DISPLS(1))
  CALL MPI_GET_ADDRESS(DST%Y,   DISPLS(2))
  CALL MPI_GET_ADDRESS(DST%Z,   DISPLS(3))
  CALL MPI_GET_ADDRESS(DST%ACC, DISPLS(4))
  DO I=4,1,-1
     DISPLS(I)=DISPLS(I)-DISPLS(1)
  ENDDO
  CALL MPI_TYPE_CREATE_STRUCT(4,BLOCKLENGTHS,DISPLS,TYPES, MPI_CADNA_DST_TMP,IERR)
  CALL MPI_TYPE_COMMIT(MPI_CADNA_DST_TMP,IERR)

  CALL MPI_TYPE_GET_EXTENT(MPI_CADNA_DST_TMP, LB, EXTENT, IERR)
  CALL MPI_TYPE_CREATE_RESIZED(MPI_CADNA_DST_TMP, LB, EXTENT, MPI_CADNA_DST, IERR)
  CALL MPI_TYPE_COMMIT(MPI_CADNA_DST,IERR)

  bufs(:)%x=iprank
  bufs(:)%y=iprank
  bufs(:)%z=iprank
  bufs(:)%acc=iprank
  call mpi_send(bufs(1), n, mpi_cadna_dst, 1-iprank, 0, mpi_comm_world, ierr)
  call mpi_recv(bufr(1), n, mpi_cadna_dst, 1-iprank, 0, mpi_comm_world, stat, ierr)


  deallocate(bufs, bufr)

end program foo

Compiled with Intel MPI, version 4.0 or 5.0, I get invalid read/write errors at the send, reported by both valgrind and dmalloc. With Open MPI, the problem is less clear-cut in this minimal example, but I got similar errors on this communication in the large code from which it was extracted.

Thanks for helping!


Gilles (best answer)

It looks like the use of sequence is the culprit here. Since the components of your type do not all have the same alignment requirements, forcing contiguous packing with the sequence keyword generates some unaligned accesses, probably while writing one of the arrays. Removing sequence does the trick.
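A related precaution, which applies whether or not sequence is kept, is to take the extent of the MPI type from the actual spacing between two consecutive array elements rather than from the struct type itself. If the compiler packs the sequence type, that spacing may be smaller than the extent MPI derives for the struct, and a send of n elements could then read past the end of the buffer. The following is a minimal sketch under that assumption, not a confirmed diagnosis of the code in the question; the names pair, base, next_elem, tmp_type and elem_type are illustrative and do not come from the original program.

```fortran
program extent_sketch
  use mpi
  implicit none

  type double_st
     sequence
     real(kind(0.d0)) :: x, y, z
     integer :: acc
  end type double_st

  type(double_st) :: pair(2)   ! hypothetical two-element probe array
  integer(kind=MPI_ADDRESS_KIND) :: displs(4), base, next_elem, lb, extent
  integer :: blocklengths(4), types(4)
  integer :: tmp_type, elem_type, ierr

  call MPI_INIT(ierr)

  blocklengths = 1
  types = (/ MPI_DOUBLE_PRECISION, MPI_DOUBLE_PRECISION, &
             MPI_DOUBLE_PRECISION, MPI_INTEGER /)

  ! The Fortran binding of MPI_GET_ADDRESS takes ierr as its last argument.
  call MPI_GET_ADDRESS(pair(1)%x,   displs(1), ierr)
  call MPI_GET_ADDRESS(pair(1)%y,   displs(2), ierr)
  call MPI_GET_ADDRESS(pair(1)%z,   displs(3), ierr)
  call MPI_GET_ADDRESS(pair(1)%acc, displs(4), ierr)
  call MPI_GET_ADDRESS(pair(2)%x,   next_elem, ierr)

  ! Displacements relative to the start of the first element.
  base   = displs(1)
  displs = displs - base

  ! Extent = distance in bytes between element 1 and element 2,
  ! exactly as the compiler laid out the array.
  lb     = 0
  extent = next_elem - base

  call MPI_TYPE_CREATE_STRUCT(4, blocklengths, displs, types, tmp_type, ierr)
  call MPI_TYPE_CREATE_RESIZED(tmp_type, lb, extent, elem_type, ierr)
  call MPI_TYPE_COMMIT(elem_type, ierr)

  print *, 'element spacing in bytes:', extent

  call MPI_TYPE_FREE(elem_type, ierr)
  call MPI_TYPE_FREE(tmp_type, ierr)
  call MPI_FINALIZE(ierr)
end program extent_sketch
```

With a type built this way, elem_type would take the place of mpi_cadna_dst in the mpi_send and mpi_recv calls. The printed spacing depends on the compiler and its alignment options, so it is worth checking on the platform where the errors appear.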

Richard Rublev

I think the person who wrote the code used a derived-type definition with sequence. SEQUENCE causes the components of the derived type to be stored in the same order in which they are listed in the type definition. If SEQUENCE is specified, all derived types specified in component definitions must be sequence types. You should tell us more about the compilation; are you on Linux or Windows?
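Since the layout depends on the compiler and platform, one quick check is to print the per-element storage size of the type with and without sequence. This is only a sketch: the type names with_seq and without_seq are made up for the comparison, and storage_size is a Fortran 2008 intrinsic, so it assumes a reasonably recent compiler.

```fortran
program size_check
  implicit none

  ! Same components as double_st in the question, with sequence.
  type with_seq
     sequence
     real(kind(0.d0)) :: x, y, z
     integer :: acc
  end type with_seq

  ! Same components, without sequence.
  type without_seq
     real(kind(0.d0)) :: x, y, z
     integer :: acc
  end type without_seq

  type(with_seq)    :: a
  type(without_seq) :: b

  ! storage_size returns the size of one array element in bits.
  print *, 'with sequence    (bytes):', storage_size(a) / 8
  print *, 'without sequence (bytes):', storage_size(b) / 8
end program size_check
```

If the two numbers differ, the compiler is packing the sequence type, and any MPI datatype describing it needs an extent that matches the smaller spacing.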