zheev outputs incorrect and random results when called from within dynamic library

75 Views Asked by At

I have DFT(density functional theory) code written in Fortran. Inside this program I use zheev function from lapack to diagonalize overlap matrix (S). I compile the code in two different targets (a) standalone program fireball.x and (b) dynamic library (libFireCore.so), without any differences in the code.

When I run the program fireball.x everything works fine. When I call libFireCore.so from inside other program (written in C++) it produces incorect output which is non-deterministic - i.e. every time I run it I got different results.

I tracked down the problem, and it turns out the problem is in some zheev call where I diagonalize overlap matrix.

When I print the overlap matrix (S) which goes into zheev it is exactly the same (in fireball.x and libFireCore.so). But the output matrix (eigenvectors, zzzz) differ at the end. (see attached images)

The S matrix which goes in is the same: enter image description here

The eigenvectros (zzzz) which goes out differs at the end: enter image description here

At the same time the eigenvaues are exactly the same (and the values are not crazy small or crazy large): S-eigenvalues: 0.20320429171825077 0.23871180664043531 0.24800955043297762 0.25981763454707818 0.26625761411873500 0.29656130959541332 0.30406932675113513 0.31931361483751519 0.34177021310844230 0.36133697610296117 0.38357717267479940 0.57800206110211672 0.73150940647530815 0.73986354541136745 0.87968589806449793 1.0629166543759208 1.1745454410259122 1.1914134775739986 1.2071693499296059 1.2928315521982148 1.3497098021371781 1.4298908216838513 1.5846592240498847 1.7381615303550286 1.9139531841288968 1.9292327889682286 2.1528696122119046 2.2187686011872194 2.6022345485931204

How is this possible? It seems like some memory corruption inside zheev or how else I can explain the randomness of output (non-deterministic)?

I checked using MKL_VERBOS=1 that the version of MKL library is the same as well as the dimensions of all arrays (including the temporary work-arrays) is the same. So what can be wrong?

    integer lwork, lrwork, liwork
    complex, allocatable, dimension (:) :: work
    real,    allocatable, dimension (:) :: rwork
    real,    allocatable, dimension (:) :: slam
    complex, allocatable, dimension (:, :)    :: xxxx, zzzz

    allocate ( slam(norbitals) )
    allocate ( xxxx(norbitals,norbitals) )
    allocate ( zzzz(norbitals,norbitals) )

    lwork  = 1
    lrwork = 3*norbitals - 2
    allocate (work(lwork))
    allocate (rwork(lrwork))

    write (*,*) "!!!! DEBUG sqrtS debug_writeMatFile(Sk_sqrtS.log) norbitals=",norbitals, ' lwork = ',lwork, ' lrwork = ',lrwork, ' liwork = ',liwork, ' divide = ',divide
    call debug_writeMatFile_cmp( "Sk_sqrtS_", zzzz, norbitals, norbitals, 0 )

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0

! first find optimal working space size (lwork)
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,   -1, rwork, info)
! resize working space
        lwork = work(1)    !  workspace query see: https://www.netlib.org/lapack/explore-html/df/d9a/group__complex16_h_eeigen_gaf23fb5b3ae38072ef4890ba43d5cfea2.html

        deallocate (work)
        allocate (work(lwork))
! diagonalize the overlap matrix with the new working space

        slam(:)  = 0.0d0
        work (:) = 0.0d0
        rwork(:) = 0.0d0
        call zheev ('V', 'U', norbitals, zzzz, norbitals, slam, work,  lwork, rwork , info)

    if (info .ne. 0) call diag_error (info, 0)

    write (*,*) "!!!! DEBUG sqrtS() debug_writeMatFile(zzzz_pre.log) norbitals=",norbitals, " norbitals_new= ", norbitals_new, "info ", info
    call debug_writeMatFile_cmp( "zzzz_pre1_", zzzz, norbitals, norbitals, 0 )  

EDIT

I found the problem/solution. It is about OpenMP. When I switch OpenMP off in compiler, or if I set export OMP_NUM_THREADS=1 it works just fine.

It seems similar to this problem: https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Incorrect-eigenvectors-from-ZHEEV-in-MKL-using-multi-threading/td-p/836860

Still it would be nice if somebody can give me some advice why OpenMP cause problem here, and how to solve it. I would like to use OpenMP in my application.

0

There are 0 best solutions below