And, yes, this is bizarre.
I've been having trouble running openmpi-4.0.2 on RH 7.6. Everything segfaults, even a non-MPI shell script. If I run as another user, everything runs as expected - so there's something user-dependent. And I have been dancing my environment and the other user's environment around looking for what's toxic.
After stripping things down to bare metal, I found that the underlying ssh command sets the environment variable LD_LIBRARY_PATH to a string ending in ":", and this is toxic in my user. It is not toxic for another user.
This happens on RH 6.1, 7.4 and 7.6. I have a RH 7.1 system on which this DOES NOT happen (the bizarreness keeps giving and giving).
I need one of two things to resolve this problem: either a way to populate LD_LIBRARY_PATH in the underlying ssh command from the "mpiexec" command (all my attempts have failed to this point), or a way to find the root cause in my user's environment and fix it.
The following is from RH 6.1, which gives an actual error message rather than just a segfault.
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney:fred ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
LD_LIBRARY_PATH=barney:fred
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney: ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
env: relocation error: libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
egrep: relocation error: libc.so.6: symbol _dl_starting_up, version GLIBC_PRIVATE not defined in file ld-linux-x86-64.so.2 with link time reference
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
LD_LIBRARY_PATH=barney
-bash-4.1$ sudo su - otheruser
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney:fred ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
LD_LIBRARY_PATH=barney:fred
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney: ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
LD_LIBRARY_PATH=barney:
-bash-4.1$ /usr/bin/ssh -x $(hostname) 'LD_LIBRARY_PATH=barney ; export LD_LIBRARY_PATH ; echo done ; env | egrep LD_'
done
LD_LIBRARY_PATH=barney
I've only been dancing with this for three days. (Gotta get openmpi working for my user.)
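From this post: https://comp.os.linux.development.apps.narkive.com/P7hTPwxP/must-ld-library-path-end-with-a I found that a terminal ":" leaves an empty entry at the end of LD_LIBRARY_PATH, and the dynamic linker treats an empty entry as the current working directory. Under ssh the working directory is $HOME, so in effect my $HOME was added to the end of the LD_LIBRARY_PATH. This is not something I knew (and I have 35 years of professional experience with *NIX systems).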
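For anyone who wants to check their own environment, here is a minimal sketch of a test for empty entries (leading, trailing, or doubled ":"). The function name has_empty_entry is my own invention for illustration, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the colon-separated path in $1
# contains an empty entry, i.e. a leading ":", a trailing ":", or "::".
has_empty_entry() {
    case "$1" in
        :*|*:|*::*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Report on the current environment; an unset variable is treated as clean.
if has_empty_entry "${LD_LIBRARY_PATH:-}"; then
    echo "LD_LIBRARY_PATH has an empty entry (current directory is searched)"
fi
```

With the values from the transcript above, "barney:" is flagged while "barney:fred" and "barney" are not.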
I have discovered a copy of libc.so.6 in my home directory which I put there some four months ago for some (probably valid at the time) reason, or just because I was clumsy that day. Removed the libc.so.6, and the problem went away.
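A quick way to look for this kind of leftover is to list anything that looks like a shared library at the top level of a directory. This is only a sketch: list_shared_libs is a made-up name, and it assumes a find that supports -maxdepth (GNU and BSD find both do).

```shell
#!/bin/sh
# Hypothetical helper: print files or symlinks named like shared libraries
# (*.so, *.so.6, ...) directly inside directory $1, without recursing.
list_shared_libs() {
    find "$1" -maxdepth 1 \( -type f -o -type l \) -name '*.so*' 2>/dev/null
}

# Example: list_shared_libs "$HOME"
```

In my case this would have printed the stray libc.so.6 in my home directory.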
Two solutions: don't end LD_LIBRARY_PATH with a ":", and NEVER PUT ANY SHARED LIBRARIES IN YOUR HOME DIRECTORY, EVER.
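If you set LD_LIBRARY_PATH yourself, the first solution can be sketched as a small filter that drops empty entries before exporting. The name strip_empty_entries is hypothetical; the sketch assumes the standard tr, grep and paste utilities and directory names that contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: print the colon-separated path in $1 with all empty
# entries removed, so no leading, trailing, or doubled ":" remains.
strip_empty_entries() {
    printf '%s' "$1" | tr ':' '\n' | grep -v '^$' | paste -sd: -
}

# Example: LD_LIBRARY_PATH=$(strip_empty_entries "$LD_LIBRARY_PATH")
```

This does not help when something else builds the variable for you (as the ssh command launched by mpiexec did in my case), which is why the second solution matters more.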
Four days I'll never get back.
Experience is what you get when you didn't get what you wanted.
Conservation of Embarrassment: You won't find the answer to your stupid question yourself until you post it on stackoverflow.com.