I need to use NVSHMEM in a program. I installed NVSHMEM beforehand following the official installation guide (https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/nvshmem-install-proc.html), and when building it I enabled GDRCopy, Open MPI, and NCCL support. The program runs perfectly inside Docker, but I want it to run on the host as well. When I compile it on the host, the linker reports this error:
/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_can_reach_peer(int*, nvshmem_transport_pe_info*, nvshmem_transport*)':
p2p.cpp:(.text+0x203): undefined reference to `nvmlDeviceGetHandleByPciBusId_v2'
/usr/bin/ld: p2p.cpp:(.text+0x219): undefined reference to `nvmlDeviceGetHandleByPciBusId_v2'
/usr/bin/ld: p2p.cpp:(.text+0x23a): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: p2p.cpp:(.text+0x268): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: p2p.cpp:(.text+0x296): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_finalize(nvshmem_transport*)':
p2p.cpp:(.text+0x6d5): undefined reference to `nvmlShutdown'
/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_init(nvshmem_transport**)':
p2p.cpp:(.text+0xdb9): undefined reference to `nvmlInit_v2'
collect2: error: ld returned 1 exit status
My CMake file looks roughly like this:
target_compile_options(test PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:
-Xcompiler
-pthread
-rdc=true
-ccbin g++
-arch ${SM_ARCH}
>)
set_target_properties(test PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_target_properties(test PROPERTIES CUDA_ARCHITECTURES "80")
target_include_directories(test
PRIVATE
${NVSHMEM_HOME}/include
${CUDA_HOME}/include
${MPI_HOME}/include
${CUDNN_HOME}/include
include
)
target_link_libraries(test
nvshmem
cuda
mpi_cxx
mpi
cublas
cudnn
gomp
curand
)
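For comparison, here is a minimal sketch of a CMake file that links NVML explicitly. It assumes CMake 3.18 or newer (for the FindCUDAToolkit module and its `CUDA::nvml` / `CUDA::cuda_driver` imported targets, plus the `CUDA_ARCHITECTURES` property), that `NVSHMEM_HOME` is passed on the command line, and a hypothetical source file `main.cu`; cuDNN and OpenMP from the file above are left out for brevity. The underlying assumption is that, because `libnvshmem.a` is a static archive, it records none of its own dependencies, so NVML has to be listed on the link line after it.

```cmake
cmake_minimum_required(VERSION 3.18)
project(nvshmem_test LANGUAGES CXX CUDA)

# Assumption: configured with -DNVSHMEM_HOME=/usr/local/nvshmem
find_package(CUDAToolkit REQUIRED)  # imported targets CUDA::nvml, CUDA::cuda_driver, CUDA::cublas, ...
find_package(MPI REQUIRED)          # imported target MPI::MPI_CXX
find_package(Threads REQUIRED)      # replaces the manual -pthread flag

# nvshmem_test and main.cu are hypothetical names
add_executable(nvshmem_test main.cu)
set_target_properties(nvshmem_test PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON   # NVSHMEM device APIs need relocatable device code
    CUDA_ARCHITECTURES 80)          # sm_80, as in the original file
target_include_directories(nvshmem_test PRIVATE ${NVSHMEM_HOME}/include include)

# Link order matters for a static archive: the libraries libnvshmem.a
# depends on (NVML, the CUDA driver API) must come after it.
target_link_libraries(nvshmem_test PRIVATE
    ${NVSHMEM_HOME}/lib/libnvshmem.a
    CUDA::nvml
    CUDA::cuda_driver
    CUDA::cublas
    CUDA::curand
    MPI::MPI_CXX
    Threads::Threads)
```

Using imported targets instead of bare names like `cuda` or `mpi` lets CMake resolve the full library paths and include directories, rather than relying on whatever the linker happens to find on the host.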
Do I need to reinstall NVSHMEM, or are some variables not set? Is there a complete tutorial on using NVSHMEM with CMake?
My system is Ubuntu 20.04, with two Ampere-architecture GPUs and no NVLink. It seems that the NVML functions used by libnvshmem.a cannot be found. I am sure NVML is installed on my machine: libnvidia-ml.so is present in /usr/local/cuda/lib64/stubs. Here are my driver and CUDA versions:
NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7
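If you would rather keep the existing CMake file, a smaller change is to let `find_library` locate NVML instead of hard-coding a path, and to stop at configure time if nothing is found. This is a sketch assuming CMake 3.18 or newer (for the `REQUIRED` option of `find_library`) and `CUDA_HOME` set as in the original file; `NVML_LIBRARY` is a hypothetical variable name. On some driver installs `/usr/lib/x86_64-linux-gnu` contains only the versioned `libnvidia-ml.so.1` and no unversioned `libnvidia-ml.so`, which may explain why a hard-coded path fails; linking against the toolkit stub is normally enough, since the real driver library is loaded at run time.

```cmake
# Appended after the existing target_link_libraries(test ...) call.
# PATHS is searched after the default system locations, so the
# CUDA stub is only used as a fallback.
find_library(NVML_LIBRARY
    NAMES nvidia-ml
    PATHS ${CUDA_HOME}/lib64/stubs
    REQUIRED)
message(STATUS "Linking NVML from: ${NVML_LIBRARY}")

# Plain signature, to match the existing call: CMake rejects mixing the
# plain form with the PRIVATE/PUBLIC form on the same target.
# A later call appends, so NVML ends up after nvshmem on the link line.
target_link_libraries(test ${NVML_LIBRARY})
```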
I have written a small test program that confirms NVML works on this machine.
However, adding target_link_libraries(test /usr/lib/x86_64-linux-gnu/libnvidia-ml.so) does not help. I am also not sure how to write an NVSHMEM test, so it is hard to tell whether I installed NVSHMEM correctly.