How do I use NVSHMEM in my program with CMake?

I need to use NVSHMEM in a program. I installed NVSHMEM beforehand, following the official installation guide at https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/nvshmem-install-proc.html, and configured it with GDRCopy, Open MPI, and NCCL. The program runs perfectly in Docker, but I would like it to run on the host as well. When I compile it on the host, the linker reports the following errors:

/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_can_reach_peer(int*, nvshmem_transport_pe_info*, nvshmem_transport*)':
p2p.cpp:(.text+0x203): undefined reference to `nvmlDeviceGetHandleByPciBusId_v2'
/usr/bin/ld: p2p.cpp:(.text+0x219): undefined reference to `nvmlDeviceGetHandleByPciBusId_v2'
/usr/bin/ld: p2p.cpp:(.text+0x23a): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: p2p.cpp:(.text+0x268): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: p2p.cpp:(.text+0x296): undefined reference to `nvmlDeviceGetP2PStatus'
/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_finalize(nvshmem_transport*)':
p2p.cpp:(.text+0x6d5): undefined reference to `nvmlShutdown'
/usr/bin/ld: /usr/local/nvshmem/lib/libnvshmem.a(p2p.o): in function `nvshmemt_p2p_init(nvshmem_transport**)':
p2p.cpp:(.text+0xdb9): undefined reference to `nvmlInit_v2'
collect2: error: ld returned 1 exit status

My CMake file looks similar to this:

target_compile_options(test PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:
                        -Xcompiler 
                        -pthread 
                        -rdc=true 
                        -ccbin g++ 
                        -arch ${SM_ARCH}
                       >)

set_target_properties(test PROPERTIES CUDA_SEPARABLE_COMPILATION ON)
set_target_properties(test PROPERTIES CUDA_ARCHITECTURES "80")

target_include_directories(test
    PRIVATE 
    ${NVSHMEM_HOME}/include 
    ${CUDA_HOME}/include 
    ${MPI_HOME}/include
    ${CUDNN_HOME}/include
    include
)
target_link_libraries(test
    nvshmem 
    cuda
    mpi_cxx 
    mpi 
    cublas 
    cudnn 
    gomp 
    curand
)
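
As a side note on the snippet above, here is a minimal sketch of how the CUDA-only compile option could be written as a single argument. CMake splits an unquoted generator expression on whitespace, so a multi-line `$<...>` may not be evaluated as one expression. The sketch assumes that `-rdc=true` and `-arch` can be left to the `CUDA_SEPARABLE_COMPILATION` and `CUDA_ARCHITECTURES` properties already set above; it has not been tested on this setup and is unrelated to the NVML link errors.

```cmake
# Sketch only: keep the generator expression in one argument (no spaces or newlines inside it).
target_compile_options(test PRIVATE
    $<$<COMPILE_LANGUAGE:CUDA>:-Xcompiler=-pthread>
)

# Relocatable device code and the target architecture are expressed as
# target properties instead of raw nvcc flags (-rdc=true, -arch).
set_target_properties(test PROPERTIES
    CUDA_SEPARABLE_COMPILATION ON
    CUDA_ARCHITECTURES "80"
)
```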

Do I need to reinstall NVSHMEM, or are some variables not set? Is there a complete tutorial on using NVSHMEM with CMake?

My system is Ubuntu 20.04, and I have two Ampere-architecture graphics cards without NVLink. It seems that libnvshmem.a cannot find the NVML library. I am sure NVML is already installed on my machine, and libnvidia-ml.so can be found at /usr/local/cuda/lib64/stubs. Here are my driver and CUDA versions:

 NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7  

I have written a test file to ensure that NVML can function properly.

But adding target_link_libraries(test /usr/lib/x86_64-linux-gnu/libnvidia-ml.so) does not work. I am not sure how to write a test for NVSHMEM, so it is difficult to know whether I have installed it correctly.
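
For reference, here is a minimal sketch of one link arrangement that might be worth trying in place of the target_link_libraries call shown earlier. It rests on several assumptions: CMake 3.17 or newer (for the FindCUDAToolkit module and its `CUDA::nvml` imported target), NVSHMEM installed under /usr/local/nvshmem (the path in the linker errors), and the unresolved `nvml*` symbols being caused by NVML not appearing after the static libnvshmem.a on the link line. It has not been verified on this machine.

```cmake
# Sketch only: assumes CMake >= 3.17 and the install prefix seen in the error output.
find_package(CUDAToolkit REQUIRED)   # defines CUDA::nvml, CUDA::cuda_driver, ...

set(NVSHMEM_HOME "/usr/local/nvshmem" CACHE PATH "NVSHMEM install prefix (assumed)")

target_link_directories(test PRIVATE ${NVSHMEM_HOME}/lib)

# With a static archive, the libraries that satisfy its undefined symbols
# generally have to be listed after it, so NVML follows nvshmem here.
# The plain (keyword-less) signature matches the original call.
target_link_libraries(test
    nvshmem
    CUDA::nvml          # libnvidia-ml: nvmlInit_v2, nvmlDeviceGetP2PStatus, ...
    CUDA::cuda_driver   # libcuda, replaces the bare "cuda" entry
    mpi_cxx
    mpi
    cublas
    cudnn
    gomp
    curand
)
```

Keeping all libraries in one call makes the relative order of nvshmem and NVML explicit, which a separate target_link_libraries line with an absolute path to libnvidia-ml.so does not necessarily guarantee.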
