1. When I run "/opt/app/singularity/bin/singularity exec --nv adapt_latest.sif nvidia-smi", it prints:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
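A quick host-side check that may narrow this down, offered as a hedged sketch. As I understand the Singularity documentation, --nv locates the driver libraries listed in its nvliblist.conf through the host's ldconfig cache. A driver installed under a non-standard prefix such as /opt/app/nvidia might therefore be invisible to it. The helper name `has_ldconfig_entry` is hypothetical. It only reports whether a library name appears in `ldconfig -p` output and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of Singularity): succeed if the library named
# in $1 is the first field of any line of `ldconfig -p` output read on stdin.
has_ldconfig_entry() {
    awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

# Usage on the host (assumes ldconfig is reachable, e.g. /sbin/ldconfig):
#   ldconfig -p | has_ldconfig_entry libnvidia-ml.so.1 \
#       && echo "in ldconfig cache" || echo "NOT in ldconfig cache"
```

If libnvidia-ml.so.1 is missing from the cache, that would be consistent with the error above. Adding the driver directory to the cache (for example through a file in /etc/ld.so.conf.d followed by running ldconfig) needs root, so it is probably a question for the cluster admins.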
2. So I searched for the file with "sudo find / -name libnvidia-ml.so", which returned:
/opt/app/cuda/12.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/opt/app/cuda/11.2/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/opt/app/cuda/11.7/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/opt/app/cuda/11.8/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
/opt/app/nvidia/525.60.13/lib/libnvidia-ml.so
/opt/app/nvidia/525.60.13/lib32/libnvidia-ml.so
/opt/app/nvidia/535.154.05/lib/libnvidia-ml.so
/opt/app/nvidia/535.154.05/lib32/libnvidia-ml.so
/opt/app/nvidia/460.91.03/lib/libnvidia-ml.so
/opt/app/nvidia/460.91.03/lib32/libnvidia-ml.so
/opt/app/nvidia/450.216.04/lib/libnvidia-ml.so
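Not every hit above is a usable runtime library. The copies under /opt/app/cuda/*/targets/x86_64-linux/lib/stubs are link-time stubs shipped with the CUDA toolkit, and the lib32 directories hold 32-bit builds. The plausible candidate is the 64-bit lib directory whose version matches the driver actually loaded on the node (535.154.05, per the host nvidia-smi output in the notes). Here is a small sketch of that filtering. The function name `pick_driver_libdir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read libnvidia-ml.so paths (one per line) on stdin and
# print the directory of the 64-bit copy that matches driver version $1.
pick_driver_libdir() {
    version="$1"
    while IFS= read -r path; do
        case "$path" in
            */stubs/*) ;;   # CUDA toolkit link-time stubs: not a real driver
            */lib32/*) ;;   # 32-bit libraries
            */"$version"/lib/libnvidia-ml.so) dirname "$path" ;;
        esac
    done
}

# Usage on the host (nvidia-smi --query-gpu prints one line per GPU):
#   sudo find / -name libnvidia-ml.so 2>/dev/null | pick_driver_libdir \
#       "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
```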
3. I then tried the following commands:
3.1: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/app/cuda/11.7/targets/x86_64-linux/lib/stubs
/opt/app/singularity/bin/singularity exec --nv --env LD_LIBRARY_PATH=/opt/app/cuda/11.7/targets/x86_64-linux/lib/stubs adapt_latest.sif nvidia-smi
3.2: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/app/nvidia/535.154.05/lib/
/opt/app/singularity/bin/singularity exec --nv --env LD_LIBRARY_PATH=/opt/app/nvidia/535.154.05/lib/libnvidia-ml.so adapt_latest.sif nvidia-smi
Neither attempt helped; the system still returned:
NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system.
Please also try adding directory that contains libnvidia-ml.so to your system PATH.
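As far as I can tell, two things may explain why neither attempt helped. First, 3.1 points at the CUDA toolkit stubs directory, whose libnvidia-ml.so is meant only for linking, not for running. Second, 3.2 sets LD_LIBRARY_PATH to a file rather than a directory. In both cases the host path is likely not visible inside the container unless it is bind-mounted, or unless the site's singularity.conf binds /opt/app by default.

Below is a sketch of the bind-mount variant, which rests on several assumptions:
- The mount point /opt/host-nvidia/lib and the wrapper name are hypothetical.
- The driver directory is assumed to also contain the libnvidia-ml.so.1 symlink that driver installs normally provide.
- Whether --env can override LD_LIBRARY_PATH alongside --nv may depend on the Singularity version.

```shell
#!/bin/sh
# Sketch: expose the host driver libraries inside the container via a bind
# mount and point LD_LIBRARY_PATH at the mount point.
SINGULARITY=/opt/app/singularity/bin/singularity
HOST_DRIVER_LIBDIR=/opt/app/nvidia/535.154.05/lib   # matches the host driver
CONTAINER_DRIVER_LIBDIR=/opt/host-nvidia/lib        # hypothetical mount point;
                                                    # may need to exist in the image
                                                    # depending on site config

# Hypothetical wrapper: run_with_host_driver IMAGE COMMAND [ARGS...]
# Note: this replaces the container's LD_LIBRARY_PATH, which is fine for a
# quick nvidia-smi test but may hide libraries other applications expect.
run_with_host_driver() {
    image="$1"
    shift
    "$SINGULARITY" exec --nv \
        --bind "$HOST_DRIVER_LIBDIR:$CONTAINER_DRIVER_LIBDIR" \
        --env "LD_LIBRARY_PATH=$CONTAINER_DRIVER_LIBDIR" \
        "$image" "$@"
}

# Example:
#   run_with_host_driver adapt_latest.sif nvidia-smi
```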
How can I fix this?
Notes:
I tried two different Singularity containers, and both hit this problem when run with --nv.
I'm working on a cluster and have been assigned gpu2. Running nvidia-smi on the host gives:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.154.05 Driver Version: 535.154.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A800-SXM4-80GB On | 00000000:42:00.0 Off | 0 |
| N/A 37C P0 62W / 500W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
nvcc -V on the host:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
nvcc -V inside the Singularity container:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0
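One point for reading these version numbers. The "CUDA Version: 12.2" field printed by the host nvidia-smi is the highest CUDA version the 535.154.05 driver supports, while the container ships a CUDA 11.2 toolkit. Because the container's toolkit is not newer than what the driver supports, a version mismatch seems an unlikely cause; the error itself is about locating libnvidia-ml.so. Below is a tiny hedged helper for that comparison. The name `version_ge` is hypothetical, and it relies on `sort -V` from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed if dotted version $1 >= $2.
# Relies on GNU sort's -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the values from this post:
#   version_ge 12.2 11.2 && echo "driver supports the container's CUDA 11.2"
```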