Hi, on a standalone CryoSPARC instance with a fresh 3D Flex installation, 3D Flex jobs fail with the error: “ImportError: libnccl.so.2: cannot open shared object file: No such file or directory.”
I deleted cryosparc_worker and reinstalled it, but it still doesn’t work. Also, I can’t find libnccl.so.2 anywhere under the cryosparc_worker path.
What do you suggest? Do I need to install NCCL into the OS?
It would be a good thing to check.
sudo apt install libnccl2
should do the trick.
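Before installing anything, it may help to check whether the dynamic linker already knows about libnccl.so.2. A minimal sketch, assuming a bash shell: the function name lib_in_cache is hypothetical, and it only searches the system linker cache printed by ldconfig -p, so libraries reachable only through LD_LIBRARY_PATH will not show up. Note also that apt applies to Debian/Ubuntu-family distributions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the library name given as $1 appears as an
# entry in `ldconfig -p` output read on stdin.
lib_in_cache() {
  # ldconfig -p prints one line per cached library, e.g.
  #   libfoo.so.1 (libc6,x86-64) => /usr/lib64/libfoo.so.1
  # so the first field is the library name to compare against.
  awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

# Example: check the system linker cache for NCCL.
# (On some systems ldconfig has to be called as /sbin/ldconfig.)
if ldconfig -p 2>/dev/null | lib_in_cache libnccl.so.2; then
  echo "libnccl.so.2 is in the linker cache"
else
  echo "libnccl.so.2 is not in the linker cache"
fi
```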
@wsatbluesky Could you please post
- the text of the traceback, which will make your interesting question easier for future forum visitors to find
- your CryoSPARC version and patch level
- the output (as text) of these commands
ldd /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/_C.cpython-38-x86_64-linux-gnu.so
cat /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/version.py
@wtempel
The CryoSPARC version is 4.2.1, with no patch applied.
The traceback is:
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 83, in cryosparc_compute.run.main
File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 442, in get_run_function
runmod = importlib.import_module(".."+modname, __name__)
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 1174, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/__init__.py", line 229, in <module>
from torch._C import * # noqa: F403
ImportError: libnccl.so.2: cannot open shared object file: No such file or directory
Below is the output of the two commands:
[user@local cryosparc_worker]$ ldd /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/_C.cpython-38-x86_64-linux-gnu.so
linux-vdso.so.1 => (0x00007ffd0ab95000)
libtorch_python.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libtorch_python.so (0x00007f71e50b2000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f71e4e96000)
libc.so.6 => /lib64/libc.so.6 (0x00007f71e4ac8000)
libshm.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libshm.so (0x00007f71e6343000)
libtorch.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libtorch.so (0x00007f71e631d000)
libnvToolsExt.so.1 => /usr/local/cuda-10.1/targets/x86_64-linux/lib/libnvToolsExt.so.1 (0x00007f71e48bf000)
libtorch_cpu.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so (0x00007f71cba0c000)
libtorch_cuda.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so (0x00007f71a594b000)
libc10_cuda.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libc10_cuda.so (0x00007f71e62ad000)
libc10.so => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libc10.so (0x00007f71e61f2000)
libcudart.so.11.0 => /usr/local/cuda11.1/targets/x86_64-linux/lib/libcudart.so.11.0 (0x00007f71a56c6000)
libcudnn.so.8 => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f71a53be000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f71a51a8000)
/lib64/ld-linux-x86-64.so.2 (0x00007f71e6167000)
librt.so.1 => /lib64/librt.so.1 (0x00007f71a4fa0000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f71a4d9c000)
libm.so.6 => /lib64/libm.so.6 (0x00007f71a4a9a000)
libgomp-a34b3233.so.1 => /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/lib/libgomp-a34b3233.so.1 (0x00007f71a4870000)
libcupti.so.11.7 => not found
libcusparse.so.11 => /usr/local/cuda11.1/targets/x86_64-linux/lib/libcusparse.so.11 (0x00007f7196637000)
libcurand.so.10 => /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcurand.so.10 (0x00007f71925d6000)
libcudnn.so.8 => not found
libnccl.so.2 => not found
libcufft.so.10 => /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcufft.so.10 (0x00007f7189f9c000)
libcublas.so.11 => /usr/local/cuda11.1/targets/x86_64-linux/lib/libcublas.so.11 (0x00007f7181b80000)
libcublasLt.so.11 => /usr/local/cuda11.1/targets/x86_64-linux/lib/libcublasLt.so.11 (0x00007f7173b8c000)
[user@local cryosparc_worker]$ cat /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/version.py
__version__ = '2.0.0+cu117'
debug = False
cuda = '11.7'
git_version = 'c263bd43e8e8502d4726643bc6fd046f0130ac0e'
hip = None
From the output, as @rbs_sci said, it seems I need to install these “not found” libraries into the OS CUDA installation.
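To pull only the unresolved entries out of a long ldd listing like the one above, a small filter is enough. This is a sketch; the function name not_found_from_ldd is hypothetical. It reads ldd output on stdin and prints each missing library once, which for the output above would be libcudnn.so.8, libcupti.so.11.7 and libnccl.so.2. A library appearing in this list does not by itself prove it has to be installed system-wide; the library search path can also be the culprit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ldd` output on stdin and print the name of every
# dependency the loader could not resolve, once each (the listing above shows
# libcudnn.so.8 twice).
not_found_from_ldd() {
  awk '/=> not found/ { print $1 }' | sort -u
}

# Example (path taken from this thread):
#   ldd /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/_C.cpython-38-x86_64-linux-gnu.so | not_found_from_ldd
```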
Hm. Maybe not. I’ve checked, and I don’t have libnccl installed on any of my recently set-up systems, yet CryoSPARC works on them without any problems…
@rbs_sci You are right. The other CryoSPARC instance I installed works well, so maybe I need to double-check the system environment. The only difference I can see is the torch version, which is very weird.
Below is the torch version info from the working CryoSPARC instance:
[cryosparc@login01 cryosparc]$ cat ~/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/torch/version.py
__version__ = '1.13.1+cu117'
debug = False
cuda = '11.7'
git_version = '49444c3e546bf240bed24a101e747422d1f8a0ee'
hip = None
I’d try a fresh install into a new directory, making sure that there is nothing CryoSPARC- or CUDA-related in your environment variables or .bashrc…
You could also try reinstalling the dependencies with cryosparcw forcedeps, then running cryosparcw install-3dflex…
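A quick way to look for leftover CryoSPARC or CUDA settings before a fresh install is to grep the current environment and the shell start-up files. The helper names suspicious_env and suspicious_rc are hypothetical, and the search pattern is only a starting point; settings made system-wide (for example under /etc/profile.d) would need the same check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print exported variables whose name or value mentions
# CUDA, CryoSPARC or LD_LIBRARY_PATH (case-insensitive).
suspicious_env() {
  env | grep -i -E 'cuda|cryosparc|ld_library_path' || true
}

# Hypothetical helper: print matching lines, with line numbers, from the given
# shell start-up files so stray settings can be located and removed.
suspicious_rc() {
  grep -n -i -E 'cuda|cryosparc|ld_library_path' "$@" 2>/dev/null || true
}

# Example:
#   suspicious_env
#   suspicious_rc ~/.bashrc ~/.bash_profile
```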
Sorry for the confusion: the box I was logged in to (which was also the one I checked before replying!) had NCCL installed from tinkering with something else.
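Thanks @wsatbluesky for posting the outputs in 3dflex job failed with "libnccl.so.2" - #4 by wsatbluesky.
The outputs indicate several potential problems (for example, libcudnn.so.8, libcupti.so.11.7 and libnccl.so.2 are reported as "not found", and CUDA libraries are being resolved from both /usr/local/cuda-10.1 and /usr/local/cuda11.1). You may want to first try the steps in Installing "3dflex" got failed - #5 by wtempel and see whether they enable 3D Flex jobs on your instance.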
@rbs_sci @wtempel Thanks. Someone had set an incorrect global $LD_LIBRARY_PATH. I cleared it and reinstalled 3D Flex, and it works now.
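For anyone hitting the same error, below is a minimal sketch of inspecting LD_LIBRARY_PATH one entry per line and flagging CUDA directories. The function names are hypothetical. In the ldd output earlier in this thread, CUDA libraries were being picked up from both /usr/local/cuda-10.1 and /usr/local/cuda11.1, which is the kind of mix this makes easy to spot. The commented reinstall line assumes the worker lives at /opt/cryosparc/cryosparc_worker, as in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each LD_LIBRARY_PATH entry on its own line
# (prints nothing useful if the variable is unset).
print_ld_path() {
  printf '%s\n' "${LD_LIBRARY_PATH-}" | tr ':' '\n'
}

# Hypothetical helper: show only entries that look like a CUDA install; such
# entries may interfere with how the worker's PyTorch locates its CUDA libraries.
cuda_entries_in_ld_path() {
  print_ld_path | grep -i 'cuda' || true
}

# Example: inspect, then clear the variable for this shell only and reinstall 3D Flex.
#   cuda_entries_in_ld_path
#   unset LD_LIBRARY_PATH
#   /opt/cryosparc/cryosparc_worker/bin/cryosparcw install-3dflex
```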