Thank you. Worked for us
without unsetting LD_LIBRARY_PATH
Thank you. Worked for us
without unsetting LD_LIBRARY_PATH
This worked for us as well! For reference it’s line 457 of cryosparcw
.
Glad I was on the right path, at least.
All my poking around with quirky CUDA installs appears to have broken my worker install in a rather terminal fashion. Oh well, the database is safe, time for a reinstall.
Then maybe I can try 3D flex out.
edit: well, it installed OK with a non-broken conda install…
Successfully built pycuda
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
3D Flex Refine dependencies installed successfully.
Now to test it out.
edit 2:
All tests fail. Suspect due to --nossd flag used during install.
However, running jobs manually (full T20S workflow with every job type enabled, disabling SSD caching where necessary, and then running 3D flex (still running training) all run fine.
Now it’s fixed. It was the conflict of system’s preload cuda 11-2.
So yes, specifying cryosparcw with the cuda 11.7 conda update does work!!
Thank you all
------------------------ updated -------
Are you guys able to run through 3D Flex Training after specifying 11.7 update in cryosparc worker?
I still got the following error though… I guess unsetting LD_LIBRARY is necessary for the dependencies update though.
[2022-12-16 17:40:16.44]
[CPU: 203.7 MB]
Traceback (most recent call last):
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/init.py”, line 172, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/init.py”, line 364, in init
self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/…/…/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File “cryosparc_master/cryosparc_compute/run.py”, line 80, in cryosparc_compute.run.main
File “/apps/cryosparc4/cryosparc_worker/cryosparc_compute/jobs/jobregister.py”, line 443, in get_run_function
runmod = importlib.import_module("…"+modname, name)
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/init.py”, line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File “”, line 1006, in _gcd_import
File “”, line 983, in _find_and_load
File “”, line 967, in _find_and_load_unlocked
File “”, line 677, in _load_unlocked
File “”, line 1050, in exec_module
File “”, line 219, in _call_with_frames_removed
File “cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py”, line 12, in init cryosparc_compute.jobs.flex_refine.run_train
File “cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py”, line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/init.py”, line 217, in
_load_global_deps()
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/init.py”, line 178, in _load_global_deps
_preload_cuda_deps()
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/init.py”, line 158, in _preload_cuda_deps
ctypes.CDLL(cublas_path)
File “/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/init.py”, line 364, in init
self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
While @scaiola fix work for the update seems like flex train end with error:
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
runmod = importlib.import_module(".."+modname, __name__)
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
_load_global_deps()
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
any thoughts?
Hi,
As we are working to address the issues you have been experiencing with
cryosparcw install-3dflex
we are looking for volunteers who would be willing to test our fix.
If you are confident in installing a CryoSPARC test version and, potentially, recovering from a broken installation, please send me a direct message with the following information for your CryoSPARC worker:
uname -a
CRYOSPARC_CUDA_PATH
inside cryosparc_worker/config.sh
$CRYOSPARC_CUDA_PATH/bin/nvcc --version
(after setting the CRYOSPARC_CUDA_PATH
environment variable)nvidia-smi
This change works for us too!
After the failing install-3dflex installation message, you need to run ./bin/cryosparcw forcedeps to roll back; otherwise it will still give same error.
Ubuntu 18.04.1
Cuda 11.8
@Bassem @qitsweauca I got the same error as you. The simple fix is to connect the libraries as follows
export LD_LIBRARY_PATH=/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/…/…/nvidia/cublas/lib/
There could be a more sophisticated solution. But this worked for me
I got the same error as @Bassem and @qitsweauca running training job after install 3d-flex. Can you elaborate on your fix? What I did is
export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/
and then run the training job, still got the same error.
Shall I add the “export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/” in bashrc file or before running ./bin/cryosparcw install-3dflex?
Also if it’s relevant I’m using CentOS and the original cryosparc was installed using cuda 11.1. Shall I reinstall cryosparc using cuda 11.7?
I carried out the update, and then ran the install 3dflex.
everything seems working but i got this message at end of install 3dflex
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.
not sure what the last line is impacting if others jobs so far running ok?
I updated from v4.0.3 to v4.1.1 then run ./bin/cryosparcw install-3dflex and got exactly the same error as the above. Anybody has suggestions? Thanks.
Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.
We had the same issue on our CentOS machine. It turned out cuda was present in our PATH & LD_LIBRARY_PATH (even though we had removed it from our .bashrc), and I think that was somehow causing the issue. you can check by running export | grep cuda
Yes, that’s what we did
It is okay to ignore this error message in this context.
HI Olibclarke,i already update cryosparc 4.11 before install 3dFlex,but install error ! why? please Thank you
bpl-subset/bpl_subset/boost/system/error_code.hpp: At global scope:
bpl-subset/bpl_subset/boost/system/error_code.hpp:214:36: warning: ‘pycudaboost::system::posix_category’ defined but not used [-Wunused-variable]
static const error_category & posix_category = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:215:36: warning: ‘pycudaboost::system::errno_ecat’ defined but not used [-Wunused-variable]
static const error_category & errno_ecat = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:216:36: warning: ‘pycudaboost::system::native_ecat’ defined but not used [-Wunused-variable]
static const error_category & native_ecat = system_category();
^
error: command ‘gcc’ failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c ‘import io, os, sys, setuptools, tokenize; sys.argv[0] = ‘"’"’/tmp/pip-req-build-yet46pjv/setup.py’"’"’; file=’"’"’/tmp/pip-req-build-yet46pjv/setup.py’"’"’;f = getattr(tokenize, ‘"’"‘open’"’"’, open)(file) if os.path.exists(file) else io.StringIO(’"’"‘from setuptools import setup; setup()’"’"’);code = f.read().replace(’"’"’\r\n’"’"’, ‘"’"’\n’"’"’);f.close();exec(compile(code, file, ‘"’"‘exec’"’"’))’ install --record /tmp/pip-record-dvkpa246/install-record.txt --single-version-externally-managed --compile --install-headers /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m/pycuda Check the logs for full command output.
Welcome to the forum @alexlu.
Please can you provide additional information:
uname -a
gcc -v
/usr/bin/env
You may also try: