3DFlex Dependencies; Building pycuda

Glad I was on the right path, at least. :smile:

All my poking around with quirky CUDA installs appears to have broken my worker install in a rather terminal fashion. Oh well, the database is safe, time for a reinstall.

Then maybe I can try 3D flex out. :smiley:

edit: well, it installed OK with a non-broken conda install…

Successfully built pycuda
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
3D Flex Refine dependencies installed successfully.

Now to test it out.

edit 2:

All tests fail. I suspect this is due to the --nossd flag used during install.

However, jobs run fine when launched manually: I ran the full T20S workflow with every job type enabled (disabling SSD caching where necessary) and then 3D Flex, whose training job is still running.


Now it’s fixed. The cause was a conflict with the system’s preloaded CUDA 11.2.
So yes, specifying CUDA 11.7 for cryosparcw with the conda update does work!!
:slight_smile:
Thank you all :smiley:

------------------------ updated -------

Are you guys able to run through 3D Flex Training after specifying the 11.7 update in the CryoSPARC worker?

I still got the following error, though. I guess unsetting LD_LIBRARY_PATH is necessary for the dependencies update.


[2022-12-16 17:40:16.44]
[CPU: 203.7 MB]
Traceback (most recent call last):
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/apps/cryosparc4/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11
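
One way to test the LD_LIBRARY_PATH suspicion without editing any startup files is to run a single command with the variable removed. This is only a sketch: the helper name `without_ld_library_path` is made up for illustration, and whether unsetting the variable resolves the error on a given system is an assumption to verify.

```sh
# Run a command with LD_LIBRARY_PATH removed. The subshell keeps the
# change local, so the calling shell's environment is not modified.
without_ld_library_path() {
    ( unset LD_LIBRARY_PATH; "$@" )
}

# Demonstration: the child process no longer sees the variable.
without_ld_library_path sh -c 'echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"'

# Possible use from the cryosparc_worker directory (not run here):
#   without_ld_library_path ./bin/cryosparcw install-3dflex
```

Because the unset happens in a subshell, anything else launched from the same terminal still sees the original value.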


While @scaiola's fix works for the update, it seems that flex training still ends with an error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Any thoughts?
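
Both tracebacks complain about a symbol missing from libcublasLt.so.11, which suggests the dynamic linker may be resolving an older copy of that library than the one shipped alongside libcublas.so.11. A rough way to see which copy gets picked up is to filter `ldd` output. In this sketch, `WORKER_DIR` and the helper `resolved_cublaslt` are hypothetical names; the library path follows the one in the traceback.

```sh
# Hypothetical worker location; adjust to your install.
WORKER_DIR="${WORKER_DIR:-/path/to/cryosparc_worker}"
CUBLAS_LIB="$WORKER_DIR/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11"

# Read ldd-style output on stdin and print the path that
# libcublasLt resolves to (third field of "name => path (addr)").
resolved_cublaslt() {
    awk '$1 ~ /^libcublasLt\.so/ { print $3 }'
}

if [ -f "$CUBLAS_LIB" ]; then
    ldd "$CUBLAS_LIB" | resolved_cublaslt
else
    echo "libcublas.so.11 not found under $WORKER_DIR"
fi
```

If the printed path points into a system CUDA directory rather than the conda environment, that would be consistent with the conflict described earlier in the thread.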


Hi,
As we are working to address the issues you have been experiencing with
cryosparcw install-3dflex
we are looking for volunteers who would be willing to test our fix.
If you are confident in installing a CryoSPARC test version and, potentially, recovering from a broken installation, please send me a direct message with the following information for your CryoSPARC worker:

  • output of uname -a
  • the value of CRYOSPARC_CUDA_PATH inside cryosparc_worker/config.sh
  • the output of $CRYOSPARC_CUDA_PATH/bin/nvcc --version (after setting the CRYOSPARC_CUDA_PATH environment variable)
  • the output of nvidia-smi
  • the type of your CryoSPARC instance:
    • single workstation (combined master/worker)
    • master and CryoSPARC-managed workers
    • master and cluster workers
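
The command outputs requested above could be gathered in one go with something like the following. This is a sketch: `WORKER_DIR` and the helper `cuda_path_from_config` are hypothetical, and it assumes config.sh sets CRYOSPARC_CUDA_PATH with an ordinary shell assignment.

```sh
# Hypothetical worker location; adjust to your install.
WORKER_DIR="${WORKER_DIR:-/path/to/cryosparc_worker}"

# Print CRYOSPARC_CUDA_PATH as set by the given config.sh. The file is
# sourced in a subshell so the current environment is left untouched.
cuda_path_from_config() {
    ( unset CRYOSPARC_CUDA_PATH
      . "$1" >/dev/null 2>&1
      printf '%s\n' "${CRYOSPARC_CUDA_PATH:-}" )
}

echo "== uname -a =="
uname -a

if [ -f "$WORKER_DIR/config.sh" ]; then
    cuda_path=$(cuda_path_from_config "$WORKER_DIR/config.sh")
    echo "== CRYOSPARC_CUDA_PATH =="
    echo "$cuda_path"
    if [ -x "$cuda_path/bin/nvcc" ]; then
        echo "== nvcc --version =="
        "$cuda_path/bin/nvcc" --version
    fi
fi

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "== nvidia-smi =="
    nvidia-smi
fi
```

The instance type (single workstation, managed workers, or cluster) still has to be stated by hand.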

This change works for us too!

After the install-3dflex failure message, you need to run ./bin/cryosparcw forcedeps to roll back; otherwise it will still give the same error.

Ubuntu 18.04.1
CUDA 11.8


@Bassem @qitsweauca I got the same error as you. The simple fix is to connect the libraries as follows:

export LD_LIBRARY_PATH=/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/…/…/nvidia/cublas/lib/

There could be a more sophisticated solution, but this worked for me.
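
A slightly more cautious variant of the same idea is to put the cuBLAS directory at the front of LD_LIBRARY_PATH instead of overwriting it, so existing entries survive. This is a sketch, not a confirmed fix: `WORKER_DIR` and `prepend_dir` are hypothetical names, and the site-packages path is taken from the tracebacks above.

```sh
# Hypothetical worker location; adjust to your install.
WORKER_DIR="${WORKER_DIR:-/path/to/cryosparc_worker}"
CUBLAS_DIR="$WORKER_DIR/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib"

# Print a colon-separated list with $1 placed before the existing list $2.
# An empty $2 yields just $1, avoiding a trailing colon.
prepend_dir() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

LD_LIBRARY_PATH=$(prepend_dir "$CUBLAS_DIR" "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
echo "$LD_LIBRARY_PATH"
```

The trailing-colon check matters because an empty entry in LD_LIBRARY_PATH is treated as the current directory.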

I got the same error as @Bassem and @qitsweauca when running a training job after installing 3D Flex. Can you elaborate on your fix? What I did was:

export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/

and then ran the training job, but still got the same error.

Should I add "export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/" to my .bashrc file, or run it before ./bin/cryosparcw install-3dflex?

Also, if it’s relevant, I’m using CentOS and the original CryoSPARC was installed using CUDA 11.1. Should I reinstall CryoSPARC using CUDA 11.7?

We have just released CryoSPARC v4.1.1 to address this issue.


I carried out the update and then ran install-3dflex.

Everything seems to be working, but I got this message at the end of install-3dflex:

Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

I'm not sure what the last line affects, given that other jobs have run OK so far?

I updated from v4.0.3 to v4.1.1, then ran ./bin/cryosparcw install-3dflex and got exactly the same message as above. Does anybody have suggestions? Thanks.

Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

We had the same issue on our CentOS machine. It turned out CUDA was present in our PATH and LD_LIBRARY_PATH (even though we had removed it from our .bashrc), and I think that was somehow causing the issue. You can check by running export | grep cuda
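
For a more targeted view than grepping the whole environment, the two variables can be split into one entry per line and filtered. A small sketch; the function name `cuda_entries` is made up for illustration.

```sh
# Print each entry of a colon-separated path list that mentions "cuda"
# (case-insensitive). Prints nothing if there is no such entry.
cuda_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -i 'cuda' || true
}

echo "CUDA entries in PATH:"
cuda_entries "${PATH:-}"
echo "CUDA entries in LD_LIBRARY_PATH:"
cuda_entries "${LD_LIBRARY_PATH:-}"
```

Running this in the same shell (and as the same user) that launches cryosparcw matters, since the variables can differ between login and non-login shells.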

@olibclarke Should we redo install-3dflex after removing it from PATH?

Yes, that’s what we did

It is okay to ignore this error message in this context.


Hi Olibclarke, I had already updated CryoSPARC to 4.1.1 before installing 3D Flex, but the install fails with an error. Why? Thank you.

bpl-subset/bpl_subset/boost/system/error_code.hpp: At global scope:
bpl-subset/bpl_subset/boost/system/error_code.hpp:214:36: warning: ‘pycudaboost::system::posix_category’ defined but not used [-Wunused-variable]
static const error_category & posix_category = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:215:36: warning: ‘pycudaboost::system::errno_ecat’ defined but not used [-Wunused-variable]
static const error_category & errno_ecat = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:216:36: warning: ‘pycudaboost::system::native_ecat’ defined but not used [-Wunused-variable]
static const error_category & native_ecat = system_category();
^
error: command ‘gcc’ failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-yet46pjv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-yet46pjv/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-dvkpa246/install-record.txt --single-version-externally-managed --compile --install-headers /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m/pycuda Check the logs for full command output.

Welcome to the forum @alexlu.
Please can you provide additional information:

  1. the command that produced this error
  2. output of uname -a
  3. output of gcc -v
  4. output of /usr/bin/env

You may also try:

Hi CryoSPARC Team,
My error is resolved. The cause was that I have several GPU worker nodes.
I checked: the master had been updated to 4.1.1, but the workers had not and were still at version 4.1.0.
After I updated the workers to 4.1.1, it was OK!
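
A quick way to catch this kind of master/worker mismatch is to compare version strings on each node. The sketch below assumes each install directory contains a plain-text file named `version`; that file location, along with `MASTER_DIR`, `WORKER_DIR`, and the helper names, is an assumption to check against your own install.

```sh
# Hypothetical install locations; adjust to your system.
MASTER_DIR="${MASTER_DIR:-/path/to/cryosparc_master}"
WORKER_DIR="${WORKER_DIR:-/path/to/cryosparc_worker}"

# Print the first line of a version file, or "unknown" if unreadable.
read_version() {
    if [ -r "$1" ]; then
        head -n 1 "$1"
    else
        echo unknown
    fi
}

# Succeed only if both files are readable and hold the same version.
versions_match() {
    a=$(read_version "$1")
    b=$(read_version "$2")
    [ "$a" != unknown ] && [ "$a" = "$b" ]
}

echo "master: $(read_version "$MASTER_DIR/version")"
echo "worker: $(read_version "$WORKER_DIR/version")"
if versions_match "$MASTER_DIR/version" "$WORKER_DIR/version"; then
    echo "versions match"
else
    echo "versions differ or could not be read"
fi
```

With separate worker nodes, the worker half of the check has to be run on every node, since each has its own cryosparc_worker directory.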

Try nvidia-smi to check that the cards are detected, then try using CryoSPARC anyway. That message is not an "error" in the sense that the install failed, as wtempel mentioned up-thread.
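
For a scriptable version of the card check, `nvidia-smi -L` prints one line per GPU, which can simply be counted. A sketch; the helper name `count_gpus` is made up for illustration.

```sh
# Count GPU lines ("GPU 0: ...") in `nvidia-smi -L` style output on stdin.
# Prints 0 when there are none.
count_gpus() {
    grep -c '^GPU [0-9]' || true
}

if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs detected: $(nvidia-smi -L | count_gpus)"
else
    echo "nvidia-smi not found on this machine"
fi
```

A count of 0 on a node that should have cards points at a driver or visibility problem rather than at the 3D Flex install itself.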

If there are no other errors, the message
PyTorch not installed correctly, or NVIDIA GPU not detected.
does not indicate a failure of the installation.