3DFlex Dependencies; Building pycuda

Bassem · December 15, 2022, 5:47pm

Thank you. This worked for us without unsetting LD_LIBRARY_PATH.

posertinlab · December 15, 2022, 9:04pm

This worked for us as well! For reference it’s line 457 of cryosparcw.

rbs_sci · December 16, 2022, 12:36am

Glad I was on the right path, at least.

All my poking around with quirky CUDA installs appears to have broken my worker install in a rather terminal fashion. Oh well, the database is safe, time for a reinstall.

Then maybe I can try 3D flex out.

edit: well, it installed OK with a non-broken conda install…

Successfully built pycuda
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
3D Flex Refine dependencies installed successfully.

Now to test it out.

edit 2:

All tests fail. Suspect due to --nossd flag used during install.

However, jobs run manually all work fine: a full T20S workflow with every job type enabled (SSD caching disabled where necessary), followed by 3D Flex (training is still running).

qitsweauca · December 16, 2022, 6:45am

Now it’s fixed. It was caused by a conflict with the system’s preloaded cuda 11-2.
So yes, specifying cryosparcw with the CUDA 11.7 conda update does work!

Thank you all

------------------------ updated -------

Are you able to run 3D Flex Training after specifying the 11.7 update in the CryoSPARC worker?

I still get the following error, though. I guess unsetting LD_LIBRARY_PATH is necessary for the dependencies update.

[2022-12-16 17:40:16.44]
[CPU: 203.7 MB]
Traceback (most recent call last):
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/apps/cryosparc4/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

Bassem · December 16, 2022, 3:01pm

While @scaiola's fix works for the update, it seems that Flex training ends with an error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

Any thoughts?
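
For anyone debugging this kind of error: an undefined or missing symbol in libcublasLt.so.11 can happen when libcublas.so.11 gets paired with a different libcublasLt.so.11 than the one it shipped with, for example a copy picked up from a system CUDA directory on LD_LIBRARY_PATH. A minimal Python sketch, assuming that is the cause here, lists the LD_LIBRARY_PATH directories that contain a libcublasLt.so* file. The function name find_cublaslt_dirs is made up for this illustration and is not part of CryoSPARC or PyTorch.

```python
import os
from pathlib import Path


def find_cublaslt_dirs(ld_library_path):
    """Return directories on the given search path that hold a libcublasLt.so* file."""
    hits = []
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue  # skip empty entries such as a trailing ':'
        directory = Path(entry)
        if directory.is_dir() and any(directory.glob("libcublasLt.so*")):
            hits.append(str(directory))
    return hits


if __name__ == "__main__":
    # Any directory printed here is a candidate for shadowing the copy
    # installed under site-packages/nvidia/cublas/lib.
    for d in find_cublaslt_dirs(os.environ.get("LD_LIBRARY_PATH", "")):
        print("libcublasLt found on LD_LIBRARY_PATH in:", d)
```

If this prints a system CUDA directory, that would be consistent with the LD_LIBRARY_PATH conflicts reported earlier in the thread, but it is only a hint, not a confirmed diagnosis.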

wtempel · December 16, 2022, 7:51pm

Hi,
As we are working to address the issues you have been experiencing with
cryosparcw install-3dflex
we are looking for volunteers who would be willing to test our fix.
If you are confident in installing a CryoSPARC test version and, potentially, recovering from a broken installation, please send me a direct message with the following information for your CryoSPARC worker:

output of uname -a
the value of CRYOSPARC_CUDA_PATH inside cryosparc_worker/config.sh
the output of $CRYOSPARC_CUDA_PATH/bin/nvcc --version (after setting the CRYOSPARC_CUDA_PATH environment variable)
the output of nvidia-smi
the type of your CryoSPARC instance:
- single workstation (combined master/worker)
- master and CryoSPARC-managed workers
- master and cluster workers
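
The command outputs requested above could be gathered in one go with something like the following Python sketch. It is only an illustration: diagnostic_commands and run_all are hypothetical helper names, and the script assumes CRYOSPARC_CUDA_PATH has already been exported in the shell (it is normally defined in cryosparc_worker/config.sh).

```python
import os
import subprocess


def diagnostic_commands(cuda_path):
    """Build the commands listed above; cuda_path is the CRYOSPARC_CUDA_PATH value."""
    return [
        ["uname", "-a"],
        [os.path.join(cuda_path, "bin", "nvcc"), "--version"],
        ["nvidia-smi"],
    ]


def run_all(cuda_path):
    """Run each command and return {command string: combined output or error text}."""
    results = {}
    for cmd in diagnostic_commands(cuda_path):
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
            results[key] = proc.stdout + proc.stderr
        except OSError as exc:  # e.g. nvcc or nvidia-smi is not installed
            results[key] = f"could not run: {exc}"
    return results


if __name__ == "__main__":
    cuda = os.environ.get("CRYOSPARC_CUDA_PATH", "")
    for name, output in run_all(cuda).items():
        print(f"$ {name}\n{output}")
```

The instance type still has to be stated by hand, since the sketch does not try to detect it.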

Zy90 · December 17, 2022, 11:51pm

This change works for us too!

After a failed install-3dflex run, you need to run ./bin/cryosparcw forcedeps to roll back; otherwise it will still give the same error.

Ubuntu 18.04.1
Cuda 11.8

wilnart · December 20, 2022, 3:04am

@Bassem @qitsweauca I got the same error as you. The simple fix is to connect the libraries as follows

export LD_LIBRARY_PATH=/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/…/…/nvidia/cublas/lib/

There could be a more sophisticated solution, but this worked for me.

hwangab · December 20, 2022, 1:00pm

I got the same error as @Bassem and @qitsweauca when running a training job after installing 3D Flex. Can you elaborate on your fix? What I did was:

export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/

and then ran the training job, but I still got the same error.

Should I add “export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/” to my .bashrc file, or run it before running ./bin/cryosparcw install-3dflex?

Also, if it’s relevant, I’m using CentOS and the original CryoSPARC was installed using CUDA 11.1. Should I reinstall CryoSPARC using CUDA 11.7?

wtempel · December 20, 2022, 4:50pm

We have just released CryoSPARC v4.1.1 to address this issue.

Bassem · December 20, 2022, 5:48pm

I carried out the update and then ran install-3dflex.

Everything seems to be working, but I got this message at the end of install-3dflex:

Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

I'm not sure what the last line affects, given that other jobs have run OK so far.

donghuachen · December 20, 2022, 8:13pm

I updated from v4.0.3 to v4.1.1, then ran ./bin/cryosparcw install-3dflex and got exactly the same error as above. Does anybody have suggestions? Thanks.

Installing collected packages: torch
Successfully installed torch-1.13.1
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2020.1-cp38-cp38-linux_x86_64.whl
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

olibclarke · December 20, 2022, 8:16pm

We had the same issue on our CentOS machine. It turned out CUDA was present in our PATH and LD_LIBRARY_PATH (even though we had removed it from our .bashrc), and I think that was somehow causing the issue. You can check by running: export | grep cuda
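
A rough Python equivalent of that check, in case it is easier to read per variable: the sketch below splits every environment variable on the path separator and reports the entries that mention "cuda". Unlike grep it is case-insensitive and looks only at values, not variable names. The function name cuda_env_entries is just an illustrative choice.

```python
import os


def cuda_env_entries(env):
    """Return {variable: [entries mentioning 'cuda']} for the given environment mapping."""
    found = {}
    for name, value in env.items():
        # Split PATH-like values on ':' (os.pathsep) so each directory is reported separately.
        entries = [e for e in value.split(os.pathsep) if "cuda" in e.lower()]
        if entries:
            found[name] = entries
    return found


if __name__ == "__main__":
    for name, entries in cuda_env_entries(dict(os.environ)).items():
        for entry in entries:
            print(f"{name}: {entry}")
```

Run it in the same shell, as the same user, that launches the CryoSPARC worker commands, since a different login shell may have a different environment.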

Bassem · December 20, 2022, 8:34pm

@olibclarke Should we redo install-3dflex after removing it from PATH?

olibclarke · December 20, 2022, 8:59pm

Yes, that’s what we did

wtempel · December 20, 2022, 9:04pm

It is okay to ignore this error message in this context.

wtempel · December 20, 2022, 9:38pm

A post was split to a new topic: cufftAllocFailed during particle extraction

alexlu · December 24, 2022, 4:06am

Hi olibclarke, I had already updated CryoSPARC to 4.1.1 before installing 3D Flex, but the install fails with an error. Why? Thank you.

bpl-subset/bpl_subset/boost/system/error_code.hpp: At global scope:
bpl-subset/bpl_subset/boost/system/error_code.hpp:214:36: warning: ‘pycudaboost::system::posix_category’ defined but not used [-Wunused-variable]
static const error_category & posix_category = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:215:36: warning: ‘pycudaboost::system::errno_ecat’ defined but not used [-Wunused-variable]
static const error_category & errno_ecat = generic_category();
^
bpl-subset/bpl_subset/boost/system/error_code.hpp:216:36: warning: ‘pycudaboost::system::native_ecat’ defined but not used [-Wunused-variable]
static const error_category & native_ecat = system_category();
^
error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Command errored out with exit status 1: /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-yet46pjv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-yet46pjv/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-dvkpa246/install-record.txt --single-version-externally-managed --compile --install-headers /work/home/faculty/yhzhang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m/pycuda Check the logs for full command output.

wtempel · December 28, 2022, 3:49pm

Welcome to the forum @alexlu.
Please can you provide additional information:

the command that produced this error
output of uname -a
output of gcc -v
output of /usr/bin/env

You may also try:

alexlu · December 29, 2022, 12:14am

Hi CryoSPARC Team,
My error is resolved. It happened because I have more than one GPU worker node.
I checked: the master had been updated to 4.1.1, but the worker had not. The worker version was still 4.1.0.
After I updated the worker to 4.1.1, everything was OK!
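
A master/worker version mismatch like this one could be caught with a small check such as the Python sketch below. It assumes each install directory (cryosparc_master, cryosparc_worker) contains a plain-text file named version; that file location is an assumption to verify on your own installation, and read_version and versions_match are hypothetical helper names.

```python
from pathlib import Path


def read_version(install_dir):
    """Read the version string from <install_dir>/version (assumed file location)."""
    return Path(install_dir, "version").read_text().strip()


def versions_match(master_dir, worker_dir):
    """Return True when the master and worker report the same version string."""
    return read_version(master_dir) == read_version(worker_dir)
```

With separate worker nodes, the worker directory has to be checked on each node, for example versions_match("/path/to/cryosparc_master", "/path/to/cryosparc_worker") with the paths adjusted to the local install.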