3DFlex Dependencies; Building pycuda

Same issue here, RHEL 8.5

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/8/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: …/configure --enable-bootstrap --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --disable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 8.5.0 20210514 (Red Hat 8.5.0-15) (GCC)

Building wheels for collected packages: pycuda
Building wheel for pycuda (setup.py) … error
ERROR: Command errored out with exit status 1:
command: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/python3.7 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-riub75yd/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-riub75yd/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-to6x7bpc
cwd: /tmp/pip-req-build-riub75yd/
Complete output (3926 lines):


*** I have detected that you have not run configure.py.


*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.


*** See README_SETUP.txt for more information.


*** If the build does fail, just re-run configure.py with the
*** correct arguments, and then retry. Good luck!


*** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT


Continuing in 10 seconds…
Continuing in 9 seconds…
Continuing in 8 seconds…
Continuing in 7 seconds…
Continuing in 6 seconds…
Continuing in 5 seconds…
Continuing in 4 seconds…
Continuing in 3 seconds…
Continuing in 2 seconds…
Continuing in 1 seconds…
/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'test_requires'
warnings.warn(msg)
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/pycuda
copying pycuda/__init__.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/_cluda.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/_mymako.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/autoinit.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/characterize.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/compiler.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/cumath.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/curandom.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/debug.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/driver.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/elementwise.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/gpuarray.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/reduction.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/scan.py → build/lib.linux-x86_64-3.7/pycuda
copying pycuda/tools.py → build/lib.linux-x86_64-3.7/pycuda
creating build/lib.linux-x86_64-3.7/pycuda/gl
copying pycuda/gl/__init__.py → build/lib.linux-x86_64-3.7/pycuda/gl
copying pycuda/gl/autoinit.py → build/lib.linux-x86_64-3.7/pycuda/gl
creating build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/__init__.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/cg.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/coordinate.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/inner.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/operator.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/packeted.py → build/lib.linux-x86_64-3.7/pycuda/sparse
copying pycuda/sparse/pkt_build.py → build/lib.linux-x86_64-3.7/pycuda/sparse
creating build/lib.linux-x86_64-3.7/pycuda/compyte
copying pycuda/compyte/__init__.py → build/lib.linux-x86_64-3.7/pycuda/compyte
copying pycuda/compyte/array.py → build/lib.linux-x86_64-3.7/pycuda/compyte
copying pycuda/compyte/dtypes.py → build/lib.linux-x86_64-3.7/pycuda/compyte
running egg_info
writing pycuda.egg-info/PKG-INFO
writing dependency_links to pycuda.egg-info/dependency_links.txt
writing requirements to pycuda.egg-info/requires.txt
writing top-level names to pycuda.egg-info/top_level.txt
reading manifest file 'pycuda.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'doc/source/_static/*.css'
warning: no files found matching 'doc/source/_templates/*.html'
warning: no files found matching '*.cpp' under directory 'bpl-subset/bpl_subset/boost'
warning: no files found matching '*.html' under directory 'bpl-subset/bpl_subset/boost'
warning: no files found matching '*.inl' under directory 'bpl-subset/bpl_subset/boost'
warning: no files found matching '*.txt' under directory 'bpl-subset/bpl_subset/boost'
warning: no files found matching '*.h' under directory 'bpl-subset/bpl_subset/libs'
warning: no files found matching '*.ipp' under directory 'bpl-subset/bpl_subset/libs'
warning: no files found matching '*.pl' under directory 'bpl-subset/bpl_subset/libs'
adding license file 'LICENSE'
writing manifest file 'pycuda.egg-info/SOURCES.txt'
creating build/lib.linux-x86_64-3.7/pycuda/cuda
copying pycuda/cuda/pycuda-complex-impl.hpp → build/lib.linux-x86_64-3.7/pycuda/cuda
copying pycuda/cuda/pycuda-complex.hpp → build/lib.linux-x86_64-3.7/pycuda/cuda
copying pycuda/cuda/pycuda-helpers.hpp → build/lib.linux-x86_64-3.7/pycuda/cuda
copying pycuda/sparse/pkt_build_cython.pyx → build/lib.linux-x86_64-3.7/pycuda/sparse
running build_ext
building '_driver' extension
creating build/temp.linux-x86_64-3.7
creating build/temp.linux-x86_64-3.7/src
creating build/temp.linux-x86_64-3.7/src/cpp
creating build/temp.linux-x86_64-3.7/src/wrapper
creating build/temp.linux-x86_64-3.7/bpl-subset
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/python
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/python/src
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/python/src/converter
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/python/src/object
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/smart_ptr
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/smart_ptr/src
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/system
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/system/src
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/thread
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/thread/src
creating build/temp.linux-x86_64-3.7/bpl-subset/bpl_subset/libs/thread/src/pthread
gcc -pthread -B /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/compiler_compat -Wl,--sysroot=/ -Wsign-compare -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_ALL_NO_LIB=1 -DBOOST_THREAD_BUILD_DLL=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_PYTHON_SOURCE=1 -Dboost=pycudaboost -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PACKAGE=pycuda -DPYGPU_PYCUDA=1 -DHAVE_CURAND=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include -I/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/numpy/core/include -I/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include/python3.7m -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-3.7/src/cpp/cuda.o
In file included from bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:32,
from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
bpl-subset/bpl_subset/boost/smart_ptr/detail/shared_count.hpp:284:33: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
     explicit shared_count( std::auto_ptr<Y> & r ): pi_( new sp_counted_impl_p<Y>( r.get() ) )
                                 ^~~~~~~~
In file included from /usr/include/c++/8/memory:80,
from bpl-subset/bpl_subset/boost/config/no_tr1/memory.hpp:21,
from bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:27,
from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
/usr/include/c++/8/bits/unique_ptr.h:53:28: note: declared here
   template<typename> class auto_ptr;
                            ^~~~~~~~
In file included from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:146:65: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
template< class T, class R > struct sp_enable_if_auto_ptr< std::auto_ptr< T >, R >
^~~~~~~~
In file included from /usr/include/c++/8/memory:80,
from bpl-subset/bpl_subset/boost/config/no_tr1/memory.hpp:21,
from bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:27,
from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
/usr/include/c++/8/bits/unique_ptr.h:53:28: note: declared here
   template<typename> class auto_ptr;
                            ^~~~~~~~
In file included from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:285:30: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
     explicit shared_ptr( std::auto_ptr<Y> & r ): px(r.get()), pn()
^~~~~~~~
In file included from /usr/include/c++/8/memory:80,
from bpl-subset/bpl_subset/boost/config/no_tr1/memory.hpp:21,
from bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:27,
from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
/usr/include/c++/8/bits/unique_ptr.h:53:28: note: declared here
   template<typename> class auto_ptr;
                            ^~~~~~~~
In file included from bpl-subset/bpl_subset/boost/shared_ptr.hpp:17,
from src/cpp/cuda.hpp:30,
from src/cpp/cuda.cpp:4:
bpl-subset/bpl_subset/boost/smart_ptr/shared_ptr.hpp:329:34: warning: 'template<class> class std::auto_ptr' is deprecated [-Wdeprecated-declarations]
     shared_ptr & operator=( std::auto_ptr<Y> & r )
^~~~~~~~
In file included from /usr/include/c++/8/memory:80,

1 Like

I’ve had a lot of problems in the past with cuda (nvcc) and gcc version mismatches. The cuda compiler (nvcc) shipped with older cuda versions might not be compatible with the gcc version your system is using (e.g. nvcc wants gcc version < 7, but Ubuntu’s default gcc is > 9).
I cannot find a suitable table of compatibilities, but you can always try a lower version of gcc / g++ and see if that works.

I have had success installing multiple gcc (and g++) versions, e.g.
sudo apt install gcc-9 g++-9
sudo apt install gcc-8 g++-8

and then using update-alternatives to manage which version is used at any given time (see the update-alternatives(1) man page)

e.g.
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 1
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 1

#and then select the one you need with
update-alternatives --config g++

Do the same for gcc, then try to build pycuda again.
Do not forget to configure the gcc / g++ back to the default version once you are done.
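For reference, the matching gcc side could look something like this (a sketch following the same pattern as above; package and path names assume the Ubuntu packages mentioned earlier):

update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 1
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 1
#and then select the one you need with
update-alternatives --config gcc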

1 Like

Hi, just spitballing here, but I’ve had similar issues when we installed our RTX 4090 and had to reinstall pycuda in our existing cryosparc installation.
CU_TARGET_COMPUTE_90 is what the 4090 wants, but not all cuda versions even know that compute capability "90" exists (cuda > 11.7, I think?).

Maybe cryosparc is referencing compute 90 but the pycuda version that is compatible with your cuda does not know what cryosparc is talking about.
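If you want to check what your hardware and toolkit actually report, something along these lines may help (treat it as a sketch: the compute_cap query needs a fairly recent driver, and --list-gpu-arch a reasonably recent nvcc):

nvidia-smi --query-gpu=name,compute_cap --format=csv
nvcc --list-gpu-arch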

When I had these problems with the new 4090, I solved them by installing cuda 11.8 and then forcing cryosparc to reinstall cuda with

cryosparc_worker/bin/cryosparcw newcuda <path_to_cuda>

which forces dependency reinstall as well.

Worth a shot?

1 Like

We’re using cuda 11.8 on our devices already since we’re using RTX A5000 and A4500 GPUs. Same error occurs across all of them.

It almost seems like there’s a configuration error with pytorch, as if cryosparc is using a non-standard configuration and it’s making pycuda freak out.

1 Like

Testing the 3D flex refine install on a system which has just finished a big run (and project is stored safely…) Ubuntu 22.04, with CUDA 11.8.

Update to 4.1.0 went smoothly. The 3D flex refine install also went smoothly until it hit PyCUDA, then failed with the now-recognisable PyCUDA build error.

During the 3D flex install, it seems to be downloading a mix of CUDA 11.7 and CUDA 12. CUDA 12 currently causes a world of pain for all of the 3DEM programs I’ve actually tried it with.

The PyCUDA build references very old compute targets, which might have been deprecated in CUDA 12?

Would anyone else please check whether they see cuda-cudart-12, cuda-cudart-dev-12 etc in their install list?
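One way to check what actually landed in the worker environment is something like this, run from the cryosparc_worker directory with whichever conda you have on PATH (a sketch):

conda list -p deps/anaconda/envs/cryosparc_worker_env | grep -i cuda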

Testing a manual PyCUDA install now, but unlike getting it working for Arch, this is just sitting on “Solving environment” indefinitely.

Can provide logs if necessary. Updates if the manual PyCUDA install works.

Update: manually installing PyCUDA eventually throws dozens of compatibility errors (but anaconda took about three hours to work it all out). Will revert to non-3D Flex install for now.

1 Like

Hey folks,

Had the same issue and figured out how to fix it. I think @rbs_sci is right and it’s the mix of cuda versions that is the issue.

I simply changed the line in cryosparcw from

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia

to

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c "nvidia/label/cuda-11.7.0"

This forces the use of 11.7 for all packages, and it seems to work afterwards.

Hope that helps.

10 Likes

Hey @scaiola what do you mean exactly?

I simply change the line of cryosparcw from xxxxxxxx

Could you provide a slightly more detailed step-by-step description?
Thanks!

1 Like

Can confirm this works for us too, thank you!!

@stavros you need to edit the cryosparcw script (used to install the 3D-flex dependencies), located in the worker/bin directory, changing the conda install line in the way that @scaiola showed. Be careful when making the edit if you are pasting the new line in: there are multiple types of quotation marks, and pasted text can carry formatting you don’t want, so I would recommend typing it out rather than copy-pasting (or patching the file in place, as in the sketch below).
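If you prefer not to edit by hand, a sed one-liner along these lines should also work (a sketch; it assumes the stock line ends in -c nvidia exactly, and makes a backup first):

cd /path/to/cryosparc_worker
cp bin/cryosparcw bin/cryosparcw.bak
sed -i 's|cuda-toolkit=11.7 -c nvidia|cuda-toolkit=11.7 -c "nvidia/label/cuda-11.7.0"|' bin/cryosparcw
grep -n 'nvidia/label' bin/cryosparcw   # confirm the edit took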

Cheers
Oli

EDIT:
This worked for the Ubuntu system. For the CentOS one it is not quite there:

Requirement already satisfied: pytools>=2011.2 in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from pycuda==2020.1) (2020.4.4)
Requirement already satisfied: decorator>=3.2.0 in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from pycuda==2020.1) (4.4.2)
Requirement already satisfied: appdirs>=1.4.0 in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from pycuda==2020.1) (1.4.4)
Requirement already satisfied: mako in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from pycuda==2020.1) (1.1.6)
Requirement already satisfied: numpy>=1.6.0 in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from pytools>=2011.2->pycuda==2020.1) (1.19.5)
Requirement already satisfied: MarkupSafe>=0.9.2 in ./deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages (from mako->pycuda==2020.1) (2.0.1)
Building wheels for collected packages: pycuda
  Building wheel for pycuda (setup.py) ... done
  Created wheel for pycuda: filename=pycuda-2020.1-cp37-cp37m-linux_x86_64.whl size=615783 sha256=329190f5458cf4d0e958ca89ab1226de960ff4bc47c9032d7f90bbe8a2d2aad5
  Stored in directory: /home/tmp/pip-ephem-wheel-cache-01nqtq71/wheels/fe/c9/2f/377db1b07f46ef88920cd6e533c2f0e1d0d0e3a5dbac1997bb
Successfully built pycuda
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
PyTorch not installed correctly, or NVIDIA GPU not detected.

EDIT2:
Fixed! Had to unset LD_LIBRARY_PATH first.
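For anyone following along, that boiled down to roughly this, run from the worker directory (a sketch; your paths and exact sequence may differ):

unset LD_LIBRARY_PATH
./bin/cryosparcw install-3dflex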

Well… almost fixed. On our Ubuntu machine all runs fine. On CentOS it says dependencies installed successfully, but any 3D-flex job “terminates abnormally”, with no useful info in the joblog.

EDIT3: OK now it is working on CentOS, but I have no idea why or what changed…

2 Likes

Thanks @stavros, great spotting.
The double quotation marks are not needed, and it also compiles with the cuda-11.7.1 label:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia/label/cuda-11.7.1

But would 3DFlex not work out of the box if one just did a normal worker install, compiled against an external cuda-11.7, followed by:

pip install torch ninja

And why the sudden need for an internal cuda env, when the rest of cryosparc uses the system cuda env?

Best,
Jesper

EDIT: I just tested this. 3DFlex does not need the internal CUDA env in anaconda. It can be built against the system CUDA installation. Just run a normal update from the worker directory:

eval $(bin/cryosparcw env)
bin/cryosparcw update
pip install torch ninja

Now pycuda in cryosparc is built against the system CUDA defined in config.sh, and it even runs.
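To double-check which CUDA that ends up being, from the worker directory (assuming CRYOSPARC_CUDA_PATH is in your environment after the eval above; otherwise read it straight out of config.sh):

grep CRYOSPARC_CUDA_PATH config.sh
$CRYOSPARC_CUDA_PATH/bin/nvcc --version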

1 Like

This gets us through the installation process as well. Brilliant.

1 Like

changing the line of cryosparcw from:

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c nvidia

to

conda install -y cuda-nvcc=11.7 cuda-toolkit=11.7 -c "nvidia/label/cuda-11.7.0"

worked for us.

Thanks @scaiola

1 Like

Thank you. Worked for us :slight_smile:

without unsetting LD_LIBRARY_PATH

1 Like

This worked for us as well! For reference it’s line 457 of cryosparcw.

2 Likes

Glad I was on the right path, at least. :smile:

All my poking around with quirky CUDA installs appears to have broken my worker install in a rather terminal fashion. Oh well, the database is safe, time for a reinstall.

Then maybe I can try 3D flex out. :smiley:

edit: well, it installed OK with a non-broken conda install…

Successfully built pycuda
Installing collected packages: pycuda
Successfully installed pycuda-2020.1
3D Flex Refine dependencies installed successfully.

Now to test it out.

edit 2:

All tests fail. Suspect due to --nossd flag used during install.

However, running jobs manually (the full T20S workflow with every job type enabled, disabling SSD caching where necessary) and then running 3D flex (still in training) all works fine.

1 Like

Now it’s fixed. It was a conflict with the system’s preloaded cuda 11.2.
So yes, specifying the cuda 11.7 conda update in cryosparcw does work!!
:slight_smile:
Thank you all :smiley:

------------------------ updated -------

Are you able to run through 3D Flex Training after specifying the 11.7 update in the cryosparc worker?

I still got the following error… I guess unsetting LD_LIBRARY_PATH is also necessary for the dependencies update.


[2022-12-16 17:40:16.44]
[CPU: 203.7 MB]
Traceback (most recent call last):
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 172, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/apps/cryosparc4/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 217, in <module>
    _load_global_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 178, in _load_global_deps
    _preload_cuda_deps()
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 158, in _preload_cuda_deps
    ctypes.CDLL(cublas_path)
  File "/apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /apps/cryosparc4/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtGetStatusString, version libcublasLt.so.11

1 Like

While @scaiola’s fix works for the update, it seems like flex train ends with an error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 80, in cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 443, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1050, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 19, in init cryosparc_compute.jobs.flex_refine.flexmod
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

any thoughts?

1 Like

Hi,
As we are working to address the issues you have been experiencing with
cryosparcw install-3dflex
we are looking for volunteers who would be willing to test our fix.
If you are confident in installing a CryoSPARC test version and, potentially, recovering from a broken installation, please send me a direct message with the following information for your CryoSPARC worker:

  • output of uname -a
  • the value of CRYOSPARC_CUDA_PATH inside cryosparc_worker/config.sh
  • the output of $CRYOSPARC_CUDA_PATH/bin/nvcc --version (after setting the CRYOSPARC_CUDA_PATH environment variable)
  • the output of nvidia-smi
  • the type of your CryoSPARC instance:
    • single workstation (combined master/worker)
    • master and CryoSPARC-managed workers
    • master and cluster workers
2 Likes

This change works for us too!

After the failing install-3dflex installation message, you need to run ./bin/cryosparcw forcedeps to roll back; otherwise it will still give the same error.
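In other words, roughly this from the worker directory, after making the cryosparcw edit above (a sketch):

./bin/cryosparcw forcedeps        # reinstall the standard dependencies first
./bin/cryosparcw install-3dflex   # then retry the 3D Flex dependencies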

Ubuntu 18.04.1
Cuda 11.8

1 Like

@Bassem @qitsweauca I got the same error as you. The simple fix is to point LD_LIBRARY_PATH at the bundled libraries, as follows:

export LD_LIBRARY_PATH=/home/xxxxx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/…/…/nvidia/cublas/lib/

There could be a more sophisticated solution, but this worked for me.
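To verify that torch resolves its CUDA libraries before launching a job, something like this (run from the worker directory, inside the worker environment) may help; treat it as a sketch:

eval $(bin/cryosparcw env)
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"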

I got the same error as @Bassem and @qitsweauca when running a training job after installing 3D Flex. Can you elaborate on your fix? What I did is

export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/

and then ran the training job, but still got the same error.

Shall I add the "export LD_LIBRARY_PATH=/home/exx/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/nvidia/cublas/lib/" line to my bashrc, or run it just before ./bin/cryosparcw install-3dflex?

Also, if it’s relevant, I’m using CentOS and the original cryosparc was installed using cuda 11.1. Shall I reinstall cryosparc using cuda 11.7?