I’m having trouble running 3D Flex Refinement on an RTX 4090.
After I installed the dependencies using cryosparcw, the job failed with “RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR”. I believe this is a CUDA-related issue specific to the RTX 4090 running pytorch-cuda=11.7, which is the version that cryosparcw installed. It is the same error as reported in “CUFFT_INTERNAL_ERROR on RTX 4090”, Issue #88038 in the pytorch/pytorch GitHub repository.
To get around that, I used cryosparcw ipython to upgrade PyTorch to pytorch-cuda=11.8. A simple PyTorch test in ipython went fine, but after the upgrade the 3D Flex job failed with the following error:
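A quick way to check whether cuFFT works in a given PyTorch build, outside of a CryoSPARC job, is to run a small FFT on the GPU. The following is only a minimal sketch: the function name `cufft_smoke_test` is made up for illustration, and it assumes it is run inside the worker environment (for example from cryosparcw ipython). It is not necessarily the exact check anyone in this thread ran.

```python
def cufft_smoke_test():
    """Run a small FFT on the GPU and return a short status string."""
    try:
        import torch
    except ImportError:
        # Also covers the case where a shared library torch needs is missing.
        return "torch not installed or not importable"

    if not torch.cuda.is_available():
        return "no CUDA device visible"

    info = (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"{torch.cuda.get_device_name(0)}"
    )
    try:
        x = torch.randn(64, 64, device="cuda")
        y = torch.fft.rfft2(x)      # FFTs on CUDA tensors go through cuFFT
        torch.cuda.synchronize()    # surface asynchronous errors here
    except RuntimeError as err:
        return f"FAILED ({info}): {err}"
    return f"ok ({info}), output shape {tuple(y.shape)}"


print(cufft_smoke_test())
```

If the build is affected, the expectation is that the `RuntimeError` branch reports the same cuFFT message as the job log; a passing result only shows that this one FFT call works, not that every 3D Flex code path does.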
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 83, in cryosparc_compute.run.main
  File "/home/hongjiang/cryosparc/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 442, in get_run_function
    runmod = importlib.import_module("…"+modname, __name__)
  File "/home/hongjiang/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 1174, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/run_train.py", line 12, in init cryosparc_compute.jobs.flex_refine.run_train
  File "cryosparc_master/cryosparc_compute/jobs/flex_refine/flexmod.py", line 24, in init cryosparc_compute.jobs.flex_refine.flexmod
ImportError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
I think this error happens because the PyTorch build for CUDA 11.8 no longer ships a separate libtorch_cuda_cu.so; the CUDA libraries are all merged into a single libtorch_cuda.so file.
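One way to confirm which libtorch libraries an environment actually contains is to list them without importing torch at all, which matters here because the import itself is what fails. This is a rough sketch under two assumptions: that the shared objects sit in a `lib` directory next to torch's `__init__.py` (the usual layout for pip and conda installs), and that it is run with the worker environment's Python. The helper names `find_torch_lib_dir` and `list_torch_libs` are hypothetical.

```python
import importlib.util
from pathlib import Path


def list_torch_libs(lib_dir):
    """Return sorted names of libtorch* shared objects found in lib_dir."""
    return sorted(p.name for p in Path(lib_dir).glob("libtorch*.so*"))


def find_torch_lib_dir():
    """Locate torch/lib without importing torch; None if torch is absent."""
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    # Assumption: shared libraries live in <site-packages>/torch/lib
    return Path(spec.origin).parent / "lib"


if __name__ == "__main__":
    lib_dir = find_torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        for name in list_torch_libs(lib_dir):
            print(name)
```

If libtorch_cuda_cu.so is absent from the listing while libtorch_cuda.so is present, that would be consistent with the explanation above, and it would suggest that the compiled flex_refine module was linked against the older library layout.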
Any advice on how to get around this issue?