2D classification job error 'No module named 'pycuda'' after updating to 4.1.2

Hi,

I am running CryoSPARC on a cluster, interfacing with the GUI from a single workstation. The cluster uses CUDA release 12.0, V12.0.76.

I recently updated to CryoSPARC 4.1.2 to make use of 3D Flex, which I've been having a lot of fun with and have found very informative for my project. The 3D Flex jobs all ran well with no reported errors, and I had not attempted any 2D classification jobs at that point. I was then forced to change the ports used by my CryoSPARC instance on the cluster. Subsequently, when I attempted to run a 2D classification job in a new project and workspace, using as input a particles.star file preprocessed and picked in RELION 4.0, I encountered the following error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 83, in cryosparc_compute.run.main
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/cryosparc_compute/jobs/jobregister.py", line 442, in get_run_function
    runmod = importlib.import_module("…"+modname, name)
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "", line 1014, in _gcd_import
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 671, in _load_unlocked
  File "", line 1174, in exec_module
  File "", line 219, in _call_with_frames_removed
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 13, in init cryosparc_compute.jobs.class2D.run
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/cryosparc_compute/engine/__init__.py", line 8, in <module>
    from .engine import *  # noqa
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 9, in init cryosparc_compute.engine.engine
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 4, in init cryosparc_compute.engine.cuda_core
ModuleNotFoundError: No module named 'pycuda'

I tried rolling back to 4.0.0, but this gave a similar error. I then tried deleting the worker deps folder and installing the 4.1.2 update, after which I received these two error messages:

Running setup.py install for pycuda … error
error: subprocess-exited-with-error

× Running setup.py install for pycuda did not run successfully.
│ exit code: 1
╰─> [137 lines of output]

*** I have detected that you have not run configure.py.
*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.

*** See README_SETUP.txt for more information.

*** If the build does fail, just re-run configure.py with the
*** correct arguments, and then retry. Good luck!

*** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT

############################
# Package would be ignored #
############################
Python recognizes 'pycuda.cuda' as an importable package,
but it is not listed in the packages configuration of setuptools.

'pycuda.cuda' has been automatically added to the distribution only
because it may contain data files, but this behavior is likely to change
in future versions of setuptools (and therefore is considered deprecated).

Please make sure that 'pycuda.cuda' is included as a package by using
the packages configuration field or the proper discovery methods
(for example by using find_namespace_packages(...)/find_namespace:
instead of find_packages(...)/find:).

You can read more about "package discovery" and "data files" on setuptools
documentation page.

gcc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory
compilation terminated.
error: command '/lmb/home/nturner/mambaforge/bin/gcc' failed with exit code 1
[end of output]

The gcc error occurs despite my having loaded the 'compilers/gcc/12.1.0' and 'cuda/12.0' modules on the cluster node from which I am attempting to update CryoSPARC.
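(In case it helps narrow this down, a quick check along the following lines, run in the same shell session used for the update, should show which gcc is first on PATH and whether that gcc can locate its own cc1plus; the exact commands are just illustrative.)

which gcc nvcc
gcc --version
# a bare "cc1plus" on the next line would mean gcc cannot find its C++ front end
gcc -print-prog-name=cc1plus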

I tried to get around this by running a pip install pycuda command into the worker dependencies folder (a rough sketch of what I mean is below the error), but got this error message during the wheel build:

error: command 'gcc' failed with exit status 1
----------------------------------------
ERROR: Failed building wheel for pycuda
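(To clarify what I mean by that: roughly, activating the worker's bundled Python environment and running pip inside it, along the lines of the sketch below. The conda.sh path is only inferred from the environment paths in the tracebacks above, so it may not be exact.)

# hypothetical sketch, paths inferred from the worker environment seen above
source /lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/etc/profile.d/conda.sh
conda activate cryosparc_worker_env
pip install pycuda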

It would be great if I could receive some help with this.

Many thanks,
Noah

Welcome to the forum @nturner

Please can you recount the steps and commands that you subsequently performed?

Which folder did you delete?

What command did you run?

Thank you for the welcome!

cryosparcm stop
cryosparcm backup
cryosparcm changeport 51530

rm -rf <path_to>/cryosparc_worker/deps

cryosparcm update
scp -r <path_to>/cryosparc_master/cryosparc_worker.tar.gz <path_to>/cryosparc_worker
bin/cryosparcw update

Hi Noah,

If your cryosparc_worker installation is currently broken anyway, would you like to try the following (I have not tried this myself)?

  1. mv cryosparc_worker cryosparc_worker_old

  2. tar xfv cryosparc_worker_old/cryosparc_worker.tar.gz

  3. cd cryosparc_worker

  4. ./install.sh --license "your-license-id" --cudapath /path/to/cuda-12.0 2>&1 | tee install_20230127.log
    

    I expect this to eventually fail (CUDA-12.0 may currently be incompatible with CryoSPARC), but this failure should be “repaired” by the next step.

  5. export PATH=/usr/bin:/bin
    export LD_LIBRARY_PATH=""
    ./bin/cryosparcw install-3dflex
    

Does this restore 2D classification functionality?

Thank you for the suggestion; I've followed the steps you listed.

The install did indeed fail, as you expected, and I progressed on to the next step as suggested.

Unfortunately not: 2D classification fails at the same point (after loading the particle stack and windowing the particles, just as iteration 0 begins) with this error message:

Traceback (most recent call last):
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 429, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x1456b8953c80>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 336, in cryosparc_compute.jobs.class2D.run.run_class_2D
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 964, in cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 974, in cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 156, in cryosparc_compute.engine.cuda_core.allocate_gpu
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "", line 2, in get_fill_kernel
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/tools.py", line 433, in context_dependent_memoize
    result = func(*args)
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 493, in get_fill_kernel
    return get_elwise_kernel(
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 162, in get_elwise_kernel
    mod, func, arguments = get_elwise_kernel_and_types(
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 148, in get_elwise_kernel_and_types
    mod = module_builder(arguments, operation, name,
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/elementwise.py", line 45, in get_elwise_module
    return SourceModule("""
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
    cubin = compile(source, nvcc, options, keep, no_extern_c,
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 78, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 54, in preprocess_source
    raise CompileError("nvcc preprocessing of %s failed" % source_path,
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmp92rwbt7s.cu failed
[command: nvcc --preprocess -arch sm_61 -I/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda /tmp/tmp92rwbt7s.cu --compiler-options -P]
[stderr:
b"gcc: fatal error: cannot execute 'cc1plus': execvp: No such file or directory\ncompilation terminated.\nnvcc fatal : Failed to preprocess host compiler properties.\n"]

What resource manager/scheduler is running on your cluster?
Do all GPU nodes have nvidia driver v460 or later?

We are running Slurm on our cluster.

Yes, the GPU nodes are on driver version 515.65.01.

Please can you post queue_sub_script.sh from that job’s directory?

#!/bin/sh
#SBATCH --export=ALL
#SBATCH -J cryosparc_P8_J11
#SBATCH -o /beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/J11.out
#SBATCH -e /beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/J11.err
#SBATCH -p gpu --gres gpu:4 --ntasks 1 --cpus-per-task 32 --mem 128G
#SBATCH --open-mode append
#SBATCH -t 7-00:00:00
#SBATCH --mail-type FAIL

export CUDA_VISIBLE_DEVICES="0,1,2,3"
export CRYOSPARC_SSD_PATH="/ssd/${SLURM_JOB_USER}-${SLURM_JOBID}"
/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/bin/cryosparcw run --project P8 --job J11 --master_hostname hal.lmb.internal --master_command_core_port 51532 > /beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/job.log 2>&1

If I haven’t overlooked anything (please check), this script can be submitted with sbatch by the Linux user that runs the CryoSPARC services and should collect some details about the environment in which a CryoSPARC job would run.

#!/bin/sh
#SBATCH --export=ALL
#SBATCH -J cryosparc_env_tests
#SBATCH -o /beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/test1.out
#SBATCH -e /beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/test1.err
#SBATCH -p gpu --gres gpu:4 --ntasks 1 --cpus-per-task 32 --mem 128G
#SBATCH --open-mode append
#SBATCH -t 7-00:00:00
#SBATCH --mail-type FAIL
env | grep -v CRYOSPARC_LICENSE
echo '========================='
which g++
echo '========================='
g++ --version
echo '========================='
/lmb/home/nturner/Software/Cryosparc2/cryosparc_worker/bin/cryosparcw env | grep -v CRYOSPARC_LICENSE
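(Assuming the script is saved under some placeholder name, e.g. cs_env_check.sh, it would then be submitted by the Linux user that runs the CryoSPARC services with:)

sbatch cs_env_check.sh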

Perhaps

/beegfs3/nturner/230122_K-CC-RWD_M-dH2_CHAPSO_UltrAufoil_Krios2_Falcon4_96kx/Cryosparc/CS-230122-k-cc-rwd-m-dh2-kriosii-falcon4-96-kx/J11/test1.out

would include some useful information after the script has run?