Cuda_runtime.h: No such file or directory

I’ve installed CryoSPARC 3.3.2 on RHEL 8.5 (really Springdale 8.5, which is derived from RHEL 8.5). I’m using CUDA toolkit 11.1. When I run the Patch Motion Correction (multi) job from the tutorial, I get this error:

pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmpupyjujj0.cu failed
[command: nvcc --preprocess -arch sm_80 -I/projects/MOLBIO/local/cryosparc-della-test-2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/cuda /tmp/tmpupyjujj0.cu --compiler-options -P]
[stderr:
b'cc1plus: fatal error: cuda_runtime.h: No such file or directory\ncompilation terminated.\n']
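
One way to take pycuda out of the picture is to preprocess a trivial kernel by hand (a sketch; the probe file name is arbitrary):

echo '#include <cuda_runtime.h>' > /tmp/probe.cu
/usr/local/cuda-11.1/bin/nvcc --preprocess -arch sm_80 /tmp/probe.cu --compiler-options -P

If that fails with the same cc1plus error, the problem is between nvcc and the host compiler rather than in pycuda or CryoSPARC.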

I tried adding several things to cluster_script.sh, one at a time; none of them helped.

export CPLUS_INCLUDE_PATH="/usr/local/cuda-11.1/include"

export CPATH="/usr/local/cuda-11.1/include"

module load gcc-toolset/10
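
nvcc’s dry-run mode prints the exact host-compiler commands it would execute, including the include flags it passes, which may help show where the path gets lost; a sketch, reusing the probe file from above:

/usr/local/cuda-11.1/bin/nvcc --dryrun --preprocess -arch sm_80 /tmp/probe.cu --compiler-options -P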

Finally, I copied the contents of the CUDA include directory to the location where nvcc is looking:

cp -r /usr/local/cuda-11.1/include/*  /projects/MOLBIO/local/cryosparc-della-test-2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/cuda

and that works, although it doesn’t seem like a good solution. How can one persuade nvcc to look in the right place for the include files?

– Matthew Cahn

What is the output of the following commands, executed on a cryoSPARC worker node:

# salloc -t 00:10:00 -p cryoem --nodelist=della-l09g1
[cryoem@della-l09g1 cryosparc-della-test-2]$ eval $(cryosparc_worker/bin/cryosparcw env)
[cryoem@della-l09g1 cryosparc-della-test-2]$ ${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Tue_Sep_15_19:10:02_PDT_2020
Cuda compilation tools, release 11.1, V11.1.74
Build cuda_11.1.TC455_06.29069683_0
[cryoem@della-l09g1 cryosparc-della-test-2]$ python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 1, 0)
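
Since the fatal error is emitted by cc1plus (the GCC C++ front end that nvcc drives), it may also be worth recording which host compiler the worker environment resolves; a sketch, run after eval $(cryosparc_worker/bin/cryosparcw env):

which gcc g++
gcc --version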

I am still unclear about the root cause of this particular problem.
Your best bet may be to keep the environment at the time of installation as similar as possible to the environment at the time of running a cryoSPARC job.
The critical parts are:

  • the nvidia driver (controlled by the root user)
  • the nvidia toolkit (potentially controlled by a non-root user; see the --toolkit, --toolkitpath= and --defaultroot= CUDA installer options; a version check sketch follows this list)
  • installation/(re-)configuration of the cryoSPARC worker package (when one runs cryosparc_worker/install.sh .. or cryosparc_worker/bin/cryosparcw newcuda ..)
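
A quick way to compare the first two of these on a worker node (a sketch; nvidia-smi reports the driver version, nvcc the toolkit version):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
${CRYOSPARC_CUDA_PATH}/bin/nvcc --version | grep release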

In your situation, this may imply:

  • installing the CUDA toolkit and the cryoSPARC worker package as a cluster job.
  • sharing CUDA toolkit and cryoSPARC worker installation trees only between cluster nodes with “similar enough” (intentionally vague) nvidia drivers.
  • updating the CUDA toolkit whenever the nvidia driver has “significantly” (intentionally vague) changed.
  • in turn, running cryosparcw newcuda /path/to/new/cuda whenever the CUDA toolkit installation has changed (see the sketch after this list).
  • ensuring the same compilers and libraries are available at installation/reconfiguration and cryoSPARC run time (to which you have already alluded).
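
For the last two points, re-pointing the worker at an updated toolkit amounts to one command; a minimal sketch, where the toolkit path is an example to be replaced with your actual installation:

cryosparc_worker/bin/cryosparcw newcuda /usr/local/cuda-11.1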

This problem persisted through version 4.1.0, but it appears to be resolved in 4.1.1. I ran the update to 4.1.1, and I’m now able to run a Patch Motion Correction job without copying the header files the way I did before.

– Matthew


Thank you for this update. May I ask if 3DFlex dependencies were installed for your CryoSPARC instance?