CUDA 11.4 error

emmarl25 · August 16, 2022, 8:28pm

Hi,
Our institution recently had a display port error that was fixed by rebooting and updating our NVIDIA driver from an older (>400) version to 470.141.03. I made no changes to our CUDA installation, but with NVIDIA 470 our running CUDA version is 11.4 (see screenshot below), which is incompatible with cryoSPARC. However, the output of nvcc --version displays CUDA 10.1.

I tried downgrading the NVIDIA driver to 450 and then 390, but those versions have communication errors when installed as they’re outdated. Any help at all will be deeply appreciated. Thanks in advance!

wtempel · August 16, 2022, 10:13pm

It is possible for the CUDA versions of the driver and the cryoSPARC-configured toolkit (configured via CRYOSPARC_CUDA_PATH in /path/to/cryosparc_worker/config.sh) to differ.
In the absence of any related cryoSPARC error, I see no reason in this case to downgrade to a driver version that corresponds to the version of the toolkit associated with CRYOSPARC_CUDA_PATH.
This is my interpretation of backward compatibility discussed elsewhere.
You may wish to validate the combination of CUDA-10.1 CRYOSPARC_CUDA_PATH and CUDA-11.4 driver with a test workflow like this.
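
To see the two versions side by side, something like the sketch below may help. It assumes the usual "CUDA Version: X.Y" header printed by nvidia-smi, the "release X.Y" line printed by nvcc --version, and a sort that supports -V (GNU coreutils). The function names are made up for this example, and the nvcc found on PATH is not necessarily the toolkit that CRYOSPARC_CUDA_PATH points to.

```shell
# Extract "X.Y" from the "release X.Y" line of `nvcc --version` (read on stdin).
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract "X.Y" from the "CUDA Version: X.Y" header of `nvidia-smi` (read on stdin).
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeeds when version $1 >= version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query the tools when both are actually available on this machine.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    driver=$(nvidia-smi | driver_cuda_version)
    toolkit=$(nvcc --version | nvcc_release)
    echo "driver supports up to CUDA ${driver}; nvcc on PATH is CUDA ${toolkit}"
    if version_ge "$driver" "$toolkit"; then
        echo "driver is at least as new as the toolkit"
    else
        echo "driver is older than the toolkit (or could not be queried)"
    fi
fi
```

A driver that reports a newer CUDA version than the toolkit is the situation described above and is not by itself an error.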

emmarl25 · August 16, 2022, 10:34pm

Apologies, I should have specified; with this current setup, we were able to run jobs on the GPU but they eventually failed with the error that the gpu architecture is not recognized. I thought it was because our CUDA version was too high, but so far our machine hasn’t been able to tolerate NVIDIA drivers/CUDA configurations lower than 470.

wtempel · August 17, 2022, 3:12am

Please can you share the details of your current setup using these commands:

and paste the error message(s) observed with that setup. Please also specify the corresponding job type.

emmarl25 · August 17, 2022, 2:17pm

Hi, thanks again for your response. Here are the outputs for the commands you sent, although how could I find the cryosparc CUDA path again? Would that strictly be inside the cryosparc root directory? (btw I did have to update the NVIDIA driver again because we were having NVIDIA communication errors with the 470.) Thanks so much again!

emmarl25 · August 17, 2022, 2:22pm

Here also is a screenshot of the error that is returned whenever we run a job on the worker node:

wtempel · August 17, 2022, 2:54pm

Thank you for sharing the error message. This could be a case where the GPU hardware requires version 11 or newer of the CUDA toolkit.
We currently recommend CUDA-11.2 because we haven’t tested newer minor versions of the toolkit.
If you opt for a toolkit version newer than 11.2, I strongly recommend validating cryoSPARC with a test workflow.
The cryosparc_worker installation needs to be updated with the
/path/to/cryosparc_worker/bin/cryosparcw newcuda /path/to/new_cuda_toolkit_root
command every time changes are made to the location or content of
CRYOSPARC_CUDA_PATH (defined in /path/to/cryosparc_worker/config.sh).
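
Before running newcuda, it may be worth confirming that the toolkit root actually exists and contains nvcc. The following is only a sketch: check_cuda_root is a hypothetical helper written for this post, not part of cryoSPARC, and it checks nothing beyond the presence of an executable bin/nvcc.

```shell
# Hypothetical helper: sanity-check a CUDA toolkit root directory.
# Returns 0 and prints "ok: <dir>" only if <dir>/bin/nvcc exists and is executable.
check_cuda_root() {
    if [ -z "$1" ]; then
        echo "CUDA path is empty"
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "missing directory: $1"
        return 1
    fi
    if [ ! -x "$1/bin/nvcc" ]; then
        echo "no executable nvcc under: $1/bin"
        return 1
    fi
    echo "ok: $1"
}

# Possible usage (paths are placeholders, as in the command above):
#   eval $(/path/to/cryosparc_worker/bin/cryosparcw env)
#   check_cuda_root "$CRYOSPARC_CUDA_PATH"
```

If the check fails, the path in config.sh points at a toolkit that is not installed there.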

EDIT: I just now realized, based on the output of
python -c "import pycuda.driver; print(pycuda.driver.get_version())", that the /home/voslab/cryosparc/cryosparc_worker installation appears to already have been configured with CUDA-11.2, but not 10.1, as I had expected based on

What are the outputs of commands 2 and 3 from this sequence:

eval $(/home/voslab/cryosparc/cryosparc_worker/bin/cryosparcw env)
echo $CRYOSPARC_CUDA_PATH
${CRYOSPARC_CUDA_PATH}/bin/nvcc --version

emmarl25 · August 17, 2022, 3:18pm

I think I see the issue; the cryosparc worker CUDA path is pointing to ‘/usr/local/bin/cuda-11.2’, which doesn’t exist. How do you best recommend installing CUDA 11.2 as opposed to CUDA 11.6? I previously used the package manager to install the NVIDIA-CUDA toolkit, which automatically installed CUDA 11.6.

emmarl25 · August 17, 2022, 3:39pm

Hi, here are the outputs for those commands.

wtempel · August 17, 2022, 3:53pm

Please review installer documentation and your existing directory structure/contents to ensure the following suggestion is suitable for your circumstances.
As user voslab, not root:

cd
(single line command)
wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run

bash cuda_11.2.2_460.32.03_linux.run --silent --toolkit \
--toolkitpath=/home/voslab/cryosparc/cuda-11.2.2 \
--defaultroot=/home/voslab/cryosparc/cuda-11.2.2

/home/voslab/cryosparc/cryosparc_worker/bin/cryosparcw \
newcuda /home/voslab/cryosparc/cuda-11.2.2

emmarl25 · August 17, 2022, 4:26pm

I’m able to get the runfile successfully, but am having trouble running it at your third step, returning the following error:

wtempel · August 17, 2022, 5:32pm

I am not sure what the problem might be.
Has the build-essential package been installed on this system?
More information should become available if the command were repeated without the --silent flag.
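
A quick way to check for the compiler toolchain is sketched below. It assumes that gcc, g++ and make are the relevant tools from build-essential, and that the system is Debian/Ubuntu-like (where build-essential is the package name); missing_tools is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of any listed commands not found on PATH.
missing_tools() {
    _missing=""
    for _tool in "$@"; do
        command -v "$_tool" >/dev/null 2>&1 || _missing="$_missing $_tool"
    done
    # Strip the leading space left by the concatenation above.
    echo "${_missing# }"
}

missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    echo "not found: $missing"
    echo "on Debian/Ubuntu these usually come from: sudo apt install build-essential"
else
    echo "gcc, g++ and make were found on PATH"
fi
```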

emmarl25 · August 17, 2022, 6:38pm

Since that command wasn’t working, I used wget to save the cuda 11.2 runfile to /home/voslab/ and installed using ‘sudo sh cuda_11.2.2_460.32.03_linux.run --toolkit --silent --override’. Then I assigned the CRYOSPARC_CUDA_PATH to the installed cuda, and cryosparc is able to complete jobs now! Thank you so much again for all of your help.