Hi,
Our institution recently had a display port error that was fixed by rebooting and updating our NVIDIA driver from an earlier (>400) version to 470.141.03. I made no changes to our CUDA installation, but with driver 470 our reported CUDA version is 11.4 (see screenshot below), which is incompatible with cryoSPARC. However, the output of nvcc --version shows CUDA 10.1.
I tried downgrading the NVIDIA driver to 450 and then to 390, but both produced communication errors after installation, apparently because they are outdated. Any help at all would be deeply appreciated. Thanks in advance!
It is possible for the CUDA versions of the driver and the cryoSPARC-configured toolkit (configured via CRYOSPARC_CUDA_PATH in /path/to/cryosparc_worker/config.sh) to differ.
In the absence of any related cryoSPARC error, I see no reason in this case to downgrade to a driver version that corresponds to the version of the toolkit associated with CRYOSPARC_CUDA_PATH.
This is my interpretation of the backward compatibility discussed elsewhere.
You may wish to validate the combination of a CUDA-10.1 CRYOSPARC_CUDA_PATH and the CUDA-11.4 driver with a test workflow like this.
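To make the backward-compatibility point concrete, here is a minimal Python sketch (not part of cryoSPARC) that extracts the CUDA version reported by the driver from the nvidia-smi header and the toolkit release from nvcc --version, then checks that the driver's version is at least the toolkit's. The function names are hypothetical, and the regexes assume the usual "CUDA Version: X.Y" and "release X.Y" wording in those outputs. Passing this check does not by itself guarantee that the toolkit supports the GPU's architecture.

```python
import re
from typing import Tuple

Version = Tuple[int, int]

def driver_cuda_version(nvidia_smi_text: str) -> Version:
    """Parse the 'CUDA Version: X.Y' field (highest CUDA version the driver supports)."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    if m is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(m.group(1)), int(m.group(2))

def toolkit_cuda_version(nvcc_text: str) -> Version:
    """Parse 'release X.Y' from the output of nvcc --version."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))

def driver_supports_toolkit(driver: Version, toolkit: Version) -> bool:
    # Backward compatibility: a driver can run code built with the same or an
    # older toolkit, so the test is "driver >= toolkit", not equality.
    return driver >= toolkit

if __name__ == "__main__":
    # Illustrative excerpts using the versions mentioned in this thread.
    smi = "| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |"
    nvcc = "Cuda compilation tools, release 10.1"
    d, t = driver_cuda_version(smi), toolkit_cuda_version(nvcc)
    print(d, t, driver_supports_toolkit(d, t))  # (11, 4) (10, 1) True
```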
Apologies, I should have specified: with the current setup, we were able to run jobs on the GPU, but they eventually failed with an error that the GPU architecture is not recognized. I thought it was because our CUDA version was too high, but so far our machine hasn't tolerated NVIDIA driver/CUDA configurations older than 470.
Hi, thanks again for your response. Here are the outputs of the commands you sent. How can I find the cryoSPARC CUDA path again? Is it strictly inside the cryoSPARC root directory? (By the way, I did have to update the NVIDIA driver again because we were having NVIDIA communication errors with 470.) Thanks so much again!
Thank you for sharing the error message. This could be a case where the GPU hardware requires version 11 or newer of the CUDA toolkit.
We currently recommend CUDA-11.2 because we haven’t tested newer minor versions of the toolkit.
If you opt for a toolkit version newer than 11.2, I strongly recommend validating cryoSPARC with a test workflow.
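One way to see why an older toolkit can fail with an unrecognized GPU architecture: each toolkit release can only target compute capabilities up to a certain GPU generation. The sketch below encodes a partial table of the first CUDA release that supports a few compute capabilities (assumed from NVIDIA's release notes; please verify against them) and checks a toolkit version against it. The GPU's compute capability is an input you would look up for your card; the 8.6 example is hypothetical and not taken from this thread.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]

# Partial table: compute capability -> first CUDA toolkit release that can target it.
# Assumed from NVIDIA release notes; verify before relying on it.
MIN_TOOLKIT: Dict[Version, Version] = {
    (6, 1): (8, 0),   # Pascal (e.g. GTX 10-series)
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (e.g. RTX 20-series)
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere GA10x (e.g. RTX 30-series)
    (8, 9): (11, 8),  # Ada Lovelace
    (9, 0): (11, 8),  # Hopper
}

def min_toolkit_for(compute_capability: Version) -> Optional[Version]:
    """First toolkit release in this table that supports the GPU, or None if unlisted."""
    return MIN_TOOLKIT.get(compute_capability)

def toolkit_supports_gpu(toolkit: Version, compute_capability: Version) -> bool:
    needed = min_toolkit_for(compute_capability)
    if needed is None:
        raise KeyError(f"compute capability {compute_capability} not in table")
    return toolkit >= needed

if __name__ == "__main__":
    # Hypothetical RTX 30-series card (compute capability 8.6):
    print(toolkit_supports_gpu((10, 1), (8, 6)))  # False: 10.1 predates sm_86
    print(toolkit_supports_gpu((11, 2), (8, 6)))  # True
```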
The cryosparc_worker installation needs to be updated with the command
/path/to/cryosparc_worker/bin/cryosparcw newcuda /path/to/new_cuda_toolkit_root
every time changes are made to the location or content of CRYOSPARC_CUDA_PATH (defined in /path/to/cryosparc_worker/config.sh).
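If it helps to locate the current setting, here is a minimal sketch that reads CRYOSPARC_CUDA_PATH out of the text of config.sh and builds the matching newcuda command. It assumes the variable is set on a line of the form export CRYOSPARC_CUDA_PATH="..."; the helper names are hypothetical, and the toolkit root in the example is a placeholder.

```python
import re
import shlex
from pathlib import Path
from typing import List, Optional

def read_cuda_path(config_text: str) -> Optional[str]:
    """Return the value assigned to CRYOSPARC_CUDA_PATH in config.sh text, or None if unset."""
    value = None
    for line in config_text.splitlines():
        # Commented-out lines start with '#' and therefore do not match.
        m = re.match(r"\s*(?:export\s+)?CRYOSPARC_CUDA_PATH=(.*)$", line)
        if m:
            words = shlex.split(m.group(1), comments=True)  # strip quotes as the shell would
            value = words[0] if words else ""  # later assignments override earlier ones
    return value

def newcuda_command(worker_dir: str, toolkit_root: str) -> List[str]:
    """argv for: <worker_dir>/bin/cryosparcw newcuda <toolkit_root>"""
    return [str(Path(worker_dir) / "bin" / "cryosparcw"), "newcuda", toolkit_root]

if __name__ == "__main__":
    config = Path("/home/voslab/cryosparc/cryosparc_worker/config.sh")
    if config.is_file():
        print("CRYOSPARC_CUDA_PATH =", read_cuda_path(config.read_text()))
    # Pass this list to subprocess.run(..., check=True) to actually execute it.
    print(newcuda_command("/home/voslab/cryosparc/cryosparc_worker", "/usr/local/cuda-11.2"))
```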
EDIT: I just realized, based on the output of python -c "import pycuda.driver; print(pycuda.driver.get_version())", that the /home/voslab/cryosparc/cryosparc_worker installation appears to have already been configured with CUDA-11.2 rather than 10.1, as I had expected based on
What are the outputs of commands 2 and 3 from this sequence:
I think I see the issue: the cryoSPARC worker CUDA path points to ‘/usr/local/bin/cuda-11.2’, which doesn’t exist. What is the best way to install CUDA 11.2 rather than CUDA 11.6? I previously used the package manager to install the NVIDIA CUDA toolkit, which automatically installed CUDA 11.6.
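A quick way to catch this kind of problem before running jobs is to check that the configured path actually looks like a toolkit root. The sketch below assumes the layout produced by NVIDIA's runfile installer (a single directory containing bin/nvcc and lib64); depending on how it was packaged, a toolkit installed through a distribution package manager may instead place nvcc and its libraries in system directories such as /usr/bin, so it would not pass this check. The function name is hypothetical.

```python
from pathlib import Path
from typing import List

def toolkit_root_problems(root: str) -> List[str]:
    """Return reasons `root` does not look like a runfile-style CUDA toolkit root; empty if it does."""
    p = Path(root)
    if not p.is_dir():
        return [f"{root}: directory does not exist"]
    problems = []
    # Entries expected directly under a runfile-installed toolkit root (assumed layout).
    for rel in ("bin/nvcc", "lib64"):
        if not (p / rel).exists():
            problems.append(f"{p / rel}: missing")
    return problems

if __name__ == "__main__":
    # Path reported in this thread; it did not exist on the system.
    for msg in toolkit_root_problems("/usr/local/bin/cuda-11.2"):
        print(msg)
```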
Please review installer documentation and your existing directory structure/contents to ensure the following suggestion is suitable for your circumstances.
As user voslab, not root:
cd
(single line command) wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
I am not sure what the problem might be.
Has the build-essential package been installed on this system?
More information may become available if the command is repeated without the --silent flag.
Since that command wasn’t working, I used wget to save the CUDA 11.2 runfile to /home/voslab/ and installed it using ‘sudo sh cuda_11.2.2_460.32.03_linux.run --toolkit --silent --override’. Then I pointed CRYOSPARC_CUDA_PATH to the newly installed CUDA, and cryoSPARC is now able to complete jobs! Thank you so much again for all of your help.
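For anyone following the same route, the download and install steps that worked here can be written down as a small Python sketch that reproduces the runfile URL and the toolkit-only install command from this thread. The URL and file-name pattern is taken from the 11.2.2 runfile above and is assumed, not guaranteed, to hold for other releases; the helper names are hypothetical. The --toolkit flag installs only the toolkit, leaving the newer display driver in place.

```python
from typing import List

def runfile_name(cuda_version: str, bundled_driver: str) -> str:
    # Pattern taken from cuda_11.2.2_460.32.03_linux.run; assumed to hold for other releases.
    return f"cuda_{cuda_version}_{bundled_driver}_linux.run"

def runfile_url(cuda_version: str, bundled_driver: str) -> str:
    return (f"https://developer.download.nvidia.com/compute/cuda/{cuda_version}"
            f"/local_installers/{runfile_name(cuda_version, bundled_driver)}")

def install_argv(runfile: str) -> List[str]:
    # Toolkit only (no driver), non-interactive, skipping installer compatibility checks.
    return ["sudo", "sh", runfile, "--toolkit", "--silent", "--override"]

if __name__ == "__main__":
    print("wget", runfile_url("11.2.2", "460.32.03"))
    print(" ".join(install_argv(runfile_name("11.2.2", "460.32.03"))))
```

By default the runfile installs the toolkit under /usr/local/cuda-11.2; that directory would then be passed to cryosparcw newcuda so that CRYOSPARC_CUDA_PATH points at it.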