Hello. I recently updated to 4.4.1 and one of my two GPUs stopped working. I have an RTX 3090 and an RTX 4070 in the system; any job that uses the RTX 4070 fails with an error. Everything worked well with 4.2.1. The GPU itself is functional and works in other applications without problems. Any suggestions?
Here is the terminal output from the GPU test job:
[CPU: 231.3 MB Avail: 120.32 GB] Obtaining GPU info via nvidia-smi
…
[CPU: 231.3 MB Avail: 120.33 GB] NVIDIA GeForce RTX 3090 @ 00000000:01:00.0
[CPU: 231.3 MB Avail: 120.33 GB] driver_version :535.161.07
[CPU: 231.3 MB Avail: 120.33 GB] persistence_mode :Disabled
[CPU: 231.3 MB Avail: 120.33 GB] power_limit :350.00
[CPU: 231.3 MB Avail: 120.33 GB] sw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] hw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] compute_mode :Default
[CPU: 231.3 MB Avail: 120.33 GB] max_pcie_link_gen :4
[CPU: 231.3 MB Avail: 120.33 GB] current_pcie_link_gen :4
[CPU: 231.3 MB Avail: 120.33 GB] temperature :53
[CPU: 231.3 MB Avail: 120.33 GB] gpu_utilization :19
[CPU: 231.3 MB Avail: 120.33 GB] memory_utilization :1
[CPU: 231.3 MB Avail: 120.33 GB] NVIDIA GeForce RTX 4070 @ 00000000:06:00.0
[CPU: 231.3 MB Avail: 120.33 GB] driver_version :535.161.07
[CPU: 231.3 MB Avail: 120.33 GB] persistence_mode :Disabled
[CPU: 231.3 MB Avail: 120.33 GB] power_limit :200.00
[CPU: 231.3 MB Avail: 120.33 GB] sw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] hw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] compute_mode :Default
[CPU: 231.3 MB Avail: 120.33 GB] max_pcie_link_gen :3
[CPU: 231.3 MB Avail: 120.33 GB] current_pcie_link_gen :1
[CPU: 231.3 MB Avail: 120.33 GB] temperature :38
[CPU: 231.3 MB Avail: 120.33 GB] gpu_utilization :0
[CPU: 231.3 MB Avail: 120.33 GB] memory_utilization :0
[CPU: 319.5 MB Avail: 120.25 GB] Starting GPU test on: NVIDIA GeForce RTX 4070 @ 6
[CPU: 319.5 MB Avail: 120.25 GB] With CUDA Toolkit version: 11.8
[CPU: 333.9 MB Avail: 120.25 GB] Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/jobs/instance_testing/run.py", line 175, in run_gpu_job
    func = mod.get_function("add")
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/gpu/compiler.py", line 214, in get_function
    cufunc = self.get_module().get_function(name)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/gpu/compiler.py", line 170, in get_module
    linker.add_cu(s, k)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 3013, in add_cu
    program = NvrtcProgram(cu, name)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2900, in __init__
    self.check(err)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2945, in check
    raise RuntimeError('NVRTC Error: {}'.format(err))
RuntimeError: NVRTC Error: 5
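In case it helps with debugging, below is a minimal standalone sketch I can run to exercise the same kind of NVRTC compile step outside CryoSPARC, using NVIDIA's cuda-python bindings. This is my own test, not CryoSPARC code: the trivial "add" kernel, the compute_89 architecture flag (the 4070 is compute capability 8.9), and the cuda-python package itself are assumptions on my part, not things taken from the job.

# Minimal NVRTC check, assuming `pip install cuda-python` into the same
# CUDA 11.8 environment. Compiles a trivial "add" kernel for compute_89
# (Ada / RTX 4070) and prints the build log on failure.
from cuda import nvrtc

SRC = b"""
extern "C" __global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

def check(err):
    # Every nvrtc call returns its status enum as the first tuple element.
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        raise RuntimeError(f"NVRTC error: {err}")

# Create the program and compile it for the 4070's architecture.
err, prog = nvrtc.nvrtcCreateProgram(SRC, b"add.cu", 0, [], [])
check(err)
opts = [b"--gpu-architecture=compute_89"]
err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)
if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
    # Dump the NVRTC build log so the actual failure reason is visible.
    _, log_size = nvrtc.nvrtcGetProgramLogSize(prog)
    log = b" " * log_size
    nvrtc.nvrtcGetProgramLog(prog, log)
    print("NVRTC compile failed:", err)
    print(log.decode())
else:
    print("NVRTC compile succeeded for compute_89")

I'm happy to run this (or any other test you suggest) on the 4070 host and report the output if that would help narrow things down.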