Hello. I recently updated to 4.4.1 and one of my two GPUs stopped working. I have an RTX 3090 and an RTX 4070 in the system; any job that uses the RTX 4070 fails with an error. Everything worked well with 4.2.1. The GPU itself is functional and works in other applications without problems. Any suggestions?
Here is the terminal output from the GPU test job:
[CPU: 231.3 MB Avail: 120.32 GB] Obtaining GPU info via nvidia-smi
…
[CPU: 231.3 MB Avail: 120.33 GB] NVIDIA GeForce RTX 3090 @ 00000000:01:00.0
[CPU: 231.3 MB Avail: 120.33 GB] driver_version :535.161.07
[CPU: 231.3 MB Avail: 120.33 GB] persistence_mode :Disabled
[CPU: 231.3 MB Avail: 120.33 GB] power_limit :350.00
[CPU: 231.3 MB Avail: 120.33 GB] sw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] hw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] compute_mode :Default
[CPU: 231.3 MB Avail: 120.33 GB] max_pcie_link_gen :4
[CPU: 231.3 MB Avail: 120.33 GB] current_pcie_link_gen :4
[CPU: 231.3 MB Avail: 120.33 GB] temperature :53
[CPU: 231.3 MB Avail: 120.33 GB] gpu_utilization :19
[CPU: 231.3 MB Avail: 120.33 GB] memory_utilization :1
[CPU: 231.3 MB Avail: 120.33 GB] NVIDIA GeForce RTX 4070 @ 00000000:06:00.0
[CPU: 231.3 MB Avail: 120.33 GB] driver_version :535.161.07
[CPU: 231.3 MB Avail: 120.33 GB] persistence_mode :Disabled
[CPU: 231.3 MB Avail: 120.33 GB] power_limit :200.00
[CPU: 231.3 MB Avail: 120.33 GB] sw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] hw_power_limit :Not Active
[CPU: 231.3 MB Avail: 120.33 GB] compute_mode :Default
[CPU: 231.3 MB Avail: 120.33 GB] max_pcie_link_gen :3
[CPU: 231.3 MB Avail: 120.33 GB] current_pcie_link_gen :1
[CPU: 231.3 MB Avail: 120.33 GB] temperature :38
[CPU: 231.3 MB Avail: 120.33 GB] gpu_utilization :0
[CPU: 231.3 MB Avail: 120.33 GB] memory_utilization :0
[CPU: 319.5 MB Avail: 120.25 GB] Starting GPU test on: NVIDIA GeForce RTX 4070 @ 6
[CPU: 319.5 MB Avail: 120.25 GB] With CUDA Toolkit version: 11.8
[CPU: 333.9 MB Avail: 120.25 GB] Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/jobs/instance_testing/run.py", line 175, in run_gpu_job
    func = mod.get_function("add")
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/gpu/compiler.py", line 214, in get_function
    cufunc = self.get_module().get_function(name)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/cryosparc_compute/gpu/compiler.py", line 170, in get_module
    linker.add_cu(s, k)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 3013, in add_cu
    program = NvrtcProgram(cu, name)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2900, in __init__
    self.check(err)
  File "/home/k305-1/CryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 2945, in check
    raise RuntimeError('NVRTC Error: {}'.format(err))
RuntimeError: NVRTC Error: 5
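In case it helps with debugging, below is a minimal standalone sketch I can run to exercise the same kind of NVRTC compile step outside CryoSPARC, using NVIDIA's cuda-python bindings. This is my own test, not CryoSPARC code: the trivial "add" kernel, the compute_89 architecture flag (the 4070 is compute capability 8.9), and the cuda-python package itself are assumptions on my part, not things taken from the job.

# Minimal NVRTC check, assuming `pip install cuda-python` into the same
# CUDA 11.8 environment. Compiles a trivial "add" kernel for compute_89
# (Ada / RTX 4070) and prints the build log on failure.
from cuda import nvrtc

SRC = b"""
extern "C" __global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}
"""

def check(err):
    # Every nvrtc call returns its status enum as the first tuple element.
    if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
        raise RuntimeError(f"NVRTC error: {err}")

# Create the program and compile it for the 4070's architecture.
err, prog = nvrtc.nvrtcCreateProgram(SRC, b"add.cu", 0, [], [])
check(err)
opts = [b"--gpu-architecture=compute_89"]
err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts)
if err != nvrtc.nvrtcResult.NVRTC_SUCCESS:
    # Dump the NVRTC build log so the actual failure reason is visible.
    _, log_size = nvrtc.nvrtcGetProgramLogSize(prog)
    log = b" " * log_size
    nvrtc.nvrtcGetProgramLog(prog, log)
    print("NVRTC compile failed:", err)
    print(log.decode())
else:
    print("NVRTC compile succeeded for compute_89")

I'm happy to run this (or any other test you suggest) on the 4070 host and report the output if that would help narrow things down.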