Hello. I recently updated to cryoSPARC v4.1.2 and started getting CUDA errors. I then upgraded our NVIDIA drivers and the system CUDA toolkit and linked the cryoSPARC workers to the new CUDA; now I get the following error when trying to start jobs. The job shown here is an NU refine, but the same error occurs with other job types too.
Error:
[CPU: 5.58 GB Avail: 112.26 GB]
Traceback (most recent call last):
File "/troll/scratch/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
run_old(*args, **kw)
File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2441, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2492, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1250, in cryosparc_compute.engine.newengine.EngineThread.compute_resid_pow
File "cryosparc_master/cryosparc_compute/engine/newcuda_kernels.py", line 6539, in cryosparc_compute.engine.newcuda_kernels.compute_resid_pow
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 416, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
File "cryosparc_master/cryosparc_compute/engine/newcuda_kernels.py", line 6469, in cryosparc_compute.engine.newcuda_kernels.get_compute_resid_pow_kernel
File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
cubin = compile(source, nvcc, options, keep, no_extern_c,
File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 135, in compile_plain
raise CompileError("nvcc compilation of %s failed" % cu_file_path,
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp34r97384/kernel.cu failed
[command: nvcc --cubin -arch sm_75 -I/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda kernel.cu]
[stderr:
kernel.cu(560): error: texture is not a template
kernel.cu(795): error: no instance of overloaded function "tex3D" matches the argument list
argument types are: (<error-type>, float, float, float)
2 errors detected in the compilation of "kernel.cu".
]
cryoSPARC info:
cryosparc@troll:/home/users/posert$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/troll/scratch/cryosparc/cryosparc_master
Current cryoSPARC version: v4.1.2
----------------------------------------------------------------------------
CryoSPARC process status:
app RUNNING pid 3597, uptime 3:02:33
app_api RUNNING pid 3629, uptime 3:02:31
app_api_dev STOPPED Not started
app_legacy STOPPED Not started
app_legacy_dev STOPPED Not started
command_core RUNNING pid 3213, uptime 3:02:46
command_rtp RUNNING pid 3298, uptime 3:02:37
command_vis RUNNING pid 3289, uptime 3:02:39
database RUNNING pid 2922, uptime 3:02:49
----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="{redacted}"
export CRYOSPARC_MASTER_HOSTNAME="troll"
export CRYOSPARC_DB_PATH="/troll/scratch/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
system config:
cryosparc@troll:/home/users/posert$ nvidia-smi
Mon Feb 6 18:45:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:05:00.0 Off | N/A |
| 23% 39C P8 2W / 215W | 339MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... On | 00000000:06:00.0 Off | N/A |
| 20% 36C P8 2W / 215W | 6MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... On | 00000000:09:00.0 Off | N/A |
| 20% 35C P8 6W / 215W | 6MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... On | 00000000:0A:00.0 Off | N/A |
| 20% 31C P8 19W / 215W | 6MiB / 8192MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 2604 G /usr/lib/xorg/Xorg 327MiB |
| 0 N/A N/A 2896 G /usr/bin/gnome-shell 9MiB |
| 1 N/A N/A 2604 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 2604 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 2604 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
cryosparc@troll:/home/users/posert$ uname -a
Linux troll 5.15.0-58-generic #64~20.04.1-Ubuntu SMP Fri Jan 6 16:42:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
cryosparc@troll:/home/users/posert$ cryosparcw call nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
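For anyone comparing setups, here is a minimal sketch of pulling the toolkit release number out of `nvcc --version` output like the above. `parse_nvcc_release` is a hypothetical helper written for illustration and is not part of cryoSPARC or PyCUDA. The sample string is abbreviated from the output in this post.

```python
import re


def parse_nvcc_release(text):
    """Return (major, minor) parsed from `nvcc --version` output, or None.

    Hypothetical helper for illustration only; it just looks for the
    "release X.Y" token that nvcc prints on its "Cuda compilation tools" line.
    """
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Abbreviated copy of the nvcc output shown in this post.
sample = (
    "nvcc: NVIDIA (R) Cuda compiler driver\n"
    "Cuda compilation tools, release 12.0, V12.0.140\n"
    "Build cuda_12.0.r12.0/compiler.32267302_0\n"
)

print(parse_nvcc_release(sample))
```

On a live system the same function could be fed the captured output of `cryosparcw call nvcc --version`, so that the release the worker actually uses is the one being checked.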
What I’ve tried:
- Installing a system cuda-toolkit
- Re-installing the cryosparc worker .tar.gz
- Running cryosparcw install-3dflex