RTX 6000 Ada - device kernel image is invalid

We’re trying to set up our new RTX 6000 Ada server. Unfortunately, whenever a 2D classification job is run, we receive the following error:

Traceback (most recent call last):
  File "/cryosparc-worker/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1048, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 192, in cryosparc_compute.engine.engine.EngineThread.setup_current_data_and_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1693, in cryosparc_compute.engine.cuda_kernels.compute_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 414, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1679, in cryosparc_compute.engine.cuda_kernels.get_compute_ctf_kernel
  File "/cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized

Switching the CUDA version does not fix the issue, and other tools (such as RELION, TensorFlow, and AlphaFold) have no problem with the GPUs. This is using CryoSPARC v4.2.1.

EDIT: Reinstalling the worker does not fix the issue.
EDIT 2: This is a brand new system, never used with CryoSPARC previously

This issue occurs on drivers 525 and 530. Driver 520 does not recognize the RTX 6000 Ada.

Please can you post the outputs of these commands:

which nvcc
nvidia-smi
/sbin/ldconfig -p | grep -i cu
worker_path=/cryosparc-worker/cryosparc_worker
${worker_path}/bin/cryosparcw call /usr/bin/env | grep -ve LICENSE -ve SSH
${worker_path}/bin/cryosparcw call nvcc -V
${worker_path}/bin/cryosparcw call which nvcc
${worker_path}/bin/cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"

Here is the output:

which nvcc

/usr/local/cuda/bin/nvcc

nvidia-smi

Thu May 11 13:39:51 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX 6000 Ada Gener...    On | 00000000:01:00.0 Off |                  Off |
| 30%   32C    P8               27W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX 6000 Ada Gener...    On | 00000000:25:00.0 Off |                  Off |
| 30%   30C    P8               22W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX 6000 Ada Gener...    On | 00000000:41:00.0 Off |                  Off |
| 30%   31C    P8               26W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX 6000 Ada Gener...    On | 00000000:61:00.0 Off |                  Off |
| 30%   29C    P8               21W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA RTX 6000 Ada Gener...    On | 00000000:81:00.0 Off |                  Off |
| 30%   31C    P8               27W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA RTX 6000 Ada Gener...    On | 00000000:A1:00.0 Off |                  Off |
| 30%   28C    P8               27W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA RTX 6000 Ada Gener...    On | 00000000:C1:00.0 Off |                  Off |
| 30%   31C    P8               12W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA RTX 6000 Ada Gener...    On | 00000000:E1:00.0 Off |                  Off |
| 30%   29C    P8               27W / 300W|      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

/sbin/ldconfig -p | grep -i cu

	libwayland-cursor.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libwayland-cursor.so.0
	liburcu.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu.so.8
	liburcu-signal.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-signal.so.8
	liburcu-qsbr.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-qsbr.so.8
	liburcu-memb.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-memb.so.8
	liburcu-mb.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-mb.so.8
	liburcu-common.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-common.so.8
	liburcu-cds.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-cds.so.8
	liburcu-bp.so.8 (libc6,x86-64) => /lib/x86_64-linux-gnu/liburcu-bp.so.8
	libpcsamplingutil.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libpcsamplingutil.so
	libpcsamplingutil.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libpcsamplingutil.so
	libnvrtc.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvrtc.so.12
	libnvrtc.so.11.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.11.2
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvrtc.so
	libnvrtc-builtins.so.12.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.1
	libnvrtc-builtins.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.8
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
	libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvrtc-builtins.so
	libnvperf_target.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_target.so
	libnvperf_target.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvperf_target.so
	libnvperf_host.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_host.so
	libnvperf_host.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvperf_host.so
	libnvjpeg.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvjpeg.so.12
	libnvjpeg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so.11
	libnvjpeg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so
	libnvjpeg.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvjpeg.so
	libnvcuvid.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvcuvid.so.1
	libnvcuvid.so.1 (libc6) => /lib/i386-linux-gnu/libnvcuvid.so.1
	libnvcuvid.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvcuvid.so
	libnvcuvid.so (libc6) => /lib/i386-linux-gnu/libnvcuvid.so
	libnvblas.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvblas.so.12
	libnvblas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so.11
	libnvblas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so
	libnvblas.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvblas.so
	libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so.1
	libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvToolsExt.so.1
	libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so
	libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvToolsExt.so
	libnvJitLink.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvJitLink.so.12
	libnvJitLink.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvJitLink.so
	libnpps.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnpps.so.12
	libnpps.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so.11
	libnpps.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so
	libnpps.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnpps.so
	libnppitc.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppitc.so.12
	libnppitc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so.11
	libnppitc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so
	libnppitc.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppitc.so
	libnppisu.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppisu.so.12
	libnppisu.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so.11
	libnppisu.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so
	libnppisu.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppisu.so
	libnppist.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppist.so.12
	libnppist.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so.11
	libnppist.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so
	libnppist.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppist.so
	libnppim.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppim.so.12
	libnppim.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so.11
	libnppim.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so
	libnppim.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppim.so
	libnppig.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppig.so.12
	libnppig.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.11
	libnppig.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so
	libnppig.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppig.so
	libnppif.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppif.so.12
	libnppif.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.11
	libnppif.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so
	libnppif.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppif.so
	libnppidei.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppidei.so.12
	libnppidei.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.11
	libnppidei.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so
	libnppidei.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppidei.so
	libnppicc.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppicc.so.12
	libnppicc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.11
	libnppicc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so
	libnppicc.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppicc.so
	libnppial.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppial.so.12
	libnppial.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so.11
	libnppial.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so
	libnppial.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppial.so
	libnppc.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppc.so.12
	libnppc.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.11
	libnppc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so
	libnppc.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnppc.so
	libncursesw.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libncursesw.so.6
	libncursesw.so.6 (libc6) => /lib32/libncursesw.so.6
	libncurses.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libncurses.so.6
	libncurses.so.6 (libc6) => /lib32/libncurses.so.6
	libncurses.so.5 (libc6,x86-64) => /lib/x86_64-linux-gnu/libncurses.so.5
	libicuuc.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuuc.so.70
	libicuuc.so.70 (libc6) => /lib/i386-linux-gnu/libicuuc.so.70
	libicutu.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicutu.so.70
	libicutu.so.70 (libc6) => /lib/i386-linux-gnu/libicutu.so.70
	libicutest.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicutest.so.70
	libicutest.so.70 (libc6) => /lib/i386-linux-gnu/libicutest.so.70
	libicui18n.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicui18n.so.70
	libicui18n.so.70 (libc6) => /lib/i386-linux-gnu/libicui18n.so.70
	libicuio.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicuio.so.70
	libicuio.so.70 (libc6) => /lib/i386-linux-gnu/libicuio.so.70
	libicudata.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.70
	libicudata.so.70 (ELF) => /lib/i386-linux-gnu/libicudata.so.70
	libharfbuzz-icu.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libharfbuzz-icu.so.0
	libcusparse.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusparse.so.12
	libcusparse.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.11
	libcusparse.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so
	libcusparse.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusparse.so
	libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so.11
	libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusolverMg.so.11
	libcusolverMg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so
	libcusolverMg.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusolverMg.so
	libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
	libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusolver.so.11
	libcusolver.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so
	libcusolver.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcusolver.so
	libcurl.so.4 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcurl.so.4
	libcurl-gnutls.so.4 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcurl-gnutls.so.4
	libcurand.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10
	libcurand.so.10 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcurand.so.10
	libcurand.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so
	libcurand.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcurand.so
	libcupti.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcupti.so.12
	libcupti.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.11.8
	libcupti.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so
	libcupti.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcupti.so
	libcups.so.2 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcups.so.2
	libcuinj64.so.12.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcuinj64.so.12.1
	libcuinj64.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so.11.8
	libcuinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so
	libcuinj64.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcuinj64.so
	libcufile_rdma.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so.1
	libcufile_rdma.so.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufile_rdma.so.1
	libcufile_rdma.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so
	libcufile_rdma.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufile_rdma.so
	libcufile.so.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so.0
	libcufile.so.0 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufile.so.0
	libcufile.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so
	libcufile.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufile.so
	libcufftw.so.11 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufftw.so.11
	libcufftw.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.10
	libcufftw.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so
	libcufftw.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufftw.so
	libcufft.so.11 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufft.so.11
	libcufft.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.10
	libcufft.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so
	libcufft.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcufft.so
	libcudart.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcudart.so.12
	libcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0
	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
	libcudart.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcudart.so
	libcudadebugger.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
	libcuda.so.1 (libc6) => /lib/i386-linux-gnu/libcuda.so.1
	libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
	libcuda.so (libc6) => /lib/i386-linux-gnu/libcuda.so
	libcublasLt.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcublasLt.so.12
	libcublasLt.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.11
	libcublasLt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so
	libcublasLt.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcublasLt.so
	libcublas.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcublas.so.12
	libcublas.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.11
	libcublas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so
	libcublas.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcublas.so
	libcheckpoint.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcheckpoint.so
	libcheckpoint.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libcheckpoint.so
	libaccinj64.so.12.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libaccinj64.so.12.1
	libaccinj64.so.11.8 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so.11.8
	libaccinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so
	libaccinj64.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libaccinj64.so
	libXcursor.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libXcursor.so.1
	libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1
	libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libOpenCL.so.1
	libOpenCL.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so
	libOpenCL.so (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libOpenCL.so

${worker_path}/bin/cryosparcw call /usr/bin/env | grep -ve LICENSE -ve SSH

SHELL=/bin/bash
SUDO_GID=1000
CONDA_EXE=/cryosparc-worker/cryosparc_worker/deps/anaconda/bin/conda
_CE_M=
PYTHONNOUSERSITE=true
CRYOSPARC_USE_GPU=true
NUMEXPR_NUM_THREADS=1
CRYOSPARC_PATH=/cryosparc-worker/cryosparc_worker/bin
SUDO_COMMAND=/usr/bin/su
SUDO_USER=admin-user
PWD=/cryosparc-worker/cryosparc_worker
LOGNAME=root
CONDA_PREFIX=/cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env
LD_PRELOAD=/cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libpython3.8.so
HOME=/root
LANG=en_US.UTF-8
CRYOSPARC_ROOT_DIR=/cryosparc-worker/cryosparc_worker
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.webp=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
CONDA_PROMPT_MODIFIER=(cryosparc_worker_env) 
LESSCLOSE=/usr/bin/lesspipe %s %s
PYTHONPATH=/cryosparc-worker/cryosparc_worker
TERM=xterm-256color
_CE_CONDA=
LESSOPEN=| /usr/bin/lesspipe %s
USER=root
CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.8
CONDA_SHLVL=1
CRYOSPARC_DEVELOP=false
SHLVL=0
CONDA_PYTHON_EXE=/cryosparc-worker/cryosparc_worker/deps/anaconda/bin/python
LD_LIBRARY_PATH=/cryosparc-worker/cryosparc_worker/deps/external/cudnn/lib
CONDA_DEFAULT_ENV=cryosparc_worker_env
OMP_NUM_THREADS=1
PATH=/cryosparc-worker/cryosparc_worker/bin:/cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/cryosparc-worker/cryosparc_worker/deps/anaconda/condabin:/root/anaconda3/condabin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
SUDO_UID=1000
MKL_NUM_THREADS=1
MAIL=/var/mail/root
CRYOSPARC_CONDA_ENV=cryosparc_worker_env
OLDPWD=/cryosparc-worker/cryosparc_worker

${worker_path}/bin/cryosparcw call nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

${worker_path}/bin/cryosparcw call which nvcc

/cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin/nvcc

${worker_path}/bin/cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"

(11, 8, 0)
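
A note on reading the two CUDA numbers above: the "CUDA Version: 12.1" shown by nvidia-smi is the newest CUDA API level the installed driver supports, while `pycuda.driver.get_version()` reports the toolkit version PyCUDA was compiled against. A driver that is newer than the toolkit is normally fine. The sketch below illustrates that comparison. It assumes CUDA's integer version encoding (1000 * major + 10 * minor), which is what `pycuda.driver.get_driver_version()` returns; the helper names are illustrative, not part of CryoSPARC or PyCUDA.

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Decode CUDA's integer version (1000 * major + 10 * minor), e.g. 12010."""
    return (encoded // 1000, (encoded % 1000) // 10)


def driver_supports_toolkit(driver_encoded: int, toolkit: tuple) -> bool:
    """True if the driver's CUDA API level is at least the toolkit's major.minor."""
    return decode_cuda_version(driver_encoded) >= tuple(toolkit[:2])


# Values based on the outputs above: the driver reports CUDA 12.1 (12010 under
# the assumed encoding) and pycuda.driver.get_version() printed (11, 8, 0).
print(decode_cuda_version(12010))                  # (12, 1)
print(driver_supports_toolkit(12010, (11, 8, 0)))  # True
```

By this check the driver/toolkit pairing reported here looks compatible, so a plain version mismatch is unlikely to be the whole story.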

Thanks @UCBKurt.
Please can you also post the output of this command:

/cryosparc-worker/cryosparc_worker/bin/cryosparcw call python -c $'import time;from pycuda import driver;from pycuda.compiler import SourceModule;driver.init();ctx = driver.Device(0).retain_primary_context();ctx.push()\ntry:print(SourceModule("__global__ void f(float *a, float val) { }").get_function("f"))\nexcept Exception as e: print(e)\nfinally:ctx.pop();time.sleep(10)' & \
  (CSPID=$! && sleep 5 && cat /proc/${CSPID}/maps | grep cu)
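
For readability, the one-liner above can be unpacked roughly as follows. This is only a sketch of the same idea, not an official diagnostic: `cuda_libs_in_maps` and `probe_kernel_compile` are illustrative names, and the script reads `/proc/self/maps` from inside the Python process instead of inspecting it from a background shell. It would need to be run with the worker's Python (for example through `cryosparcw call python`) so that the same pycuda is used.

```python
def cuda_libs_in_maps(maps_text: str) -> list:
    """Return the unique mapped file paths that contain 'cu' (like `grep cu`)."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional path.
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if "cu" in path and path not in paths:
                paths.append(path)
    return paths


def probe_kernel_compile(device_index: int = 0) -> str:
    """Compile and load a trivial kernel, then list the CUDA libraries in use."""
    try:
        from pycuda import driver
        from pycuda.compiler import SourceModule

        driver.init()
        ctx = driver.Device(device_index).retain_primary_context()
        ctx.push()
        try:
            module = SourceModule("__global__ void f(float *a, float val) { }")
            result = repr(module.get_function("f"))
            with open("/proc/self/maps") as maps:
                libs = cuda_libs_in_maps(maps.read())
        finally:
            ctx.pop()
        return result + "\n" + "\n".join(libs)
    except Exception as exc:  # missing pycuda, no GPU, compile or load failure
        return f"probe failed: {exc!r}"


if __name__ == "__main__":
    print(probe_kernel_compile())
```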

Sorry for the delay, here is the output:

<pycuda._driver.Function object at 0x7f02e9d6b0b0>
7f02e3400000-7f02e8072000 r-xp 00000000 08:12 48005144                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libcurand.so.10.3.0.86
7f02e8072000-7f02e8272000 ---p 04c72000 08:12 48005144                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libcurand.so.10.3.0.86
7f02e8272000-7f02e8278000 r--p 04c72000 08:12 48005144                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libcurand.so.10.3.0.86
7f02e8278000-7f02e96a4000 rw-p 04c78000 08:12 48005144                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libcurand.so.10.3.0.86
7f02e9cdc000-7f02e9cdd000 rw-p 060a4000 08:12 48005144                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libcurand.so.10.3.0.86
7f02e9e00000-7f02e9edc000 r--p 00000000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02e9edc000-7f02ea382000 r-xp 000dc000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02ea382000-7f02eb967000 r--p 00582000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02eb967000-7f02eb968000 ---p 01b67000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02eb968000-7f02eb97f000 r--p 01b67000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02eb97f000-7f02eba85000 rw-p 01b7e000 08:12 1085705                    /usr/lib/x86_64-linux-gnu/libcuda.so.530.30.02
7f02ebc06000-7f02ebcdc000 r--p 00000000 08:12 48377361                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so
7f02ebcdc000-7f02ebd6f000 r-xp 000d6000 08:12 48377361                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so
7f02ebd6f000-7f02ebdab000 r--p 00169000 08:12 48377361                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so
7f02ebdab000-7f02ebdb6000 r--p 001a5000 08:12 48377361                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so
7f02ebdb6000-7f02ebdbf000 rw-p 001b0000 08:12 48377361                   /cryosparc-worker/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/_driver.cpython-38-x86_64-linux-gnu.so

We have so far not been able to determine a reliable cure for the cuModuleLoadDataEx failed problem. That said, my next suggestion would be to run CryoSPARC in a simplified environment.
Caution: Following this suggestion is not guaranteed to establish the desired CryoSPARC functionality and may adversely affect other functionality. Moreover, according to the related post 2d classification kernel error - #8 by sheff_diamond_em, simplification was apparently not necessary to resolve cuModuleLoadDataEx failed.
But, in case you want to try it:

  1. Omit cuda directories in PATH or LD_LIBRARY_PATH definitions (outside the cryosparcw environment). cryosparcw alone should handle inclusion of cuda-related directories on those variables’ definitions.
  2. /sbin/ldconfig -p output should not include any libraries inside cuda directories. I would aim for output like this (note the absence of libraries under /usr/local/cuda):
$ /sbin/ldconfig -p|grep -e libcu -e cuda
	libicudata.so.70 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.70
	libcurl.so.4 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcurl.so.4
	libcurl-gnutls.so.4 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcurl-gnutls.so.4
	libcudadebugger.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcudadebugger.so.1
	libcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1
	libcuda.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so
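
The two checks above could be automated along these lines. This is a minimal sketch under stated assumptions: it assumes toolkits are installed under `/usr/local/cuda*` (as on the machine in this thread), and the function names are made up for illustration. An empty result from `toolkit_libs_in_ldconfig` corresponds to the target state described in step 2.

```python
import re

# Assumption: CUDA toolkits live under /usr/local/cuda, /usr/local/cuda-12, etc.
CUDA_DIR = re.compile(r"^/usr/local/cuda[^/]*/")


def toolkit_libs_in_ldconfig(ldconfig_output: str) -> dict:
    """Group `ldconfig -p` entries that resolve into a CUDA toolkit directory.

    Returns {toolkit_dir: [library names]}; an empty dict is the goal state.
    """
    found = {}
    for line in ldconfig_output.splitlines():
        if "=>" not in line:
            continue
        name_part, path = (part.strip() for part in line.split("=>", 1))
        match = CUDA_DIR.match(path)
        if match:
            toolkit = match.group(0).rstrip("/")
            found.setdefault(toolkit, []).append(name_part.split()[0])
    return found


def cuda_dirs_in_env(value: str) -> list:
    """Return PATH-style entries that mention 'cuda' (case-insensitive)."""
    return [entry for entry in value.split(":") if "cuda" in entry.lower()]


# Three lines copied from the ldconfig output posted earlier in the thread.
sample = (
    "\tlibnvrtc.so.12 (libc6,x86-64) => /usr/local/cuda-12/targets/x86_64-linux/lib/libnvrtc.so.12\n"
    "\tlibcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.11.0\n"
    "\tlibcuda.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libcuda.so.1\n"
)
print(toolkit_libs_in_ldconfig(sample))
print(cuda_dirs_in_env("/usr/local/cuda/bin:/usr/local/sbin:/usr/bin"))
```

The driver library `libcuda.so.1` under `/lib/x86_64-linux-gnu` is deliberately not flagged, since it belongs to the driver rather than to a toolkit.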

I wonder whether the presence of i386 libraries could be a problem (it apparently wasn’t in the aforementioned related post).

See the Ubuntu or Linux documentation for information on ldconfig and related configuration files.
To keep track of experimental changes to configuration files, you may find etckeeper helpful.

So, it’s the exact same error after simplifying everything. We need the i386 libraries due to another application, so removing them isn’t possible. Also, we don’t have this issue on other machines using i386 libraries. Only on our RTX 6000 Ada and RTX 4000 SFF machines. Could the issue just be that these GPUs require CUDA 12 (since driver 520 doesn’t even recognize them)?

According to NVIDIA’s documentation, Ada is officially supported with CUDA 11.6 (with a supported driver), so I don’t think that’s the issue.

The RTX 6000 Ada has only been supported since driver 525, so 520 not identifying it is unsurprising.

2D classification is failing? So motion correction, CTF estimation and blob picking work and provide results which look appropriate? If you just feed the particles to ab initio or 3D (with [pick a reference]) does it also crash?

How interesting. Other documentation suggests that v11.8 is needed. It would be interesting to hear from other users about their experience with Hopper or Ada Lovelace devices.

Just tested and CUDA 11.6 and 11.7 fail to recognize the GPUs. Seems like 11.8 is the oldest that will work.
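
That result is consistent with native Ada Lovelace code generation (compute capability 8.9, `sm_89`) first appearing in the CUDA 11.8 toolkit. A small sketch of that lookup follows. The table reflects my understanding of NVIDIA's release notes and should be verified there before relying on it; `FIRST_TOOLKIT` and `toolkit_targets` are illustrative names.

```python
# First CUDA toolkit release able to generate native code for each compute
# capability (assumed from NVIDIA release notes; verify before relying on it).
FIRST_TOOLKIT = {
    (7, 5): (10, 0),  # Turing
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (RTX 30 series)
    (8, 9): (11, 8),  # Ada Lovelace (RTX 6000 Ada, RTX 4090)
    (9, 0): (11, 8),  # Hopper
}


def toolkit_targets(capability, toolkit):
    """True if `toolkit` (major, minor) can build native code for `capability`."""
    first = FIRST_TOOLKIT.get(tuple(capability))
    if first is None:
        raise KeyError(f"unknown compute capability {capability}")
    return tuple(toolkit[:2]) >= first


for version in [(11, 6), (11, 7), (11, 8)]:
    print(version, toolkit_targets((8, 9), version))
```

This only covers the toolkit side; it says nothing about how a particular PyCUDA build or driver behaves with these cards.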

2D classification is failing? So motion correction, CTF estimation and blob picking work and provide results which look appropriate? If you just feed the particles to ab initio or 3D (with [pick a reference]) does it also crash?

Nothing else outright fails, but local refinement randomly freezes partway through. Doesn’t seem to be consistent when it happens. The freezing happens on different datasets as well, not just one specific one.

Does this happen with caching enabled and a local nvme cache device?
Is there plenty of RAM “available” when that happens?

Does this happen with caching enabled and a local nvme cache device?

Yes, the server has 16TB of U.3 NVMe dedicated specifically for CryoSPARC

Is there plenty of RAM “available” when that happens?

Yes, free says there is over 500GB of memory free and available for use

Thanks for this information. What does the Event Log show at that point? Does the job log (Metadata | Log) show

  • any errors
  • evidence for continuing heartbeats?

No errors. The heartbeats continue to show up, but there are no more cufft logs in the joblog and the web ui doesn’t update.

It hung at “Computing FFTs on GPU.” this time, but I’ve seen it hang at “Non-uniform regularization with compute option: GPU” at other points.

Yes, that confused me as well because I was sure 11.8 was the only CUDA 11 with official support for Lovelace. I guess PR and development didn’t communicate well about that…

@UCBKurt Random hangs don’t sound like a CryoSPARC issue directly, but a system issue which CryoSPARC is managing to make appear. What do you see in system logs? I’m thinking explicitly about PCI-E bus errors, memory errors or the GPUs “dropping off the bus”. If you’re not getting problems consistently on everything which CryoSPARC is doing for the GPU… hm. I recently had a problem where CryoSPARC was the only thing which had issues… turned out it was two bad sticks of RAM, where the ECC had managed to cover it up during early testing but after a few months of heavy use they became uncorrectable.

It’s times like these I fall back to stress testing methodologies. Prime95, y-cruncher for CPU/RAM, memtest, Unigine Heaven/RTHDLIBL for the GPUs…
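
The system-log check suggested above could be sketched like this. The patterns are examples of kernel messages commonly associated with each fault class (NVIDIA Xid reports, PCIe AER reports, EDAC/machine-check reports); real systems may word them differently, and the sample lines are hypothetical, not taken from the machine in this thread.

```python
import re

# Illustrative patterns only; extend them to match what your kernel logs.
PATTERNS = {
    "gpu_xid": re.compile(r"NVRM: Xid"),
    "gpu_off_bus": re.compile(r"fallen off the bus", re.IGNORECASE),
    "pcie_aer": re.compile(r"\bAER\b.*error", re.IGNORECASE),
    "memory_ecc": re.compile(r"EDAC|mce: \[Hardware Error\]|Machine check", re.IGNORECASE),
}


def scan_kernel_log(lines):
    """Return {category: [matching lines]} for the fault patterns above."""
    hits = {}
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(category, []).append(line.rstrip())
    return hits


# Hypothetical sample lines, for demonstration only.
SAMPLE = [
    "NVRM: Xid (PCI:0000:01:00): 79, pid=1234, GPU has fallen off the bus.",
    "pcieport 0000:00:01.0: AER: Corrected error received: 0000:01:00.0",
    "EDAC MC0: 1 CE memory read error on CPU_SrcID#0",
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]

for category, matched in scan_kernel_log(SAMPLE).items():
    print(category, len(matched))
```

In practice the input would be the lines of `dmesg` or `journalctl -k` output captured around the time a job hangs.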

Stress testing is actually what I tried first. Didn’t find any hardware issues, plus it’s an entirely brand new machine. Only local refinement and 2d classification have issues on it.

Is there any ETA on when this will be addressed? Either through a patch or pycuda update? CryoSPARC uses pycuda 2020.1 which may be another reason why it fails.

@wtempel @rbs_sci We got our hands on a 4090 and can confirm that it does not have any issues with 11.8 and CryoSPARC. The problem happens exclusively with the RTX 4000 SFF and the RTX 6000 Ada.

Please advise on when CUDA 12 support will be added to CryoSPARC. Right now our RTX 6000 Ada server is mostly useless since CryoSPARC can’t run on it.

We do not currently have an ETA for CUDA v12 support in CryoSPARC. I will send you a direct message with another suggestion based on the specific circumstances you described so far.