Kernel.cu(560): error: texture is not a template after updating to v4.1.2

Hello. I recently updated to v4.1.2 and started getting CUDA errors. I upgraded our NVIDIA drivers and the system CUDA, linked the cryosparc workers to the new CUDA, and now I get the following error when trying to start jobs. The job below is a non-uniform (NU) refinement, but the error occurs in other job types too.

Error:

 [CPU:   5.58 GB  Avail: 112.26 GB]

Traceback (most recent call last):
  File "/troll/scratch/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
    run_old(*args, **kw)
  File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2441, in cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2492, in cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1250, in cryosparc_compute.engine.newengine.EngineThread.compute_resid_pow
  File "cryosparc_master/cryosparc_compute/engine/newcuda_kernels.py", line 6539, in cryosparc_compute.engine.newcuda_kernels.compute_resid_pow
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 416, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc_master/cryosparc_compute/engine/newcuda_kernels.py", line 6469, in cryosparc_compute.engine.newcuda_kernels.get_compute_resid_pow_kernel
  File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 290, in __init__
    cubin = compile(source, nvcc, options, keep, no_extern_c,
  File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 135, in compile_plain
    raise CompileError("nvcc compilation of %s failed" % cu_file_path,
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp34r97384/kernel.cu failed
[command: nvcc --cubin -arch sm_75 -I/troll/scratch/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/cuda kernel.cu]
[stderr:
kernel.cu(560): error: texture is not a template

kernel.cu(795): error: no instance of overloaded function "tex3D" matches the argument list
            argument types are: (<error-type>, float, float, float)

2 errors detected in the compilation of "kernel.cu".
]

cryoSPARC info:

cryosparc@troll:/home/users/posert$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/troll/scratch/cryosparc/cryosparc_master
Current cryoSPARC version: v4.1.2
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 3597, uptime 3:02:33
app_api                          RUNNING   pid 3629, uptime 3:02:31
app_api_dev                      STOPPED   Not started
app_legacy                       STOPPED   Not started
app_legacy_dev                   STOPPED   Not started
command_core                     RUNNING   pid 3213, uptime 3:02:46
command_rtp                      RUNNING   pid 3298, uptime 3:02:37
command_vis                      RUNNING   pid 3289, uptime 3:02:39
database                         RUNNING   pid 2922, uptime 3:02:49

----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------

global config variables:
export CRYOSPARC_LICENSE_ID="{redacted}"
export CRYOSPARC_MASTER_HOSTNAME="troll"
export CRYOSPARC_DB_PATH="/troll/scratch/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true

system config:

cryosparc@troll:/home/users/posert$ nvidia-smi
Mon Feb  6 18:45:06 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:05:00.0 Off |                  N/A |
| 23%   39C    P8     2W / 215W |    339MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:06:00.0 Off |                  N/A |
| 20%   36C    P8     2W / 215W |      6MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:09:00.0 Off |                  N/A |
| 20%   35C    P8     6W / 215W |      6MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:0A:00.0 Off |                  N/A |
| 20%   31C    P8    19W / 215W |      6MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2604      G   /usr/lib/xorg/Xorg                327MiB |
|    0   N/A  N/A      2896      G   /usr/bin/gnome-shell                9MiB |
|    1   N/A  N/A      2604      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      2604      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      2604      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

cryosparc@troll:/home/users/posert$ uname -a
Linux troll 5.15.0-58-generic #64~20.04.1-Ubuntu SMP Fri Jan 6 16:42:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

cryosparc@troll:/home/users/posert$ cryosparcw call nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

What I’ve tried:

  • Installing a system cuda-toolkit
  • Re-installing the cryosparc worker .tar.gz
  • Running cryosparcw install-3dflex

If you haven’t cleared, re-run, or deleted this job yet (to ensure information across the various job-related files is consistent), please post:

  • the "instance_information" section of job.json
  • the commands with which the CUDA toolkit was installed for this job
  • whether cryosparcw install-3dflex was run between cryosparcw newcuda and the job launch

the "instance_information" section of job.json
$ grep instance_information job.json
"instance_information": {},

the commands with which the CUDA toolkit was installed for this job

Sorry to be unclear: I installed the toolkit with sudo apt install nvidia-cuda-toolkit, but saw elsewhere that you recommend against a system toolkit, so I’ve since removed it (sudo apt remove nvidia-cuda-toolkit).

This all happened before I tried re-installing the worker, so it shouldn’t have seen the system-wide toolkit; I just mention it as something I tried.

whether cryosparcw install-3dflex was run between cryosparcw newcuda and the job launch

Yes, it was, so the nvcc that cryoSPARC sees comes from that step. It does, however, seem that cryoSPARC is using the same nvcc I have at /etc/alternatives/cuda/bin/nvcc; it’s just not in my own $PATH:

posert @ troll ~
$ which nvcc

posert @ troll ~
$ su cryosparc
Password:

cryosparc@troll:/home/users/posert$ cryosparcw call which nvcc
/etc/alternatives/cuda/bin/nvcc
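
In case it helps anyone reading along: as far as I understand, cryosparcw call runs the command inside the worker’s own environment, which prepends its own directories (including its configured CUDA bin directory) to PATH, so it can resolve an nvcc that a login shell can’t. A quick way to see exactly which PATH the worker uses (this assumes cryosparcw call passes arbitrary commands through, as in the transcripts above):

cryosparcw call python -c "import os; print(os.environ['PATH'])"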

Am I assuming correctly that

  • troll is the worker host
  • cryosparc “owns” the CryoSPARC instance’s software and processes?

What is the output of

su - cryosparc # hyphen is important
env

troll is both a worker and the master. We have another worker node that experiences the same error and has been through the same process.

Yes, cryosparc owns all the cryoSPARC stuff, e.g.:

cryosparc@troll:/home/users/posert$ ps aux | grep cryosparc
cryospa+    2423  0.0  0.0  39120 23608 ?        Ss   Feb06   0:15 python /troll/scratch/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /troll/scratch/cryosparc/cryosparc_master/supervisord.conf
cryospa+    2922  1.5  0.9 2727124 1279724 ?     Sl   Feb06  16:49 mongod --auth --dbpath /troll/scratch/cryosparc/cryosparc_database --port 39001 --oplogSize 64 --replSet meteor --nojournal --wiredTigerCacheSizeGB 4 --bind_ip_all
cryospa+    3213  2.5  0.1 1456748 245220 ?      Sl   Feb06  27:06 python -c import cryosparc_command.command_core as serv; serv.start(port=39002)
cryospa+    3289  0.0  0.2 1431556 287640 ?      Sl   Feb06   0:55 python -c import cryosparc_command.command_vis as serv; serv.start(port=39003)
cryospa+    3298  0.9  0.1 923328 248268 ?       Sl   Feb06  10:00 python -c import cryosparc_command.command_rtp as serv; serv.start(port=39005)
cryospa+    3629  0.2  0.0 1040116 121620 ?      Sl   Feb06   2:11 /troll/scratch/cryosparc/cryosparc_master/cryosparc_app/api/nodejs/bin/node ./bundle/main.js

and

cryosparc@troll:/troll/scratch/cryosparc$ ll
total 44
drwxrwxr-x  9 cryosparc BaconguisLab  4096 Nov 17 20:15 ./
drwxr-xr-x 10 root      root          4096 Feb 21  2022 ../
drwxrwxr-x  2 cryosparc cryosparc     4096 Mar 29  2021 cryosparc2_worker/
drwxr-xr-x  2 cryosparc BaconguisLab  4096 Oct  5 10:38 cryosparc-backups/
drwxr-xr-x  3 cryosparc BaconguisLab  4096 Nov 23 13:08 cryosparc_cache/
drwxrwxr-x  4 cryosparc cryosparc    12288 Feb  7 09:31 cryosparc_database/
drwxrwxr-x 16 cryosparc cryosparc     4096 Feb  3 12:38 cryosparc_master/
drwxrwxr-x  9 cryosparc cryosparc     4096 Feb  6 18:26 cryosparc_worker/
drwxrwxr-x  3 cryosparc cryosparc     4096 Jun 24  2021 instance_troll:39001/

What is the output of…

posert @ troll ~
$ su - cryosparc
Password:
cryosparc@troll:~$ env
SHELL=/bin/bash
HISTCONTROL=ignoredups:
PWD=/home/cryosparc
LOGNAME=cryosparc
HOME=/home/cryosparc
LANG=en_US.UTF-8
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
LESSCLOSE=/usr/bin/lesspipe %s %s
TERM=xterm-256color
LESSOPEN=| /usr/bin/lesspipe %s
USER=cryosparc
SHLVL=1
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
PATH=/troll/scratch/cryosparc/cryosparc_master/bin:/troll/scratch/cryosparc/cryosparc_worker/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
MAIL=/var/mail/cryosparc
_=/usr/bin/env

The env output looks good, but I may be overlooking something. May I ask what you see with:

su - cryosparc
cryosparcw call which nvcc
cryosparcw call nvcc --version
cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"
posert @ troll ~
$ su - cryosparc
Password:
cryosparc@troll:~$ cryosparcw call which nvcc
/etc/alternatives/cuda/bin/nvcc
cryosparc@troll:~$ cryosparcw call nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
cryosparc@troll:~$ cryosparcw call python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 7, 0)
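
If I read that right, pycuda in the worker env was built against CUDA 11.7, while the nvcc it shells out to is 12.0. For anyone checking the same mismatch, here is a small sketch (assuming pycuda’s standard driver API; run it inside the worker env via cryosparcw call python):

import pycuda.driver as drv

# CUDA toolkit version pycuda was compiled against (a build-time constant)
print("pycuda built against:", drv.get_version())          # e.g. (11, 7, 0)

drv.init()
# Highest CUDA version the installed driver supports, e.g. 12000 for 12.0
print("driver supports up to:", drv.get_driver_version())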

Interesting. How was CUDA toolkit version 12 installed? That version may have interfered with cryosparcw install-3dflex, and/or may interfere with CryoSPARC operation.

[Added:] Could you also post the output of

cryosparcw call python -c "import torch; print(torch.cuda.is_available())"

and send me the output of

cryosparcw env

by direct message.

Hmm… I don’t remember installing it. Ah, it looks like it was pulled in automatically, probably when CUDA was installed via apt:

cryosparc@troll:~$ apt list --installed | grep cuda-toolkit

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

cuda-toolkit-12-0-config-common/unknown,now 12.0.146-1 all [installed,automatic]
cuda-toolkit-12-0/unknown,now 12.0.1-1 amd64 [installed,automatic]
cuda-toolkit-12-config-common/unknown,now 12.0.146-1 all [installed,automatic]
cuda-toolkit-config-common/unknown,now 12.0.146-1 all [installed,automatic]

And here’s the output of the first cryosparcw call; I’ll DM the other:

cryosparc@troll:~$ cryosparcw call python -c "import torch; print(torch.cuda.is_available())"
True

CUDA 12 is a pain. Explicitly installing a pinned version with apt install cuda-11-8 and then removing the generic cuda package will probably solve the problem.
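
For background, my understanding is that CUDA 12 removed the legacy texture-reference API (texture<...> declarations and the matching tex3D() overload) that the pycuda-generated kernels still use, which is exactly why nvcc 12 reports "texture is not a template". If you want to confirm it’s the toolkit rather than cryoSPARC itself, here is a minimal repro sketch (my own illustration, not cryoSPARC’s actual kernel; run inside the worker env):

import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
from pycuda.compiler import SourceModule

# Legacy texture reference: deprecated in CUDA 11.x, removed in CUDA 12,
# so this should compile on 11.x and fail on 12.x with the same error.
legacy_kernel = r"""
texture<float, 3, cudaReadModeElementType> tex;

__global__ void sample(float *out)
{
    out[0] = tex3D(tex, 0.5f, 0.5f, 0.5f);
}
"""

try:
    SourceModule(legacy_kernel)
    print("compiled OK -- this toolkit still accepts texture references")
except Exception as e:
    print("compile failed (expected on CUDA 12):", e)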

I only upgraded cuda because I was getting driver mismatch errors after updating cryoSPARC, although I guess I could try just updating the drivers and leaving CUDA at 11…

I’ll check what driver I’m running, but I’m pretty sure I’m running 525.85 with CUDA 11.8…

edit:

Sorry, 525.60 with CUDA 11.8

❯ nvidia-smi
Wed Feb  8 13:41:17 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
|  0%   37C    P8    19W / 350W |     13MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:68:00.0 Off |                  N/A |
|  0%   46C    P8    35W / 350W |   1470MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      4731      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    274226      G   /usr/lib/xorg/Xorg                  4MiB |
|    1   N/A  N/A      4731      G   /usr/lib/xorg/Xorg                530MiB |
|    1   N/A  N/A      5857      G   cinnamon                           16MiB |
|    1   N/A  N/A    274226      G   /usr/lib/xorg/Xorg                194MiB |
+-----------------------------------------------------------------------------+
❯ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Nothing has thrown CUDA errors yet.

I was still on 4XX :sweat_smile:. Maybe I’ll try downgrading the system CUDA today.

Going back down to CUDA 11.8 as @rbs_sci suggested seems to have solved the problem.

Steps to work around it:

  1. sudo apt install cuda-11-8
  2. cryosparcw newcuda /path/to/cuda-11-8
  3. cryosparcw update --override (without this step, install-3dflex fails because of an inconsistent environment)
  4. cryosparcw install-3dflex (see the quick sanity check below)
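
To verify the worker is compiling kernels again after these steps, a quick sanity check (a sketch; the no-op kernel is just illustrative) can be run with cryosparcw call python:

import pycuda.autoinit  # noqa: F401 -- creates a context on the first GPU
from pycuda.compiler import SourceModule

# If this trivial kernel compiles, pycuda and nvcc are back on the same
# 11.x toolkit and the "texture is not a template" error should be gone.
SourceModule("__global__ void noop() {}")
print("kernel compiled OK")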

I’m now on GPU driver version 525.85.12 with CUDA 12 still installed system-wide, but cryoSPARC pointed at 11.8.

Thanks for your help everyone!
