Error compiling kernel

I am running into the following CUDA error when running certain job types on certain GPU nodes.

```
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/", line 78, in
  File "cryosparc2_worker/cryosparc2_compute/jobs/template_picker_gpu/", line 62, in
  File "cryosparc2_worker/cryosparc2_compute/jobs/template_picker_gpu/", line 201, in
  File "cryosparc2_worker/cryosparc2_compute/jobs/template_picker_gpu/", line 261, in
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/", line 275, in
  File "cryosparc2_worker/cryosparc2_compute/engine/", line 362, in cryosparc2_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/", line 267, in
  File "cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized
========= main process now complete.
```

This seems to be a common issue that has been discussed several times on the forum already, so I would like to understand the cause more thoroughly.

I am able to reproduce the issue from Python:

```shell
eval `bin/cryosparcw env`
python -c "import as run;" --project P3 --job J14 --master_hostname --master_command_core_port 39122
```

Since the failure appears to come from pycuda, I used the following script (derived from the pycuda tutorial) to verify that pycuda works and can compile and run basic CUDA code:

```python
import pycuda.autoinit
import pycuda.driver as drv
import numpy

print("Cuda Version: {}".format(drv.get_version()))
print("Driver Version: {}".format(drv.get_driver_version()))
print("GPU({}): {} # {}".format(pycuda.autoinit.device.count(),
                                pycuda.autoinit.device.name(),
                                pycuda.autoinit.device.pci_bus_id()))

from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400, 1, 1), grid=(1, 1))

print(dest - a*b)
```

This prints the expected CUDA toolkit and driver versions, as well as the expected all-zero `[0...]` output. So basic pycuda, including runtime compilation, works.
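If the root cause is an architecture mismatch, pycuda itself should be able to reproduce it. Here is a sketch (untested as written) that compiles the same toy kernel for an explicit target via SourceModule's standard `arch` parameter, instead of the architecture pycuda infers from the current device:

```python
# Sketch: compile the toy kernel for an explicit GPU architecture. On a
# GTX 1080 (sm_61) node, loading a module built for sm_75 should reproduce
# "cuModuleLoadDataEx failed: device kernel image is invalid".

KERNEL = """
__global__ void multiply_them(float *dest, float *a, float *b)
{
    const int i = threadIdx.x;
    dest[i] = a[i] * b[i];
}
"""

def sm_name(major, minor):
    """nvcc-style architecture string, e.g. (7, 5) -> 'sm_75'."""
    return "sm_%d%d" % (major, minor)

def try_arch(major, minor):
    # Imports kept inside the function so the helper above is usable
    # on machines without a GPU.
    import pycuda.autoinit  # noqa: F401 -- creates a context on GPU 0
    from pycuda.compiler import SourceModule
    SourceModule(KERNEL, arch=sm_name(major, minor))
    print("%s: compiled and loaded OK" % sm_name(major, minor))

# On a failing node I would expect:
#   try_arch(6, 1)  -> OK on a GTX 1080 / 1080 Ti
#   try_arch(7, 5)  -> cuModuleLoadDataEx error, as in the job log
```

If `try_arch(7, 5)` produces the same LogicError on a 1080 node, that would confirm the error is just an arch mismatch rather than anything exotic.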

My next step would normally be to look at the CUDA code in cryosparc2_compute/jobs/motioncorrection/, but that is a compiled extension in cryoSPARC. Is the source for it available anywhere, or is it considered proprietary? If the latter, could the devs give any hints about what CUDA features in use there might be causing the error?
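Even without the source, the CUDA toolkit's `cuobjdump` can list which architectures a compiled binary actually embeds, assuming the extension ships GPU code directly. The file name below is a placeholder, since the real one is not shown in the traceback:

```shell
# Sketch: list the GPU code embedded in a compiled CUDA binary.
# The path is a placeholder -- substitute the actual cryosparc extension.
so="path/to/"

if command -v cuobjdump >/dev/null 2>&1 && [ -f "$so" ]; then
    cuobjdump --list-elf "$so"   # embedded cubins: one line per sm_XX target
    cuobjdump --list-ptx "$so"   # embedded PTX (allows JIT to newer archs)
    listed=yes
else
    echo "cuobjdump or target file not available on this machine"
    listed=no
fi
```

If the listing shows cubins only for sm_75 and no PTX, that would explain success on the 2080 Ti and failure on the Pascal cards.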

BTW, a secondary bug report: cryosparcw exits with status 0 after this error, so the job is reported as “successful” by our monitoring system.
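Until the exit code is fixed, a wrapper along these lines is what our monitoring will have to do: treat the run as failed whenever the log contains the exception marker, regardless of the exit status. The log contents here are a stand-in for a real job log:

```shell
# Sketch: flag a job as failed from its log, since the exit code is 0.
# The two printf lines fabricate a minimal failing log for illustration.
log=$(mktemp)
printf '%s\n' '**** handle exception rc' 'set status to failed' > "$log"

if grep -q 'handle exception rc' "$log"; then
    job_status=failed
else
    job_status=ok
fi
echo "job_status=$job_status"
rm -f "$log"
```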

The errors occur on nodes with Nvidia GTX 1080 or 1080 Ti GPUs, yet the same jobs succeed on a node with an RTX 2080 Ti. However, we have few 2080 Ti nodes available, so I’d like to solve this on the older models.

Current cryoSPARC version: v2.13.2
GPUs with errors: Nvidia GTX 1080, Nvidia GTX 1080 Ti
GPUs without errors: Nvidia RTX 2080 Ti
Nvidia driver: 440.64.00 (all systems)
CUDA toolkit: 10.0.130 (all systems)

I’ve determined that the failing jobs (e.g. blob picker, template picker, probably others) fail consistently on the GTX 1080 and 1080 Ti and complete successfully on the RTX 2080 Ti. That’s odd, since all machines have identical driver and toolkit versions installed, which should support all three cards.
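My working hypothesis, which I can't confirm without the source: the only relevant difference between the passing and failing nodes is the GPU architecture, and the compiled extension may carry a kernel image for one but not the other. The mapping, from Nvidia's published specs:

```python
# Compute capabilities of the cards in question (from Nvidia's spec sheets).
COMPUTE_CAPABILITY = {
    "GTX 1080":    (6, 1),  # Pascal
    "GTX 1080 Ti": (6, 1),  # Pascal
    "RTX 2080 Ti": (7, 5),  # Turing
}

def sm_arch(cc):
    """nvcc-style name for a compute capability, e.g. (6, 1) -> 'sm_61'."""
    return "sm_%d%d" % cc

for gpu, cc in sorted(COMPUTE_CAPABILITY.items()):
    print("%-12s -> %s" % (gpu, sm_arch(cc)))

# A cubin is architecture-specific: a module built only for sm_75 cannot be
# loaded on an sm_61 device, and "device kernel image is invalid" from
# cuModuleLoadDataEx is exactly the error such a mismatch produces. Embedded
# PTX, by contrast, would be JIT-compiled and work on both.
```

That would neatly explain why the same driver and toolkit behave differently across the two card generations.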

Some more info about the cuda code being executed would be really helpful here, but I’m unsure how to get it.