cryoSPARC unable to run any 2D or 3D jobs

Dear all,

We recently installed cryoSPARC v2.14.2 on a 4-GPU node. However, we cannot successfully run any job type other than CTF estimation, particle picking, and particle extraction.

All other job types (2D classification and every kind of 3D job) crash before they even start, with the following error:

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 101, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_kernels.py", line 1803, in cryosparc2_compute.engine.cuda_kernels.prepare_real
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 362, in cryosparc2_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_kernels.py", line 1707, in cryosparc2_compute.engine.cuda_kernels.get_util_kernels
  File "/datalocal/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized

Any advice on how to solve the issue would be greatly appreciated!

Thanks a lot in advance!

Best wishes,

Rafa

Maybe a CUDA version issue? Which version of CUDA is cryoSPARC seeing?
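One way to answer that from the worker's own Python environment is to ask pycuda directly. This is a minimal diagnostic sketch, assuming pycuda is importable there (it appears in the tracebacks in this thread); get_version(), get_driver_version() and Device.compute_capability() are pycuda.driver calls, while decode_cuda_version and report are illustrative helper names. It prints the CUDA version pycuda was built against, the newest CUDA version the installed driver supports, and each GPU's compute capability.

```python
"""Report which CUDA versions and GPUs pycuda can see (diagnostic sketch)."""


def decode_cuda_version(version):
    """Split a CUDA driver-API version integer (1000 * major + 10 * minor)
    into (major, minor), e.g. 11080 -> (11, 8)."""
    return version // 1000, (version % 1000) // 10


def report():
    try:
        import pycuda.driver as drv
    except ImportError as exc:
        print("pycuda is not importable here: %s" % exc)
        return
    try:
        drv.init()  # cuInit; fails if no usable driver or GPU is present
    except Exception as exc:
        print("CUDA initialization failed: %s" % exc)
        return
    built = drv.get_version()  # (major, minor, revision) pycuda was compiled against
    print("pycuda built against CUDA %d.%d" % built[:2])
    major, minor = decode_cuda_version(drv.get_driver_version())
    print("driver supports CUDA up to %d.%d" % (major, minor))
    for i in range(drv.Device.count()):
        dev = drv.Device(i)
        cc_major, cc_minor = dev.compute_capability()
        print("GPU %d: %s, compute capability %d.%d"
              % (i, dev.name(), cc_major, cc_minor))


if __name__ == "__main__":
    report()
```

Keep in mind that pycuda compiles kernels at run time with whichever nvcc it finds (the SourceModule default is nvcc="nvcc"), so the toolkit pycuda was built against and the nvcc actually used can differ.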

Hi @Rafael-Ayala,

When users have posted this traceback in the past, it was usually due to a hardware issue. See, for example, the following post, where a user had to modify the BIOS settings on their motherboard:

In this post, the user determined that the error was caused by a faulty GPU:

Could you confirm that your GPU works normally with other applications?
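A quick check that exercises the same path that fails in the traceback (pycuda compiling a kernel with nvcc, then loading it via module_from_buffer inside SourceModule) is to compile and run a trivial kernel outside cryoSPARC. This is a minimal sketch, assuming numpy and pycuda are installed in the Python environment you run it from; the kernel name add_one and the function smoke_test are illustrative, and pycuda.autoinit simply creates a context on the default GPU.

```python
"""Compile and launch a trivial CUDA kernel through pycuda (smoke-test sketch)."""

KERNEL_SOURCE = """
__global__ void add_one(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1.0f;
}
"""


def smoke_test(n=256):
    """Return (ok, message); never raises, so it is safe without a GPU."""
    try:
        import numpy as np
        import pycuda.driver as drv
        from pycuda.compiler import SourceModule
    except ImportError as exc:
        return False, "import failed: %s" % exc
    try:
        import pycuda.autoinit  # noqa: F401  creates a context on the default GPU
    except Exception as exc:
        return False, "CUDA initialization failed: %s" % exc
    try:
        # Same nvcc compile + cuModuleLoadDataEx path as in the traceback.
        add_one = SourceModule(KERNEL_SOURCE).get_function("add_one")
        a = np.zeros(n, dtype=np.float32)
        threads = 128
        add_one(drv.InOut(a), np.int32(n),
                block=(threads, 1, 1), grid=((n + threads - 1) // threads, 1))
        if np.all(a == 1.0):
            return True, "kernel compiled, loaded and ran"
        return False, "kernel ran but produced wrong values"
    except Exception as exc:
        return False, "compile/launch failed: %s" % exc


if __name__ == "__main__":
    print(smoke_test())
```

If this fails with the same cuModuleLoadDataEx error, that points at the CUDA/driver/GPU stack rather than at cryoSPARC itself.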
A few other things you can try:

  1. Uninstall the CUDA Toolkit, reinstall, and reboot.
  2. Uninstall the NVIDIA driver, reinstall, and reboot.
  3. Make sure you have only one nvcc version installed and that it is the 64-bit version.
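Item 3 can be checked with a short script. This sketch, assuming a Linux system, lists every distinct nvcc on PATH, plus (as an assumption about where toolkits usually live) any under the /usr/local/cuda* prefix that may not be on PATH, and reads the ELF class byte of each binary to tell 32-bit from 64-bit. The function names are illustrative.

```python
"""List every nvcc that can be found and report whether each is a 64-bit ELF binary."""
import glob
import os


def find_all_on_path(name="nvcc", path=None):
    """Return each distinct executable called `name` on PATH, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    hits, seen = [], set()
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)  # collapse symlinks to the same file
            if real not in seen:
                seen.add(real)
                hits.append(candidate)
    return hits


def elf_bits_from_header(header):
    """Return 32 or 64 from the first bytes of an ELF file, or None if not ELF."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # e_ident[EI_CLASS] is byte 4: 1 = 32-bit, 2 = 64-bit
    return {1: 32, 2: 64}.get(bytearray(header[4:5])[0])


def elf_bits(path):
    with open(path, "rb") as f:
        return elf_bits_from_header(f.read(5))


if __name__ == "__main__":
    found = find_all_on_path("nvcc")
    # Assumption: toolkits installed under the default /usr/local/cuda* prefix.
    for extra in glob.glob("/usr/local/cuda*/bin/nvcc"):
        if os.path.realpath(extra) not in {os.path.realpath(p) for p in found}:
            found.append(extra)
    for p in found:
        real = os.path.realpath(p)
        print("%s -> %s (%s-bit)" % (p, real, elf_bits(real)))
```

More than one line of output means more than one nvcc is reachable; whichever appears first on PATH is the one a bare `nvcc` call will use.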

I recently encountered this problem too. I solved it by modifying the code in compiler.py, the file named in the last frame of the traceback (mine was in /cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda). The original code was:

class SourceModule(CudaModule):
    '''
    Creates a Module from a single .cu source object linked against the
    static CUDA runtime.
    '''
    def __init__(self, source, nvcc="nvcc", options=None, keep=False,
            no_extern_c=False, arch=None, code=None, cache_dir=None,
            include_dirs=[]):
        self._check_arch(arch)

        cubin = compile(source, nvcc, options, keep, no_extern_c,
                arch, code, cache_dir, include_dirs)

        from pycuda.driver import module_from_buffer
        self.module = module_from_buffer(cubin)

        self._bind_module()

After some googling, it seemed that the arch parameter was causing the error, so I simply removed it:

class SourceModule(CudaModule):
    '''
    Creates a Module from a single .cu source object linked against the
    static CUDA runtime.
    '''
    def __init__(self, source, nvcc="nvcc", options=None, keep=False,
            no_extern_c=False, code=None, cache_dir=None,
            include_dirs=[]):
        #self._check_arch(arch)

        cubin = compile(source, nvcc, options, keep, no_extern_c,
                code, cache_dir, include_dirs)

        from pycuda.driver import module_from_buffer
        self.module = module_from_buffer(cubin)

        self._bind_module()

After that, 2D classification and 3D ab initio jobs could run again!
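One side effect of this edit is worth knowing about: the modified __init__ still calls compile() positionally, so once arch is dropped from the argument list every later value moves one slot to the left. code fills the arch slot, cache_dir fills code, and include_dirs fills cache_dir; with SourceModule's default code=None, arch still ends up None. The sketch below reproduces the shift with a hypothetical stand-in, fake_compile, that has the same parameter order as the compile() call in the original code; it is not pycuda's function.

```python
"""Show how dropping `arch` from a positional call shifts the later arguments.

fake_compile is a hypothetical stand-in with the same parameter order as the
compile() call in the original SourceModule.__init__; it is not pycuda's code.
"""


def fake_compile(source, nvcc="nvcc", options=None, keep=False,
                 no_extern_c=False, arch=None, code=None, cache_dir=None,
                 include_dirs=()):
    # Just echo back where each value ended up.
    return {"arch": arch, "code": code, "cache_dir": cache_dir,
            "include_dirs": include_dirs}


code, cache_dir, include_dirs = None, None, ["/hypothetical/include"]

# As in the modified __init__: arch removed, remaining values passed positionally.
shifted = fake_compile("src", "nvcc", None, False, False,
                       code, cache_dir, include_dirs)

# Passing the trailing values by keyword keeps each one in its own slot.
intended = fake_compile("src", "nvcc", None, False, False,
                        code=code, cache_dir=cache_dir, include_dirs=include_dirs)

print(shifted)   # include_dirs has landed in the cache_dir slot
print(intended)
```

If you experiment with this edit, passing the trailing arguments by keyword, as in `intended`, keeps each value where it was meant to go.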


I’ve updated to cryoSPARC 2.4.1+230403. Although cryoSPARC recognizes the GPUs and can run jobs such as Patch Motion Correction, it returns the following error when I attempt 2D Classification:

Traceback (most recent call last):
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1048, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 192, in cryosparc_compute.engine.engine.EngineThread.setup_current_data_and_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1693, in cryosparc_compute.engine.cuda_kernels.compute_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 414, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1679, in cryosparc_compute.engine.cuda_kernels.get_compute_ctf_kernel
  File "/data/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized

I’m running RTX A6000 GPUs with CUDA 11.1.
I’ve tried modifying compiler.py, without success. Any thoughts would be very welcome!

In case anyone runs into a similar issue: updating CUDA to version 11.8 and pointing cryosparc_worker to it with cryosparcw newcuda "path-to-cuda" fixes the problem.
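Before pointing the worker at a toolkit with cryosparcw newcuda, it can help to confirm which release that nvcc reports and whether the release is new enough to target your GPUs. The sketch below does both. Its minimum-version table is deliberately partial, covering only GPU generations mentioned in this thread, and should be checked against NVIDIA's CUDA release notes; the path /usr/local/cuda-11.8 is an assumption, not a required location.

```python
"""Check an nvcc release against the minimum toolkit for a GPU's compute capability."""
import re
import subprocess

# Minimum CUDA toolkit release able to target each compute capability.
# Partial, for the GPUs discussed in this thread only; verify before relying on it.
MIN_TOOLKIT_FOR_CC = {
    (6, 1): (8, 0),    # Pascal, e.g. GTX 1080 Ti
    (8, 6): (11, 1),   # Ampere GA10x, e.g. RTX A6000, RTX 3090
    (8, 9): (11, 8),   # Ada Lovelace, e.g. RTX 6000 Ada
}


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output ("... release 11.8, ...")."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports(cc, release):
    """True/False if the table knows this compute capability, else None."""
    needed = MIN_TOOLKIT_FOR_CC.get(cc)
    return None if needed is None else release >= needed


if __name__ == "__main__":
    nvcc = "/usr/local/cuda-11.8/bin/nvcc"  # assumption: adjust to your install
    try:
        out = subprocess.check_output([nvcc, "--version"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run %s: %s" % (nvcc, exc))
    else:
        release = parse_nvcc_release(out)
        print("nvcc release %d.%d" % release)
        for cc in sorted(MIN_TOOLKIT_FOR_CC):
            print("sm_%d%d supported: %s" % (cc + (toolkit_supports(cc, release),)))
```

By this table CUDA 11.1 can already target the A6000, so architecture support is only one of the causes worth ruling out.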


Hi,

I now have the same error.
I run cryoSPARC in a master/worker configuration. Two workers with RTX 3090 cards work fine, but when I try to integrate an older 4x GTX 1080 Ti system I get this error:
[…]
  File "[…]/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error : Binary format for key='0', ident='' is not recognized

It’s the same error for every job that uses CUDA (Patch Motion, 2D, ab initio, etc.).
I am running CUDA 11.8 with the newest NVIDIA driver, 535.86.10.
There is only one nvcc on my system, and the BIOS already includes the AVX hacks.
The system reportedly worked before, but we needed to integrate it into a different cryoSPARC environment as a worker node only, so we decided to reinstall everything. In retrospect, that was a bad idea…

Is anyone successfully running a 4x GTX 1080 Ti system with an Intel Core i9-7920X processor
on an ASUS ROG RAMPAGE VI EXTREME motherboard?

Best
Jan

@wtempel Maybe the problem with the RTX 6000 Ada isn’t a CUDA version issue after all. gebauer is getting the same ident error with GTX 1080 Tis that we get with the RTX 6000 Ada.

After many attempts, I got at least the basic functions running by installing CUDA 10.1 (and manually installing gcc/g++ 8). So in my case the 1080 Tis do not seem to work properly with CUDA 11.x, although as far as I know they should. 🤷