Cryosparc unable to run any 2D or 3D job

Dear all,

We recently installed cryoSPARC v2.14.2 on a 4-GPU node. However, we cannot successfully run any job type other than CTF estimation, particle picking, and particle extraction.

The other job types (2D and all kinds of 3D jobs) crash before they even start, with the following error:

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 101, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_kernels.py", line 1803, in cryosparc2_compute.engine.cuda_kernels.prepare_real
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 362, in cryosparc2_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_kernels.py", line 1707, in cryosparc2_compute.engine.cuda_kernels.get_util_kernels
  File "/datalocal/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized

Any advice on how to solve the issue would be greatly appreciated!

Thanks a lot in advance!

Best wishes,

Rafa

Maybe a CUDA version issue? What version of CUDA is cryoSPARC seeing?
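One way to answer that is to ask the `nvcc` that comes first on `PATH` which release it is, since pycuda's `SourceModule` defaults to `nvcc="nvcc"` and so resolves the compiler through `PATH`. This is a minimal sketch under two assumptions: that you run it with the same `PATH` the cryoSPARC worker uses, and that you run it with a system Python 3.7+ (the v2 worker's bundled Python is 2.7, as the traceback shows). `parse_nvcc_release` and `nvcc_release` are hypothetical helpers, not part of cryoSPARC or pycuda.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_text: str) -> Optional[str]:
    """Pull the "release X.Y" number out of `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_text)
    return match.group(1) if match else None


def nvcc_release(nvcc: str = "nvcc") -> Optional[str]:
    """Return the release of the first `nvcc` found on PATH, or None if absent."""
    path = shutil.which(nvcc)
    if path is None:
        return None
    # `nvcc --version` prints e.g. "Cuda compilation tools, release 11.1, V11.1.105".
    result = subprocess.run([path, "--version"], capture_output=True,
                            text=True, check=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc on PATH:", shutil.which("nvcc"))
    print("CUDA release:", nvcc_release())
```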

Hi @Rafael-Ayala,

When users have posted this traceback in the past, it was usually due to a hardware issue. See the following post, for example, where a user had to modify BIOS settings on their motherboard:

In this post, the user determined the error was being caused by a faulty GPU:

Could you confirm that your GPU works normally with other applications?
A few other things you can try:

  1. Uninstall the CUDA Toolkit, reinstall, and reboot.
  2. Uninstall the NVIDIA driver, reinstall, and reboot.
  3. Make sure you have only one nvcc version installed and that it is the 64-bit version.
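Item 3 can be checked by listing every `nvcc` reachable on `PATH` and reading each binary's ELF header, whose fifth byte (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit. This is a sketch that assumes a Linux host with ELF binaries; `find_all_on_path` and `elf_bitness` are hypothetical helper names. It only inspects `PATH`, so a toolkit referenced some other way (for example, an absolute path in the worker's configuration) will not show up.

```python
import os
from typing import List, Optional


def find_all_on_path(name: str, path_env: Optional[str] = None) -> List[str]:
    """Return every executable file called `name` on PATH, in search order.

    Entries that resolve to the same binary (e.g. via symlinks) are reported
    once. The first entry is the one a plain `nvcc` call would run.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    hits, seen = [], set()
    for directory in path_env.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
            continue
        real = os.path.realpath(candidate)
        if real not in seen:
            seen.add(real)
            hits.append(candidate)
    return hits


def elf_bitness(path: str) -> Optional[int]:
    """Return 32 or 64 from the ELF header's EI_CLASS byte, or None if not ELF."""
    with open(path, "rb") as fh:
        header = fh.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    for hit in find_all_on_path("nvcc"):
        bits = elf_bitness(hit)
        print(hit, f"{bits}-bit" if bits else "not an ELF binary")
```

More than one line of output suggests several CUDA toolkits are installed and reachable.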

I recently ran into this problem as well. I solved it by modifying the code in pycuda's compiler.py, the file named in the last frame of the traceback (mine was in /cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda). The original code was:

class SourceModule(CudaModule):
    '''
    Creates a Module from a single .cu source object linked against the
    static CUDA runtime.
    '''
    def __init__(self, source, nvcc="nvcc", options=None, keep=False,
            no_extern_c=False, arch=None, code=None, cache_dir=None,
            include_dirs=[]):
        self._check_arch(arch)

        cubin = compile(source, nvcc, options, keep, no_extern_c,
                arch, code, cache_dir, include_dirs)

        from pycuda.driver import module_from_buffer
        self.module = module_from_buffer(cubin)

        self._bind_module()

After some googling, it seemed that the arch parameter was causing the error, so I simply removed it:

class SourceModule(CudaModule):
    '''
    Creates a Module from a single .cu source object linked against the
    static CUDA runtime.
    '''
    def __init__(self, source, nvcc="nvcc", options=None, keep=False,
            no_extern_c=False, code=None, cache_dir=None,
            include_dirs=[]):
        #self._check_arch(arch)

        cubin = compile(source, nvcc, options, keep, no_extern_c,
                code, cache_dir, include_dirs)

        from pycuda.driver import module_from_buffer
        self.module = module_from_buffer(cubin)

        self._bind_module()

After this change, 2D classification and 3D ab initio jobs run again!
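One detail of the patched code is worth knowing: `compile()` is still called positionally, so once `arch` is dropped the remaining arguments no longer line up with the parameter order used in the unmodified call (`..., no_extern_c, arch, code, cache_dir, include_dirs`). The sketch below uses a hypothetical stand-in, `compile_stub`, with that same parameter order (it does not import pycuda) to show where each value lands. It only illustrates the argument mapping; whether this shift is part of why the workaround helped is not established here.

```python
def compile_stub(source, nvcc="nvcc", options=None, keep=False,
                 no_extern_c=False, arch=None, code=None, cache_dir=None,
                 include_dirs=None):
    """Hypothetical stand-in with the positional order of the original
    compile() call; it just reports the values it received."""
    return {"arch": arch, "code": code, "cache_dir": cache_dir,
            "include_dirs": include_dirs}


# Values as they would be inside the patched SourceModule.__init__ with defaults.
source, nvcc, options, keep, no_extern_c = "__global__ void k() {}", "nvcc", None, False, False
code, cache_dir, include_dirs = None, None, []

# Patched call: `arch` removed, remaining arguments still positional, so
# code -> arch, cache_dir -> code, include_dirs -> cache_dir.
patched = compile_stub(source, nvcc, options, keep, no_extern_c,
                       code, cache_dir, include_dirs)

# For comparison: keyword arguments keep each value in its intended slot.
explicit = compile_stub(source, nvcc, options, keep, no_extern_c,
                        code=code, cache_dir=cache_dir, include_dirs=include_dirs)

print(patched)
print(explicit)
```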


I’ve updated to cryoSPARC 2.4.1+230403. Although cryoSPARC recognizes the GPUs and can run jobs such as Patch Motion Correction, it returns the following error when I attempt 2D Classification:

Traceback (most recent call last):
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1048, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 192, in cryosparc_compute.engine.engine.EngineThread.setup_current_data_and_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1693, in cryosparc_compute.engine.cuda_kernels.compute_ctf
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 414, in cryosparc_compute.engine.cuda_core.context_dependent_memoize.wrapper
  File "cryosparc_master/cryosparc_compute/engine/cuda_kernels.py", line 1679, in cryosparc_compute.engine.cuda_kernels.get_compute_ctf_kernel
  File "/data/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
pycuda._driver.LogicError: cuModuleLoadDataEx failed: device kernel image is invalid - error   : Binary format for key='0', ident='' is not recognized

I’m running RTX A6000 GPUs with CUDA 11.1.
I’ve tried modifying compiler.py as described above, but with no success. Any thoughts would be very welcome!

In case anyone runs into a similar issue: updating CUDA to version 11.8 and pointing cryosparc_worker to it with cryosparcw newcuda "path-to-cuda" fixes the problem.
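Before running `cryosparcw newcuda`, it can help to confirm that the path really is a CUDA toolkit root, i.e. that it contains `bin/nvcc`. This is a minimal sketch: `newcuda_command` is a hypothetical helper that only validates the path and builds the argument list, and it assumes `cryosparcw` is on `PATH` (otherwise pass its full path from the worker's `bin` directory).

```python
import os
from typing import List


def newcuda_command(cuda_root: str, cryosparcw: str = "cryosparcw") -> List[str]:
    """Build the `cryosparcw newcuda <path>` argument list after checking
    that `cuda_root` looks like a CUDA toolkit root (contains bin/nvcc)."""
    cuda_root = os.path.abspath(os.path.expanduser(cuda_root))
    nvcc = os.path.join(cuda_root, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        raise FileNotFoundError(
            f"no nvcc at {nvcc}; is {cuda_root} a CUDA toolkit root?")
    return [cryosparcw, "newcuda", cuda_root]
```

For example, with a toolkit installed under `/usr/local/cuda-11.8` (the runfile installer's usual location; adjust to your system), `subprocess.run(newcuda_command("/usr/local/cuda-11.8"), check=True)` would run the command once the path check passes.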
