2D Classification PyCuda Error (CUDA 11)

Hi there,

I was running a 2D classification job on some small datasets using the newest version (v2.15), and I'm getting the error below. The issue started when I updated CUDA to version 11.0, even after updating the toolkit path with newcuda. I had no problems with my 2x NVIDIA 2080 Ti cards in the steps prior to 2D classification.

Thank you guys!

[CPU: 1.27 GB]   Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_master/cryosparc2_compute/engine/cuda_core.py", line 128, in cryosparc2_compute.engine.cuda_core.GPUThread.run (/fast5/userhome/nfrasser/cryosparc2/cryosparc2_package/cryosparc2_master/cryosparc2_compute/engine/cuda_core.c:5079)
  File "cryosparc2_master/cryosparc2_compute/engine/cuda_core.py", line 129, in cryosparc2_compute.engine.cuda_core.GPUThread.run (/fast5/userhome/nfrasser/cryosparc2/cryosparc2_package/cryosparc2_master/cryosparc2_compute/engine/cuda_core.c:5030)
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 1011, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 175, in cryosparc2_compute.engine.engine.EngineThread.setup_current_data_and_ctf
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_kernels.py", line 1732, in cryosparc2_compute.engine.cuda_kernels.extract_fourier_2D
  File "/home/leelab/CS2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/driver.py", line 382, in function_call
    func._set_block_shape(*block)
LogicError: cuFuncSetBlockShape failed: invalid resource handle

Hi @vitorserrao,

At this point, cryoSPARC isn't compatible with CUDA 11.0; we'll keep this thread updated. For the time being, it's best to have CUDA 10.2 installed on your system and to use cryoSPARC with that version of CUDA.
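For anyone following along: the toolkit-path update the original post mentions is done with the worker's `newcuda` subcommand. A minimal sketch of switching an existing install back to CUDA 10.2 follows; the install paths are placeholders, not taken from this thread.

```shell
# Point the cryoSPARC worker at the CUDA 10.2 toolkit.
# Replace both paths with your own install locations.
cd /path/to/cryosparc2_worker
bin/cryosparcw newcuda /usr/local/cuda-10.2

# Restart cryoSPARC so running processes pick up the new toolkit.
/path/to/cryosparc2_master/bin/cryosparcm restart
```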

Hi Stephan,

Thank you for your reply. Awesome, I'll use 10.2 and hopefully it will work. I'd also really appreciate CUDA 11 support in the next version that you guys are probably already working on.

Thank you again!

Vitor

Hi @vitorserrao,

It will! We're definitely planning to support CUDA 11 as soon as possible; we're just waiting for some core dependencies we rely on to add CUDA 11 support themselves, and then we should be good to go!


Found it! It looks like I've just encountered the same error message, ending with "…LogicError: cuFuncSetBlockShape failed: invalid resource handle". Looking forward to an update that works with CUDA 11.0. Thank you guys!

Hello Stephan,

May I ask if you have a provisional date for the compatibility with CUDA 11?

I'm hesitating between downgrading to CUDA 10 on one of our workstations and waiting until the next cryoSPARC version is released. An approximate release date would help me choose 🙂

Thanks!

Hi @Hug,

As of now we still don't have an exact date for CUDA 11 support, although we do plan to have it before the end of the year. I think your best bet would be to downgrade to CUDA 10, given that you're not using one of the new Ampere GPUs. We also have an update to cryoSPARC coming out soon that we've been working on for a while now, and it uses the current dependencies. I hope this helps.

Hi @stephan

Our new workstation is equipped with a GeForce RTX 3090, and I got the same error as vitorserrao when running 3D ab-initio reconstruction with particles extracted from RELION 3.1. I later reinstalled cryoSPARC with CUDA 10.1 and reran the same ab-initio job, but got the error reported below. Do you think this error is still connected to the CUDA version, or is it something else? Thanks for your time!

Jinru

[CPU: 2.05 GB]   Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/abinit/run.py", line 161, in cryosparc2_compute.jobs.abinit.run.run_homo_abinit
  File "cryosparc2_worker/cryosparc2_compute/jobs/abinit/run.py", line 498, in cryosparc2_compute.jobs.abinit.run.generate_initial_density_from_projections
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 927, in cryosparc2_compute.engine.engine.process
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 937, in cryosparc2_compute.engine.engine.process
  File "cryosparc2_master/cryosparc2_compute/engine/cuda_core.py", line 153, in cryosparc2_compute.engine.cuda_core.allocate_gpu
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "", line 2, in get_fill_kernel
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/tools.py", line 432, in context_dependent_memoize
    result = func(*args)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 496, in get_fill_kernel
    "fill")
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 161, in get_elwise_kernel
    arguments, operation, name, keep, options, **kwargs)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 147, in get_elwise_kernel_and_types
    keep, options, **kwargs)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 75, in get_elwise_module
    options=options, keep=keep)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 291, in __init__
    arch, code, cache_dir, include_dirs)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 78, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 55, in preprocess_source
    cmdline, stderr=stderr)
CompileError: nvcc preprocessing of /tmp/tmpmtnZSU.cu failed
[command: nvcc --preprocess -arch sm_86 -I/home/exx/soft/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/cuda /tmp/tmpmtnZSU.cu --compiler-options -P]
[stderr:
nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'
]

Hi zjr,

The Ampere graphics cards (RTX 3000 series) are supported by CUDA 11.1 and upwards. You can't run cryoSPARC in its current form with those cards (no CUDA 11 support yet), so either use a previous-generation GPU or wait for a future cryoSPARC release, which hopefully will have CUDA 11 support. Thanks
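To make the version gate concrete: each `-arch` value that pycuda passes to nvcc is only defined from a particular toolkit release onward, which is exactly what the `nvcc fatal : Value 'sm_86' is not defined` message is complaining about. A small sketch follows; the lookup table is an assumption compiled from NVIDIA's release notes, not something cryoSPARC itself ships.

```python
# Minimum CUDA toolkit (major, minor) that defines each nvcc -arch flag.
# Assumed values based on NVIDIA toolkit release notes.
MIN_TOOLKIT_FOR_ARCH = {
    "sm_70": (9, 0),    # Volta (e.g. Titan V, V100)
    "sm_75": (10, 0),   # Turing (e.g. RTX 2080 Ti)
    "sm_80": (11, 0),   # Ampere data-center (A100)
    "sm_86": (11, 1),   # Ampere consumer (e.g. RTX 3090)
}

def toolkit_supports(arch: str, toolkit: tuple) -> bool:
    """Return True if an nvcc from `toolkit` defines the given -arch value."""
    needed = MIN_TOOLKIT_FOR_ARCH.get(arch)
    if needed is None:
        raise ValueError("unknown architecture flag: %s" % arch)
    return toolkit >= needed

# An RTX 3090 (sm_86) driven by the CUDA 11.0 toolkit reproduces the
# failure seen above; the 11.1 toolkit compiles it fine.
```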


Hi @zjr, v3.0 released today offers CUDA 11 support. Thanks!


Hi,

I am having the same problem: RTX 3090 with CUDA 11.0 on cryoSPARC v3.0.0.

The installation went well, but the job failed during 2D classification:

[CPU: 1.47 GB]   Traceback (most recent call last):
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/tools.py", line 429, in context_dependent_memoize
    return ctx_dict[cur_ctx][args]
KeyError: <pycuda._driver.Context object at 0x7f3d0b12e930>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 163, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 508, in cryosparc_compute.jobs.abinit.run.generate_initial_density_from_projections
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 933, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 943, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 154, in cryosparc_compute.engine.cuda_core.allocate_gpu
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "<decorator-gen-13>", line 2, in get_fill_kernel
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/tools.py", line 433, in context_dependent_memoize
    result = func(*args)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 498, in get_fill_kernel
    "fill")
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 163, in get_elwise_kernel
    arguments, operation, name, keep, options, **kwargs)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 149, in get_elwise_kernel_and_types
    keep, options, **kwargs)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/elementwise.py", line 76, in get_elwise_module
    options=options, keep=keep, no_extern_c=True)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 291, in __init__
    arch, code, cache_dir, include_dirs)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 254, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir, target)
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 78, in compile_plain
    checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
  File "/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/compiler.py", line 55, in preprocess_source
    cmdline, stderr=stderr)
pycuda.driver.CompileError: nvcc preprocessing of /tmp/tmpso45omio.cu failed
[command: nvcc --preprocess -arch sm_86 -I/home/mj/app/cryosparc300-20201211/install/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/cuda /tmp/tmpso45omio.cu --compiler-options -P]
[stderr:
b"nvcc fatal   : Value 'sm_86' is not defined for option 'gpu-architecture'\n"]

Thanks,

MJ

Hi,

I just found the problem: for my RTX 3090 workstation with cryoSPARC v3.0, CUDA 11.1 or later is needed; CUDA 11.0 did not work.

Many thanks to Wilnart!

MJ


Hi spunjani:

I updated to v3.2, and when running 2D classification I get the same error: pycuda._driver.LogicError: cuDeviceGet failed: invalid device ordinal. I tried loading CUDA 10 and CUDA 11; neither works. Is there any way to solve this error?

Best,
Chuchu

@CleoShen Can you please post the output of nvidia-smi?

It’s also worth re-starting your instance as that may resolve the error.

Hi spunjani:
Thank you for replying. I tried restarting, but it didn't work. Here is the GPU information:

[sh02-15n06 ~]$ nvidia-smi
Mon Apr 26 11:33:00 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN Xp            On   | 00000000:02:00.0 Off |                  N/A |
| 23%   25C    P8     8W / 300W |      2MiB / 12196MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Hi spunjani,

Following up on my last reply: I tested the 2D classification job on the T20S tutorial data, and it worked well.

Hi @CleoShen,

It looks like your GPU is in “Exclusive Process” mode. This mode only allows one process to be running on the GPU at a time, which is most likely why you got a pycuda._driver.LogicError: cuDeviceGet failed: invalid device ordinal error.

0 – Default - Shared mode available for multiple processes
1 – Exclusive Thread - Only one COMPUTE thread is allowed to run on the GPU (v260 exclusive)
2 – Prohibited - No COMPUTE contexts are allowed to run on the GPU
3 – Exclusive Process - Only one COMPUTE process is allowed to run on the GPU

To change the compute mode for your GPUs, run the command:
sudo nvidia-smi -c <compute mode #>
e.g. sudo nvidia-smi -c 0
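As a sketch of how the mode numbers above relate to what the `nvidia-smi` summary table prints: the table abbreviates the mode in its "Compute M." column (e.g. "E. Process", as in the output posted earlier in this thread). The parsing helper below is hypothetical, written just for those abbreviated labels; the machine-readable alternative is `nvidia-smi --query-gpu=compute_mode --format=csv`.

```python
import re

# Compute modes as numbered by `nvidia-smi -c` (same numbering as the
# list above).
COMPUTE_MODES = {
    0: "Default",
    1: "Exclusive Thread",
    2: "Prohibited",
    3: "Exclusive Process",
}

# Abbreviations used in nvidia-smi's human-readable summary table.
_ABBREVIATIONS = {
    "E. Process": "Exclusive Process",
    "E. Thread": "Exclusive Thread",
}

def compute_mode_from_smi(smi_output: str) -> str:
    """Best-effort extraction of the Compute M. value from nvidia-smi text."""
    match = re.search(r"(E\. Process|E\. Thread|Default|Prohibited)", smi_output)
    if match is None:
        raise ValueError("no compute mode found in nvidia-smi output")
    return _ABBREVIATIONS.get(match.group(1), match.group(1))
```

Applied to the table CleoShen posted, this would report "Exclusive Process", matching stephan's diagnosis.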

Hi Stephan,

Thank you for your reply. Since I'm running jobs on a computer cluster, I don't have the authority to make sudo changes; are there any other solutions? Secondly, why can the "Exclusive Process" mode run the 2D classification job on the tutorial data, but not on my own data?

Best,
Chuchu

Hi @CleoShen,

When a GPU is in exclusive process mode, only one process can run at a time. It’s possible that another user or application was using the GPU while you were trying to run the 2D Classification job on your own dataset. You should try it out now while there is no one else using the GPU; it should work. Keeping the GPU in exclusive process mode ensures that the running process doesn’t die if another process comes along and hogs all the GPU memory (for example). If you’re on a shared cluster, this might be the best way to ensure fairness.

Hi Stephan, I'm pretty sure there was no other job running on the node I used for the 2D classification job, but it still reported the PyCUDA LogicError. If you have any other thoughts about this error later, please let me know.