Reference-based motion correction failed - CUDA issue?

Reference based motion correction failed with the following error message:
[CPU: 917.8 MB Avail: 103.73 GB]
Cross-validation scores computed:
[--------------------------------------------------------------------------------] 0/1631 (0%)

[CPU: 234.1 MB Avail: 104.55 GB]
====== Job process terminated abnormally.

[CPU: 222.3 MB Avail: 104.63 GB]
DIE: cuModuleLoadData(image=0x7f5149ea7044): CUDA ERROR: (CUDA_ERROR_INVALID_PTX) a PTX JIT compilation failed

Can you help me with this error? CUDA version 12.4, CryoSPARC v4.4.1. We have no problem running any other job except this one.
Thanks for your help. Best, Yuro

Please can you post the output of the command
nvidia-smi on the worker where the error occurred.


Dear @wtempel
Thanks for your reply. I assume you meant the output of nvidia-smi in the cryosparc_work directory. If so, here it is:

| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Quadro M6000 24GB              Off |   00000000:81:00.0  On |                  Off |
| 28%   55C    P8             23W /  250W |     773MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|    0   N/A  N/A      8744      G   /usr/libexec/Xorg                             196MiB |
|    0   N/A  N/A      8931      G   /usr/bin/gnome-shell                          268MiB |
|    0   N/A  N/A    288443      G   ...seed-version=20240307-080152.729000        172MiB |
|    0   N/A  N/A    321806      G   ...seed-version=20240307-080152.729000         76MiB |
|    0   N/A  N/A    536189      G   ...UCSF/Chimera64-1.17.3/bin/python2.7         48MiB |

Hi @yurotakagi,

Unfortunately, Maxwell-generation NVIDIA cards (Quadro/Tesla M series, GeForce 900 series) aren’t supported by RBMC at this time. Apologies for the inconvenience.
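If you’re unsure which architecture a card belongs to, its CUDA compute capability tells you (Maxwell cards report 5.x; recent drivers can print it via `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`). Here’s a rough sketch of such a check — the helper names are my own, and the “anything newer than Maxwell” cutoff is an assumption based on the reply above, not an official support matrix:

```python
# Hypothetical helper: map a CUDA compute-capability major version to the
# NVIDIA architecture family it corresponds to.
ARCH_BY_MAJOR = {
    3: "Kepler",
    5: "Maxwell",
    6: "Pascal",
    7: "Volta/Turing",
    8: "Ampere/Ada",
    9: "Hopper",
}

def supports_rbmc(compute_cap: str) -> bool:
    """Rough check, assuming RBMC needs a post-Maxwell card (>= 6.0).

    `compute_cap` is a string like "5.2", as printed by:
        nvidia-smi --query-gpu=compute_cap --format=csv,noheader
    """
    major = int(compute_cap.split(".")[0])
    return major >= 6

print(supports_rbmc("5.2"))  # Quadro M6000 (Maxwell) -> False
print(supports_rbmc("8.6"))  # RTX 3090 / A5000 (Ampere) -> True
```

The Quadro M6000 in the nvidia-smi output above is compute capability 5.2, which is why RBMC refuses it.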


Dear Harris
Thanks for your reply. It sounds like it is a GPU hardware issue. I guess this has probably been discussed elsewhere on this board, but let me ask anyway: what kind of GPU do you recommend for running CryoSPARC, and RBMC in particular?

Thanks for your help. Best. Yuro

Hi @yurotakagi,

It depends on your budget. If you want to upgrade your existing workstation and are happy with just a single GPU, an RTX 4090 would be a great choice. The only problem with 4090s is that they are very physically large, so it is hard to get many of them into one computer.


Dear Harris
Thanks for the information. The RTX 4090 does seem to be big, as you indicated. Is there any other GPU that works well for CryoSPARC but is not that big, i.e. a regular-size card?
Best. Yuro

Hi @yurotakagi,

I would recommend asking the broader community, in a new forum topic. Perhaps title it something like “Recommendations for space efficient GPU”, and include as many details about your setup and objectives as possible.


Depends on budget (both monetary and power) and target. A4000s (16GB Ampere-gen cards) serve well for a lot of things and are fairly power efficient, but will choke with high-particle-count RBMC and (new codepath) NU Refine on box sizes >400 or so. 24GB cards (3090s, 4090s, A5000s, etc.) will do most things, and due to how CryoSPARC does some things, 48GB cards like the A6000 actually provide no benefit for dealing with larger boxes.

An example from some of my recent processing: A5000s (24GB cards) successfully RBMC’d 840-pixel boxes (from 420 pixel refinements, and once I worked around the “duplicate particles” issue, which is another matter entirely) but NU Refine, even on the old codepath (“low memory mode”) would crash on iteration 2 running out of VRAM. Re-RBMC-ing with a 3/4 Fourier crop gave me 630 pixel boxes which actually run fine on the new NU Refine codepath (and really quickly, too!)
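For anyone following along, the 840 → 630 pixel step above is just a 3/4 Fourier crop: the box shrinks by the crop fraction and the pixel size grows by its inverse. A quick sketch of the arithmetic (the 0.9 Å/px pixel size is a hypothetical value for illustration; the post doesn’t state one):

```python
def fourier_crop(box_px: int, fraction: float, pixel_size_A: float):
    """Fourier-crop a box: the box shrinks by `fraction`, the pixel size
    grows by 1/fraction, and the Nyquist limit (2 * pixel size) coarsens
    accordingly."""
    new_box = int(round(box_px * fraction))
    new_pixel = pixel_size_A / fraction
    return new_box, new_pixel, 2 * new_pixel

# The 3/4 crop from the post: 840 px -> 630 px.
box, apix, nyquist = fourier_crop(840, 0.75, 0.9)
print(box, apix, nyquist)  # 630, 1.2 A/px, 2.4 A Nyquist
```

The trade-off, of course, is that the crop throws away the highest-frequency signal, so it only makes sense when the refinement wasn’t reaching Nyquist anyway.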

I’m extremely wary of the RTX 40-series cards for high sustained compute loads. While I know a number of people using them without issue, a friend had his 4090 melt the 12VHPWR connector (widely reported online, although also widely gaslit as a user fault), and the possibility of having a processing system start a fire makes me avoid them. Combine that with the ridiculous cooling solutions (triple/quad slot?!) and, despite the added cost, I’m more interested in less overkill options.

3090s have kept a fairly high price, so I’d look at A5000s and A5500s.

Hmmm - can you elaborate on this a bit? Why don’t A6000s provide benefit for larger boxes? (Interested because I am considering ordering a 4xA6000 system for both SPA and tomo…)

Sure. Sorry, this is going to be a bit long…

I could have been clearer - in CryoSPARC, they offer no advantage.

In RELION they do - I’ve pushed box sizes up to 1,480 pixels (with no Xorg or other GPU-utilising programs running) and successfully completed a 3D refinement converging at 0.02 degrees. CryoSPARC, however, has some other limits (due to PyFFTW and other code, I think), which means that 1,100 pixel boxes (or very close) are the largest I’ve had run successfully in CryoSPARC - even 1,122 pixels would crash (don’t ask me how much time I’ve spent experimenting with this… :rofl:)

If I was after STA, A6000s would be a good choice (our tomo box right now is their older sibling, the RTX8000 48GB). If using RELION and boxes <1,450 pixels, A6000s again are a great choice. If you want to go 1,500+ pixels, you need to think about going the CPU codepath in RELION… or the 80GB GPUs if money truly is no object! But from my estimates the 80GB GPUs will top out around 1,600-1,700 pixels anyway.

The largest box I’ve tested and successfully reconstructed - which is going to be great fun to deposit to EMDB - was 1,800 pixels and each half-set MPI process was eating ~270GB system RAM during later steps of the 3D refinement. Memory requirements quickly get out of hand after that.
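To put rough numbers on “out of hand”: refinement memory grows roughly with the cube of the box size (a real-space volume is box³ voxels), so you can extrapolate crudely from one known data point. A back-of-envelope sketch, anchored on the ~270 GB per half-set figure above (the cubic scaling is an approximation; real usage includes overheads that don’t scale this way):

```python
def scale_ram(ref_box: int, ref_gb: float, new_box: int) -> float:
    """Crude O(box^3) extrapolation of per-process refinement RAM,
    anchored on one measured (box, RAM) data point."""
    return ref_gb * (new_box / ref_box) ** 3

# Anchor: ~270 GB per half-set process at an 1,800 px box (from the post).
print(round(scale_ram(1800, 270.0, 1100)))  # ~62 GB near CryoSPARC's practical limit
print(round(scale_ram(1800, 270.0, 2048)))  # ~398 GB per process - out of hand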

Since the AVX-512 code in RELION was - if I remember correctly - written by Intel, I need to check whether RELION can be compiled to make good use of the AVX-512 hardware in AMD’s new Zen 4 (now) / Zen 5 (next year), or whether too much of it depends on the Intel compiler - and/or whether Intel still play their old tricks and deliberately de-optimise for anything non-Intel. And/or how GCC 13/14 handle it. But no time.

As it is, if shopping for a huge-box-system right now, I’d look at a dual-socket Emerald Rapids setup with 2-4TB of RAM and probably quad 24GB GPUs so that you can use GPUs for smaller box situations, then switch to AVX-512 code for the final refinements on fully unbinned particles.

Directly related to these two threads:

Sorry for the potential confusion around my earlier comment. :frowning_face:


That is a very helpful and comprehensive explanation, thank you! :grin:
