Does cryoSPARC benefit more from GPU VRAM or processing power?

Dear cryoSPARC community,

We are looking to procure a new workstation. Since GPU prices are so high, we are looking to trade off one spec against the other. Our current workstation has the following specs:

  • 4x RTX 4070 Ti 12GB
  • AMD Ryzen Threadripper PRO 5975WX (32 cores)
  • 512GB DDR4-3200 RAM

With this workstation, we noticed that refinement becomes very slow for bigger boxes (700 px), probably due to a lack of VRAM. I am therefore thinking of getting an RTX 3090 instead, to increase the VRAM capacity to 24GB without breaking the bank.
So my question is: do cryoSPARC jobs (particularly NU-Refinement) benefit more from GPU VRAM or from raw processing power?

Thank you and kind regards,
Khoa.

If you have 24GB cards, you should be good. While the Ada Lovelace generation cards are (on paper) significantly faster than their Ampere generation predecessors, in practice, due to other factors (disk I/O, mostly), they’re not dramatically faster in most practical cryo-EM image processing scenarios. If you can find 3090s at reasonable prices (they’re EoL now, I believe) then it’s probably a better idea than 12GB or 16GB 4000-series cards. Also consider the power budget: the Ada Lovelace consumer cards are pushed to the edge, and setting some power limits will dramatically reduce their power consumption and heat output without crippling their performance (25-40% reduction in power draw for a 5-10% drop in performance, if I remember the benchmarks I saw correctly…)

Since Blackwell is supposed to arrive in H1 next year, I would normally counsel waiting, but the rumours I’ve seen about Blackwell pricing suggest the cards are going to be even more ridiculously expensive than the Ada Lovelace generation. Take that for what it’s worth, given that it is rumour, but it’s a reasonable extrapolation given what nVidia focussed on during GTC and their investor calls (improvements in AI inference speed, etc.).

A few semi-related thoughts:

  • PyFFTW imposes a limit on box size in CryoSPARC regardless of GPU VRAM (1120 pixel boxes are the largest I’ve run successfully, even on 48GB GPUs, while 1480 is possible in RELION on 48GB GPUs with Fourier padding disabled)
  • Even on 48GB GPUs, 700 pixel boxes crash in NU refine (for me) unless “Low memory mode” is enabled
  • 12GB/16GB cards are attractive, but many of the more demanding features of modern CryoSPARC will max out and crash (3D Flex, high class count and/or high-resolution 3D classification, 3D Variability Analysis, larger box size Local Resolution Estimation) on <24GB cards - see the rough VRAM sketch below
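
For a rough intuition of why those features fall over, here’s a back-of-envelope sketch in Python (my own numbers, not CryoSPARC’s actual allocation strategy): a single-precision complex Fourier volume of box N takes N³ × 8 bytes, and with 2× Fourier padding the linear dimension doubles, so the footprint grows eight-fold. A refinement holds several such volumes (half-maps, gradients, masks) plus particle data at once, so real usage is a small multiple of the figure printed below.

# Back-of-envelope only: memory for ONE float32 complex volume at the
# padded box size. Assumes 2x Fourier padding; not based on CryoSPARC
# internals, so treat the output as order-of-magnitude guidance.

def padded_volume_gib(box_px, pad_factor=2.0):
    padded = int(box_px * pad_factor)       # padded linear box size
    return padded ** 3 * 8 / 1024 ** 3      # complex64 = 8 bytes per voxel

for box in (440, 512, 700, 1024):
    print(f"{box:>5} px box -> ~{padded_volume_gib(box):5.1f} GiB per padded volume")

A 700 px box already works out to ~20 GiB per padded volume by this estimate, which lines up with 24GB cards struggling at that box size.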
2 Likes

Hi,

We’ve accumulated quite a few 4090 cards in our cluster. Do you happen to have a reference for this that I could pass on to our HPC team?

Cheers,
Yang

edit: Sorry, didn’t realise Ctrl+Enter just immediately posted.

These are game focussed, but I wouldn’t be surprised if it is actually more efficient outside of gaming scenarios. I saw some others which were more HPC focussed, but I’ll need to check my bookmarks at home.

I’ve avoided 4090s because of their insane power draw. 450W (600W in some cases for the CLC-cooled cards) is absolutely ridiculous.

edit 2: The fact that increasing the power limit by 20% yields <2% gains shows just how close to the limit the Ada Lovelace cards are.

1 Like

Thank you very much. I’d appreciate any HPC-related benchmarks as well, if you can find them. I’m surprised our racks/PSUs haven’t melted already.

Cheers,
Yang

It’s very possible your HPC team have already power limited them.

The Quadros are much saner than their consumer-focussed cards. 300W power limit on the A6000 Ada, for example, like its older sibling from the Ampere generation.

For me, here, the Ada cards are 60-80% more expensive than their Ampere equivalents, so my reluctance was down to both price and power demand. But our suppliers now tell me Ampere is EoL and getting more is next to impossible. I’m not looking forward to the quote for an octa-Ada GPU box; it’ll be practically double what the equivalent Ampere was when Ada launched. :expressionless:

It’s actually bad enough that a CPU path for CryoSPARC would be extremely appealing, simply because throwing a few 128-core Zen 5 Epyc CPUs at the problem will actually be cheaper than a high-GPU-count Ada or Blackwell based system.

I’ve been tinkering with AMD GPUs, but ROCm still suffers from poor support in all directions - I have neither the time nor the patience to deal with “supported hardware/OS” requirements which make it feel like, to make any headway, I need to invoke the spirit of Mussorgsky’s “Night on the Bare Mountain”. :face_exhaling:

1 Like

Indeed, I advocated for the Quadro variants in our last round of procurement, but following discussions with our supplier, they proved prohibitively expensive. I suppose we can pipe some of the heat coming off the 8x 4090 nodes into the building’s heating.

Would this be reflected in the power cap value reported by nvidia-smi? Or is this not necessarily updated transparently?

Cheers,
Yang

1 Like

Yes, set and checked in nvidia-smi, but I’m not sure if it’s shown correctly in the “normal” output…?

To set:

sudo nvidia-smi -pl 200

Where 200 would be 200W, for example…

And:

nvidia-smi -q -d POWER

Checks whether it’s set or not. Or it should, if nVidia aren’t lying to me with out-of-date docs.
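
If you want to run the same check across a whole rack rather than eyeballing nvidia-smi on each node, here’s a small query-only sketch using the pynvml bindings (the nvidia-ml-py package); setting a cap still needs root, e.g. sudo nvidia-smi -pl 200 as above.

# Query the enforced power cap, the allowed cap range, and the current
# draw for every GPU on the node. Query-only; no root required.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(h)
        if isinstance(name, bytes):      # older pynvml versions return bytes
            name = name.decode()
        cap = pynvml.nvmlDeviceGetEnforcedPowerLimit(h) / 1000           # mW -> W
        lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(h)
        draw = pynvml.nvmlDeviceGetPowerUsage(h) / 1000
        print(f"GPU {i} {name}: cap {cap:.0f} W "
              f"(allowed {lo / 1000:.0f}-{hi / 1000:.0f} W), drawing {draw:.0f} W")
finally:
    pynvml.nvmlShutdown()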

Thank you. That’s helpful. 450W reported. I’ll have a chat with our HPC guy.

Cheers,
Yang

~600-700 px is the max for me on an RTX 3090 or A5000 (all of our workstations are based on 24GB cards), and NVLink does not help. In general, I have found that Homogeneous Reconstruction and Homogeneous Refinement will run with larger box sizes, while NU-Refinement will not.

3D Flex really only runs at 440 px; see @hbridges1’s post https://discuss.cryosparc.com/t/flex-reconstruction-failing/15016/4?u=mark-a-nakasone.

RELION does things a little differently, but a lot of people see this table https://guide.cryosparc.com/processing-data/tutorials-and-case-studies/performance-metrics and think a 1024 px box will run fine on a 3090/A5000 - but it will not.

You can use nvtop to keep an eye on GPU core use, VRAM, and power. I find it helpful to watch some jobs through tmux with nvtop and htop.
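
If you want a persistent record rather than a live view, a small polling script along these lines can dump per-GPU VRAM, power, and utilisation to a CSV for later inspection - this is just my own sketch using the pynvml bindings (nvidia-ml-py), not a CryoSPARC feature, and the file name and 5-second interval are arbitrary.

# Append per-GPU VRAM, power, and utilisation readings to a CSV every 5 s.
# Run it alongside a job (e.g. in a spare tmux pane); Ctrl+C to stop.
import csv, datetime, time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

with open("gpu_log.csv", "a", newline="") as f:     # arbitrary log path
    writer = csv.writer(f)
    writer.writerow(["time", "gpu", "vram_used_MiB", "power_W", "util_pct"])
    try:
        while True:
            now = datetime.datetime.now().isoformat(timespec="seconds")
            for i, h in enumerate(handles):
                mem = pynvml.nvmlDeviceGetMemoryInfo(h)
                power = pynvml.nvmlDeviceGetPowerUsage(h) / 1000        # mW -> W
                util = pynvml.nvmlDeviceGetUtilizationRates(h).gpu
                writer.writerow([now, i, mem.used // 2**20, round(power), util])
            f.flush()
            time.sleep(5)
    except KeyboardInterrupt:
        pass
pynvml.nvmlShutdown()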

:wink: Sometimes, if the GPUs are not under heavy load, you can override the scheduler in cryoSPARC (run now on a specific GPU: 0, 1, 2, …, n) and it can save time.

I am grateful to @rbs_sci, the CryoSPARC team, and the other users who post their experiences. I have witnessed many of my users thinking they had the wrong input, that something was wrong with the computer, etc.

1 Like

I’m beginning to really dislike that table. I have the same problem; some new users see it and start throwing accusations around when 1024 pixel boxes don’t work. It’s very, very out of date and really needs to be removed or updated.

Also, NVLink needs to be explicitly supported in the software AFAIK, and, well, basically nothing bothers to.

2 Likes