Our lab is planning to purchase a GPU workstation so that we can do our own CryoSPARC processing locally for datasets collected on a Thermo Fisher Tundra microscope. Most of our workflows will involve standard single-particle processing (motion correction, CTF estimation, particle picking, 2D/3D classification, and refinements).
I’m trying to figure out what hardware configuration would be the most cost-effective while still allowing us to process datasets reasonably quickly.
A few questions for people who have built similar systems:
1. GPU configuration
How many GPUs do you find most practical for a single workstation (2, 4, or more)?
Which GPUs have worked best for you in terms of performance vs cost (e.g., RTX 4090, 3090, A5000, A6000, etc.)?
Is it still sufficient to target ~24 GB VRAM GPUs, or are people moving toward larger VRAM options?
2. CPU / RAM balance
What CPU configuration do you recommend relative to GPU count?
I’ve seen suggestions like ~4 CPU cores per GPU and 64–128 GB RAM per GPU — does that still hold up for current CryoSPARC workflows?
For a 2–4 GPU workstation, would 256–512 GB RAM be a reasonable target?
3. If you purchased from a vendor, I’d love to know:
which companies you’ve had good experiences with
approximate price ranges for your configuration
Thanks in advance. I really appreciate any advice from people who have already gone through this process!
If you want to use CryoSPARC Live to its full potential, you need at least two GPUs: one for pre-processing and picking, one for 2D classification and 3D refinement. With a single GPU, you have to pause pre-processing to free up the GPU for the subsequent steps, so this cancels all the benefits of Live (unattended processing on-the-fly).
With two 3080s or 3090s, Live can easily keep up with the rate of data collection, so you get near real-time feedback. File transfer is often the limiting factor, depending on your specific network and storage setup.
The rest mostly depends on your specific combination of requirements and constraints (mostly budget, with the crazy increase in prices these days…). For example, if you would like several people to submit jobs at the same time and have them processed reasonably quickly, obviously more GPUs and larger storage will always help. But in such a situation, you will hit the limits of a single workstation pretty quickly (difficult to pack more than 4 GPUs in there), and will be better off buying GPU nodes to add to your local cluster (if you have one). This is a very different scenario than having a workstation dedicated to simple on-the-fly processing to monitor ongoing data collection and moving the results when collection is over, which has more modest requirements in both GPUs and storage space.
So it is a bit difficult to give more specific advice. In my view, two pieces of advice are universally applicable: 1) don’t be cheap with storage, because it will eventually fill up, and being limited by storage will leave the compute resources under-used; 2) if you have a cluster, adding GPU nodes to it is almost always a better option overall than a workstation (think about noise and heat in the space that will host the workstation: you don’t want this near people, it will drive them crazy).
1.1) For me, a way to think about this is to guess which samples and corresponding box sizes you expect. You can always wait a bit longer, but if the job constantly crashes because you are running out of VRAM, that’s really annoying. If you are mainly working with small proteins, “cheaper” consumer GPUs with less VRAM (e.g. 16 or 24 GB) will do the job. We still have a couple of old GPUs (RTX2080Ti or RTX5000, both Turing), and for my projects, they always worked (box sizes 384 or smaller). If you are planning to process big complexes, this will be a very different story.
1.2) The more GPUs, the better, but as Guillaume mentioned it’s hard to fit more than 4 GPUs, and if you go with something like an RTX4090/5090, it will be even harder to fit more than 2 in a normal computer case (cooling: 500-600W TDP, space: 3x or 3.5x slot design), unless water cooled. The advantage of the Quadro GPUs is that you can get them in single (e.g., A4000 Ada/Blackwell) or double slot design (e.g., A5000) and fit more of them. But I think in the end it’s also a price consideration with the crazy prices right now. We have some A4000 Blackwell, they are honestly more than fast enough for most processing we do (usually 256-384 box), single slot, and only 145W TDP.
3) This is essentially the same story as 2), it depends a lot on the box sizes you deal with. In our newer workstations, we have 128 GB RAM per GPU / 6-12 CPU cores, and in the older workstations, we have 64 GB per GPU. In both cases, it never causes any issues and is also enough for RELION, but again, we usually do not have such large box sizes. Interestingly, cryoSPARC is still not really able to utilize many cores in most jobs, and thus high single-thread performance is still super important.
Bottom line: a lot depends on your needs, but good luck finding a nice workstation, hopefully for a reasonable price.
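To make the per-GPU rules of thumb in this thread concrete, here is a rough sketch that scales them to a whole machine. The ranges are just the figures quoted above (64-128 GB RAM per GPU, and 4 to 12 CPU cores per GPU when the question's number is combined with this reply's); the function name and defaults are made up for illustration and are not an official CryoSPARC sizing guide.

```python
def workstation_targets(n_gpus, ram_per_gpu_gb=(64, 128), cores_per_gpu=(4, 12)):
    """Scale per-GPU rules of thumb to totals for a whole workstation.

    Returns ((ram_min_gb, ram_max_gb), (cores_min, cores_max)).
    The default ranges are the numbers quoted in this thread, not vendor specs.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    ram = (n_gpus * ram_per_gpu_gb[0], n_gpus * ram_per_gpu_gb[1])
    cores = (n_gpus * cores_per_gpu[0], n_gpus * cores_per_gpu[1])
    return ram, cores


for n in (2, 4):
    ram, cores = workstation_targets(n)
    print(f"{n} GPUs: {ram[0]}-{ram[1]} GB RAM, {cores[0]}-{cores[1]} CPU cores")
```

For 4 GPUs this gives 256-512 GB RAM, which lines up with the target range asked about in the original question. Since most jobs reportedly do not scale across many cores, the core count is probably the less important output here compared with per-core clock speed.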
Thirding the storage issue, but adding that (if you don’t already have your Tundra) data sets are gigantic size-wise compared to the larger scopes, particularly if you have the Ceta-F. Ceta-F saves images only as .mrc (no .tiff compression), so the size is ~1000 movies = 1 TB. We did not expect this when we got our Tundra. My understanding with the Falcon C is that data set sizes tend to be similar because of the 2k x 2k camera requiring more images for the same ultimate field of view. We are very happy with our Tundra / Ceta-F combo, but our biggest complaint is that we are always juggling storage issues.
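A quick back-of-the-envelope sketch of what the "~1000 movies = 1 TB" figure means for planning. Only that ratio comes from the post above; the 40 TB volume and 3000 movies per session in the example are hypothetical placeholders, and the estimate covers raw movies only (CryoSPARC outputs such as extracted particles and job directories come on top).

```python
def cetaf_dataset_tb(n_movies, tb_per_1000_movies=1.0):
    """Approximate raw size in TB of a Ceta-F dataset stored as uncompressed .mrc."""
    return n_movies * tb_per_1000_movies / 1000.0


def sessions_until_full(capacity_tb, movies_per_session, tb_per_1000_movies=1.0):
    """Number of whole sessions that fit on a volume (raw movies only)."""
    per_session_tb = cetaf_dataset_tb(movies_per_session, tb_per_1000_movies)
    return int(capacity_tb // per_session_tb)


# Hypothetical example: 40 TB usable, 3000 movies per session.
print(cetaf_dataset_tb(3000))           # TB per session
print(sessions_until_full(40.0, 3000))  # sessions before the volume is full
```

With those placeholder numbers, each session is about 3 TB and the volume holds 13 sessions before something has to be moved or deleted.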
Agreed with most of the others. I would say ideally a single workstation has 4 GPUs. We (cluster configuration) have both 1080Ti and A4500 GPUs, and the 1080Tis are slower but do fine. I would prioritize more GPUs over bigger GPUs because of (cost and) what others are saying – if a GPU is reserved for a particular job, it can’t be used for any other job. Usually it is more frustrating for my users to be waiting for a GPU to run a job (or not being able to fully utilize Live) than for jobs to be running more slowly. This is particularly true if you are planning on utilizing your Tundra to take multiple small datasets day-of and the user would like Live processing to be giving real-time feedback so they can make decisions on the next grid.
CryoSPARC is generally GPU-heavy (vs. CPU). Our 8 GPU cluster computers have 500-700 GB RAM (48-64 processors) and we never, ever are CPU limited with CryoSPARC jobs. I think the numbers you posted above are reasonable.
Come join us on the (mostly inactive, but friendly!) Tundra mailing list, if you aren’t there already: Send a blank email message to tundra_cryoEM%subscribe%request@listserv&med&harvard&edu but replace the % with - and & with .
(if that doesn’t work and you want to be subscribed, send me a message here and I can add you! the listserv interface is … difficult ).
We have built a number of Live machines over the years. Cost is always a concern for us, so we have tried to get away with lower end GPUs when possible. Our most recent builds have 6x RTX 4000 Ada 20GB, paired with an AMD 9354 (32c, 3.25GHz, 280W), 512GB RAM and 16 TB of NVMe for cache. The RAM and cache are likely a bit overkill. We find these specs have enough punch to keep up with 550 mics per hour from the K3, with 3 Live workers, 1x 2D, 1x 3D, and a spare for side jobs.
To note, these systems are 2U rack mounts (not workstations). The GPUs are single slot and 130W each, so total power draw is quite low.
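For anyone comparing builds, the power arithmetic for this configuration is simple enough to sketch. The wattages are the nominal figures quoted above (130 W per GPU, 280 W for the CPU); the function name is made up, and the result leaves out RAM, drives, fans and PSU losses, so treat it as a lower bound for sizing circuits and cooling.

```python
def gpu_cpu_power_w(n_gpus, gpu_w, cpu_w):
    """Sum of nominal GPU and CPU power in watts (excludes RAM, drives, fans, PSU loss)."""
    return n_gpus * gpu_w + cpu_w


# The build described above: 6 single-slot GPUs at 130 W plus a 280 W CPU.
total_w = gpu_cpu_power_w(n_gpus=6, gpu_w=130, cpu_w=280)
print(f"GPUs + CPU: {total_w} W")
```

That comes to 1060 W for six GPUs plus the CPU, which is why the total draw is described as quite low for a six-GPU machine.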
I am based in Australia so prices and vendors might not help, but happy to share via personal message.
I’ve been testing a range of Tundra data recently and recommendations will really depend on whether you’re using a Ceta-F or Falcon-C. I’ll admit I’m really impressed with the Falcon-C. I’m less impressed with the Tundra loading mechanism, but that’s a completely different topic. Size of Falcon-C data… from a collection of datasets I’ve got on hand, comprising ~400-450 micrographs, they range in size from 28GB to 190GB, so it depends what magnification and total dose you’ll be using. And what the strategy will be for data acquisition - simple, quick screening or longer acquisitions.
For Falcon-C data, crunching on 2x 5060Tis (16GB) worked reasonably well, so they could be a budget option given current pricing. But they are dual slot. Otherwise 2-4x Blackwell A4000s (24GB now) would do pretty well I think for most scenarios/data off a Tundra. Or Ada if you can find them at a discount. I’d recommend 256GB of RAM minimum, but 512GB would be better if the system will be used for more than routine Live workflows (3DVA/3D flex).
Storage will depend on whether it’s a dedicated Live system, and projects are detached and cycled off to other places for continued processing, or whether it is going to be a workhorse where multiple projects live longer term. If the former, 3-4 16-20TB HDDs in RAIDZ1 with 2x4TB NVMe RAID0 for scratch will be more than enough, if the latter… throw high capacity HDDs at it until the budget runs out.
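As a rough sketch of what the suggested layout gives you: RAIDZ1 spends roughly one disk's worth of capacity on parity, and RAID0 simply adds the disks together. The helper names below are made up for illustration, and the results are nominal; real ZFS usable space will be somewhat lower because of metadata, padding and the TB vs TiB difference.

```python
def raidz1_usable_tb(n_disks, disk_tb):
    """Nominal usable capacity of a single RAIDZ1 vdev: one disk's worth goes to parity."""
    if n_disks < 2:
        raise ValueError("RAIDZ1 needs at least 2 disks")
    return (n_disks - 1) * disk_tb


def raid0_usable_tb(n_disks, disk_tb):
    """Nominal capacity of a RAID0 stripe: all disks added together, no redundancy."""
    return n_disks * disk_tb


# The two ends of the suggested range, plus the NVMe scratch stripe.
print(raidz1_usable_tb(3, 16))  # smallest suggested HDD pool
print(raidz1_usable_tb(4, 20))  # largest suggested HDD pool
print(raid0_usable_tb(2, 4))    # 2x 4 TB NVMe scratch
```

So the suggested range works out to roughly 32-60 TB of nominal project space plus 8 TB of scratch. Note that RAIDZ1 survives a single disk failure, while the RAID0 scratch has no redundancy at all, which is fine for a cache that can be regenerated.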
It’s really important to define budget and work to build a desired specification from there (difficult right now due to price volatility, I know!) because building around the idea of a $10K budget, then finding out you’ve only got $5-7K results in headaches deciding what gets cut first (and when one choice impacts another - e.g. CPU change means motherboard change means 4 GPUs are no longer possible…)