Future-proofed mid-range workstation for SPA, ET and ED

Hi all,

I’m considering writing a grant to replace one of our aging workstations with something mid-range but future-proofed that I could upgrade in stages. My initial idea is:

1x A6000 Ada
Threadripper 5975WX (possibly 7975WX)
2x 32GB DDR5
2 TB M.2 NVMe (possibly 4 TB)
~20 TB HDD
~1200W PSU
Noctua CPU cooler

If future grants are approved, I can upgrade in stages, e.g. 1x A6000 + 32 GB RAM followed by additional A6000s + RAM if required (probably with a second PSU at that point).

Does this sound reasonable? Any recommendations for motherboards and workstation-focused cases that support dual PSUs?

Cheers

Not much time, so a few quick thoughts:

64GB of RAM is going to hamstring you quite severely. 128GB is a realistic minimum, even with one GPU. 256GB would be better as some functions in CryoSPARC are quite RAM hungry.

A 1600W PSU (e.g. Corsair AX1600i) will happily manage a 5995WX, 512GB ECC-RDIMMs, 6xA4000 (Ampere) and 7x NVMe drives (personal experience). Or 2x A6000 + 2xA4000 (Ampere). So 2 or 3 A6000s should be fine on a 1600W PSU.
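
As a rough sanity check, here’s a quick power-budget sketch in Python; the TDP figures are approximate assumptions (check the datasheets for the exact parts you buy), but they show why a good 1600W unit covers 3x A6000 plus a Threadripper Pro with margin:

```python
# Rough PSU budget sketch. The TDP values below are approximate assumptions,
# not measurements - check the datasheets for the exact parts you buy.
components = {
    "Threadripper PRO 5975WX (TDP)":  280,
    "3x A6000 Ada (300 W each)":      3 * 300,
    "8x DDR5 RDIMMs (~4 W each)":     8 * 4,
    "2x M.2 NVMe (~8 W each)":        2 * 8,
    "HDDs, fans, motherboard, misc":  100,
}

total = sum(components.values())
for name, watts in components.items():
    print(f"{name:<34} {watts:>5} W")
print(f"{'Estimated peak draw':<34} {total:>5} W of a 1600 W PSU")
```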

If going with Threadripper (Pro), think about a closed-loop cooler (as much as I dislike them). 350W+ TDP is not fun to cool by air.

Dual PSUs can be a headache, and it presumes that you have a PSU failure rather than a power cut. The likelihood of either depends on where you are, of course…

Recommendations for motherboard and case will depend on whether you’re building your own (or ordering custom) or buying a pre-configured workstation.

Very useful thanks!

I’ll budget to fill the DIMM slots with 32 GB sticks.

I was hoping to avoid a CLC, but if it’s needed then that’s fine. I could go with another CPU (e.g. EPYC), but Threadripper Pro seems cost-effective.

I was thinking of dual PSUs to give headroom for more GPUs further down the line, e.g. 2x A6000 + everything else on one PSU and the rest of the GPUs on the other. But perhaps a single 1600W Titanium PSU is sufficient - that opens up the case choice too.

Most likely I’ll be building it myself, but I’ll look into the custom route to see if it’s cost-effective given the contract tendering process we have to use.

I think 20 TB of storage is far from sufficient. Unless this workstation will be connected to some larger network storage. But then I’d be concerned about bandwidth and latency.

You can’t go wrong with more storage, since it’s a resource that will eventually be used, no matter how much you have. Neglecting it will severely limit what you can do (you really don’t want this workstation spending a fraction of its time just moving data in and out because you don’t have enough storage to keep several datasets at the same time) or force you to upgrade sooner than planned.

I agree with what everyone else said.

256 GB of DDR4 RAM would be realistic; DDR5 is still costly.

Western Digital Datacenter (WDC) drives are stable at 20TB, but don’t you want a RAID5 array with at least 3 drives? If not, you’d best have good central storage you can mount (NFS is ideal) to read the raw data from. There are ways to do this in the cloud, but they’re expensive.
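
To put rough numbers on the RAID5 suggestion (a quick sketch of my own, with example drive counts), usable capacity is roughly one drive’s worth less than the total:

```python
# Quick RAID5 capacity sketch: usable space is (n - 1) drives' worth,
# since one drive's worth of capacity goes to parity.
def raid5_usable_tb(num_drives: int, drive_tb: float) -> float:
    if num_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (num_drives - 1) * drive_tb

for n in (3, 4, 6):
    print(f"{n} x 20 TB in RAID5 -> {raid5_usable_tb(n, 20):.0f} TB usable")
```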

Most motherboards will support multiple NVMe or M.2 drives. I usually run a 2TB drive for the file system, or mirror it (2x 2TB), or have a 2TB SSD with 2x 4TB in RAID0/1. Also consider a PCIe card that takes NVMe or M.2 drives, e.g. https://www.highpoint-tech.com/product-page/rocket-1508

I’ve done Threadrippers, but also dual Intel Xeons - AMD is cheaper.

The A5000 and A5000 Ada are probably fine for CryoSPARC and will keep costs down.

Always look at the turnkey builds to get an idea, but OEM builds cost much less:

https://www.exxactcorp.com/category/AMD-Threadripper-Solutions

https://www.singleparticle.com/cryo-em-workstation

If you can, solve your storage separately from compute. We found really quickly that 12/16TB drives in RAID5, giving 50-80TB arrays, filled up fast. Therefore, we got a rack-mounted NAS system, which was reasonable for a few PB. But you can go smaller. Getting the system and adding drives later as vdevs could be an option.

Overall, your system is limited by RAM and storage. Some of my CryoSPARC projects, excluding the raw data, are 8-10TB.
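
To put a back-of-envelope number on that (the per-dataset raw size below is an assumption for illustration; the 8-10TB processing figure is from my own projects):

```python
# Back-of-envelope storage estimate. The raw-movie size per dataset is an
# assumed illustration value; the 8-10 TB processing figure is quoted above.
raw_tb_per_dataset = 4         # assumed compressed movie stacks per dataset
processing_tb_per_project = 9  # mid-point of the 8-10 TB figure above
concurrent_projects = 3

total_tb = concurrent_projects * (raw_tb_per_dataset + processing_tb_per_project)
print(f"{concurrent_projects} concurrent projects -> ~{total_tb} TB")
# 3 concurrent projects -> ~39 TB, i.e. roughly double the proposed 20 TB HDD
```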

This ended up being a lot longer and a little more scattered than I originally intended…

I don’t like CLCs, but I don’t like seeing CPUs hitting 100 degrees either, and all modern CPUs can really pump out heat (some of Intel’s latest server CPUs have 500W TDPs!)

Whether you need DDR4 or DDR5 will depend on the CPU purchased. 5000 series Threadrippers are still DDR4. Both 7000 series and 9000 series are DDR5.

If refurbished systems are an option, you can get a lot of power (or a lot of storage) for a lot less money that way. Depends on the rules of the university/institute.

It’ll largely depend on use case, but @Guillaume is absolutely right. If multiple users, you need to over-provision storage or you’ll run into a nasty shock quite quickly. If only it were easier to explain when someone is complaining about the cost that spending $$$ now means not spending $$$$ later in a panic/rush. But if wishes were fishes we’d all cast nets, sooo…

Run scratch drives in RAID0. As they’re just cache, you want speed, and if a drive dies data isn’t lost (just cached data) so maxing out read/write is the important factor there. A pair of decent TLC 4TB SATA SSDs in RAID0 do pretty well - I can hit ~650MB/s write (~500MB/s sustained over 2.8TB of writes), and ~1GB/s reading. On a budget that’s reasonable. That’s fine for most purposes unless you desperately want to see PCI-E Gen 5 NVMe speeds… :wink: Avoid QLC drives, write speeds drop precipitously with them after sustained writes and their write cycle-count/life is not so high, but for very high capacity they’re the only game in town unless you drop significant quantities on enterprise grade SSDs.
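
If you want to sanity-check a scratch volume yourself, fio is the proper tool, but even a minimal sketch like this (the mount point and sizes are placeholders, so adjust to your setup) gives a ballpark sequential write figure:

```python
# Minimal sequential-write throughput check for a scratch volume.
# SCRATCH_PATH and the sizes are placeholders; for serious numbers use fio.
import os
import time

SCRATCH_PATH = "/scratch/throughput_test.bin"  # assumed scratch mount point
BLOCK = 64 * 1024 * 1024                       # 64 MiB per write
TOTAL = 8 * 1024 * 1024 * 1024                 # 8 GiB test file

buf = os.urandom(BLOCK)
start = time.time()
with open(SCRATCH_PATH, "wb") as f:
    written = 0
    while written < TOTAL:
        f.write(buf)
        written += BLOCK
    f.flush()
    os.fsync(f.fileno())  # make sure the data actually hits the disks
elapsed = time.time() - start

print(f"Wrote {TOTAL / 1e9:.1f} GB in {elapsed:.1f} s "
      f"-> {TOTAL / 1e6 / elapsed:.0f} MB/s")
os.remove(SCRATCH_PATH)
```

Page cache will flatter short runs, so use a test file comfortably larger than RAM if you want sustained numbers.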

For data you absolutely want RAID, as @Mark-A-Nakasone says. I know that a lot of places on the internet are pushing “hardware RAID is dead!” now, and I can see some justification for it (ZFS has been solid in my experience) but a RAM-backed dedicated hardware RAID controller will perform better than any software RAID (RAIDZ2/3 is particularly slow to write…). RAID5 (RAIDZ1) is a good balance. Motherboard RAID is not hardware RAID, though, it’s just vendor-locked, board-locked software RAID…! ZFS on the other hand can be removed and reassembled on another system with relatively little pain if all disks are present and you created the array using UUIDs (which you should).
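
On the “create the array using UUIDs” point, something like this little sketch (Linux-only, device names will obviously differ) lists the stable /dev/disk/by-id paths to feed to zpool create instead of /dev/sdX names, which can shuffle between boots:

```python
# List stable /dev/disk/by-id paths and the kernel devices they resolve to,
# so a ZFS pool can be created against by-id names rather than /dev/sdX
# labels (which can change between boots). Linux-only sketch.
import os

BY_ID = "/dev/disk/by-id"

for name in sorted(os.listdir(BY_ID)):
    if name.startswith("wwn-") or "-part" in name:
        continue  # skip duplicate WWN aliases and partition entries
    link = os.path.join(BY_ID, name)
    print(f"{link} -> {os.path.realpath(link)}")

# Then build the pool against those paths, e.g. (adjust to your drives):
#   zpool create tank raidz1 /dev/disk/by-id/ata-DRIVE1 \
#                            /dev/disk/by-id/ata-DRIVE2 \
#                            /dev/disk/by-id/ata-DRIVE3
```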

A5000 Ada is good for just about everything that CryoSPARC can do, I think, and is noticeably cheaper than the A6000 Ada. 32GB is (currently) enough and you’ll probably run into PyFFTW limitations with box sizes before hitting the 32GB VRAM limit. Depends on supplier, though; it’s getting next to impossible to find Ampere cards now and a friend of mine reported that Ada are being downplayed to him in favour of Blackwell - which CryoSPARC does not currently support, so is not a workable option in the near term.

Our main supplier likes WDC Gold drives, but they’re painfully expensive here. Nearly triple the price of Toshiba Enterprise grade drives. We’ve got multiple dozens of Toshiba drives (from 12-20 TB) and only one has been bad (on arrival, so immediately got replaced) but a collaborator has had worse luck with them. I just had an 8TB QLC Samsung SSD die - SSDs usually go with no warning whatsoever, so be aware of that.
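
SSDs dying silently is also why it’s worth at least polling SMART health now and then; a minimal sketch (the device list is a placeholder, and it needs smartmontools and root) might look like:

```python
# Periodic SMART health poll via smartmontools' smartctl. The device list is
# a placeholder; requires root and the smartmontools package. Treat it as an
# early-warning aid only - SSDs can still die with no warning at all.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb", "/dev/nvme0n1"]  # example devices

for dev in DEVICES:
    result = subprocess.run(["smartctl", "-H", dev],
                            capture_output=True, text=True)
    status = "PASSED" if "PASSED" in result.stdout else "CHECK OUTPUT"
    print(f"{dev}: {status}")
```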

For our servers we buy from a supplier (I can’t source server chassis as easily as they) but for workstations we order parts and I assemble them myself. Can save a lot doing that if you’re comfortable with it (particularly if your university allows purchases from Amazon and you time it well for deals - mine doesn’t allow Amazon at all…) and you’ll get a much better level of control over what you can opt for. We recently purchased some mid/low-range workstations on a small grant and getting parts saved us about 40% compared to buying pre-assembled/selected and asking them to upgrade the bits we needed more of (RAM, storage).

Don’t do cloud storage of data. It’s an absolute black hole for money and in my opinion there are too many chances for things to go awry.

The other option, honestly, is don’t future proof. It’s going to sound a bit odd, but with how quickly things can advance, you might find in two years it’s actually better to get a new system and relegate the older one to smaller projects. It depends on the lab, how many users, etc, so might not be an option. AMD seems to be seriously on a roll at the minute with Zen, and I remain impressed with the improvements which have come from each generation. The exception to that is nVidia, where their current pricing has me investigating the Zen 5 Epycs for a CPU-compute based system again (when I can get two 64-core, 128-thread AVX-512 capable CPUs for the price of one RTX5090, I am sorely tempted). This strategy will depend on what sort of samples you work on. Small proteins or complexes (and by “small” I mean anything <2MDa) will be very workable on a “consumer” system. CryoSPARC doesn’t work on Blackwell cards yet, but a 4090 or A4500/A5000 Ada, 192GB of RAM (if running AM5) and a 9950X will handle a fair amount of processing, although it might not manage high numbers of particles in 3D flex or 3DVA. Will really depend on your requirements (and budget) at the time, and what future plans you have.
