This ended up being a lot longer and a little more scattered than I originally intended…
I don’t like CLCs (closed-loop liquid coolers), but I don’t like seeing CPUs hitting 100°C either, and all modern CPUs can really pump out heat (some of Intel’s latest server CPUs have 500W TDPs!).
Whether you need DDR4 or DDR5 depends on the CPU purchased: 5000-series Threadrippers are still DDR4, while the 7000 and 9000 series are DDR5.
If refurbished systems are an option, you can get a lot of power (or a lot of storage) for a lot less money that way. Whether they’re allowed depends on the rules of the university/institute.
It’ll largely depend on use case, but @Guillaume is absolutely right: with multiple users, you need to over-provision storage or you’ll run into a nasty shock quite quickly. If only it were easier to explain, when someone is complaining about the cost, that spending $$$ now means not spending $$$$ later in a panic/rush. But if wishes were fishes we’d all cast nets, sooo…
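To make that concrete, here’s a hedged back-of-envelope in Python. Every number is an assumption to adjust for your own lab, not a recommendation:

```python
# Back-of-envelope storage forecast - all figures below are illustrative
# assumptions, not measurements. Plug in your own lab's numbers.
users = 4                  # people processing at the same time
datasets_per_user = 3      # active projects each
tb_per_dataset = 6         # raw movies + particles + volumes, roughly
overhead = 1.5             # intermediates, caches, duplicated extractions

needed_tb = users * datasets_per_user * tb_per_dataset * overhead
print(f"Comfortable minimum: ~{needed_tb:.0f} TB usable")  # ~108 TB
```

Even with modest per-dataset sizes, the multiplication gets uncomfortable fast, which is the whole argument for over-provisioning up front.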
Run scratch drives in RAID0. They’re just cache, so you want speed: if a drive dies you only lose cached data, not the originals, so maxing out read/write is the important factor there. A pair of decent TLC 4TB SATA SSDs in RAID0 does pretty well - I can hit ~650MB/s write (~500MB/s sustained over 2.8TB of writes) and ~1GB/s read. On a budget that’s reasonable, and it’s fine for most purposes unless you desperately want to see PCI-E Gen 5 NVMe speeds…
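If you want to sanity-check numbers like those on your own scratch array, a crude sequential-write test is easy to sketch. fio is the proper tool; this is just a minimal Python stand-in, and the mount point is an assumption:

```python
# Crude sequential-write benchmark for a scratch array - a sketch, not a
# substitute for fio. Write enough data that the page cache can't hide
# the disks; bump total_gib up past your RAM size for honest numbers.
import os, time

path = "/scratch/throughput_test.bin"   # hypothetical RAID0 mount point
chunk = b"\0" * (64 * 1024 * 1024)      # 64 MiB per write call
total_gib = 8                           # increase to exceed RAM

start = time.time()
with open(path, "wb") as f:
    for _ in range(total_gib * 1024 // 64):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())                # force data to disk before stopping the clock
elapsed = time.time() - start

print(f"~{total_gib * 1024 / elapsed:.0f} MiB/s sequential write")
os.remove(path)
```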
Avoid QLC drives: their write speeds drop precipitously after sustained writes and their write endurance (rated cycle count) is not high. For very high capacities, though, they’re the only game in town unless you drop significant quantities of money on enterprise-grade SSDs.
For data you absolutely want RAID, as @Mark-A-Nakasone says. I know a lot of places on the internet are pushing “hardware RAID is dead!” now, and I can see some justification for it (ZFS has been solid in my experience), but a RAM-backed dedicated hardware RAID controller will still outperform any software RAID (RAIDZ2/3 is particularly slow to write…). RAID5 (RAIDZ1) is a good balance. Motherboard RAID is not hardware RAID, though - it’s just vendor-locked, board-locked software RAID…! ZFS, on the other hand, can be moved and reassembled on another system with relatively little pain, provided all disks are present and you created the pool using stable identifiers like UUIDs (which you should).
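As a sketch of what “stable identifiers” means in practice on Linux: build the pool from /dev/disk/by-id paths rather than /dev/sdX names, which can shuffle between boots or machines. The pool name, RAID level, and disk filter below are assumptions for illustration:

```python
# Sketch: assemble a `zpool create` command from stable /dev/disk/by-id
# paths so the pool reassembles cleanly on another machine. Pool name,
# RAIDZ level, and the drive-model filter are all hypothetical.
import subprocess          # only needed if you uncomment the run() call
from pathlib import Path

by_id = Path("/dev/disk/by-id")
disks = sorted(
    str(p) for p in by_id.iterdir()
    if "ata-TOSHIBA" in p.name      # hypothetical filter for the data drives
    and "-part" not in p.name       # whole disks only, skip partition entries
)

cmd = ["zpool", "create", "tank", "raidz1", *disks]
print(" ".join(cmd))                    # review the command before running!
# subprocess.run(cmd, check=True)       # uncomment once you're sure
```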
A5000 Ada is good for just about everything CryoSPARC can do, I think, and is noticeably cheaper than the A6000 Ada. 32GB is (currently) enough, and you’ll probably run into PyFFTW limitations with box sizes before hitting the 32GB VRAM limit. It depends on supplier, though: it’s getting next to impossible to find Ampere cards now, and a friend of mine reports that Ada cards are being downplayed to him in favour of Blackwell - which CryoSPARC does not currently support, so Blackwell is not a workable option in the near term.
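For a feel of why box size alone rarely stresses 32GB, some rough voxel arithmetic. The working-copy multiplier is a hand-wavy assumption, not CryoSPARC’s actual memory model:

```python
# Rough voxel arithmetic: memory for single-precision 3D volumes.
# The `copies` multiplier is a guessed allowance for FFTs, masks, and
# other working buffers - real jobs vary a lot.
def volume_gb(box: int, copies: int = 8) -> float:
    bytes_per_voxel = 4                     # float32
    return box**3 * bytes_per_voxel * copies / 1024**3

for box in (256, 400, 512, 700):
    print(f"box {box}: ~{volume_gb(box):.1f} GB")
# box 512 with ~8 working copies is still only ~4 GB; it's the
# particle-heavy jobs (3D Flex, 3DVA) where VRAM pressure shows up.
```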
Our main supplier likes WDC Gold drives, but they’re painfully expensive here - nearly triple the price of Toshiba enterprise-grade drives. We’ve got multiple dozens of Toshiba drives (12-20 TB) and only one has been bad (dead on arrival, so it was replaced immediately), though a collaborator has had worse luck with them. I just had an 8TB QLC Samsung SSD die - SSDs usually go with no warning whatsoever, so be aware of that.
For our servers we buy from a supplier (I can’t source server chassis as easily as they can), but for workstations we order parts and I assemble them myself. You can save a lot doing that if you’re comfortable with it (particularly if your university allows purchases from Amazon and you time deals well - mine doesn’t allow Amazon at all…), and you get much better control over what you can opt for. We recently purchased some mid/low-range workstations on a small grant, and buying parts saved us about 40% compared to buying pre-assembled systems and asking the vendor to upgrade the bits we needed more of (RAM, storage).
Don’t put your data in cloud storage. It’s an absolute black hole for money, and in my opinion there are too many chances for things to go awry.
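A quick illustration of the money side. The rates are assumptions in the ballpark of typical object-storage pricing, not a quote from any provider:

```python
# Illustrative only - storage and egress rates below are assumptions,
# not any provider's actual pricing. The point: the monthly fee
# compounds, and pulling your data back out costs real money too.
tb_stored = 200
months = 36
storage_per_tb_month = 21.0    # USD, assuming ~$0.021/GB-month
egress_per_tb = 90.0           # USD, assuming ~$0.09/GB to retrieve it all

total = tb_stored * months * storage_per_tb_month + tb_stored * egress_per_tb
print(f"~${total:,.0f} over {months} months")   # ~$169,200
```

That’s a lot of local disk shelves, and the local disks are still yours at the end of it.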
The other option, honestly, is don’t future-proof. It sounds a bit odd, but with how quickly things advance, you might find in two years it’s actually better to buy a new system and relegate the older one to smaller projects. It depends on the lab, how many users, etc., so it might not be an option.

AMD seems to be seriously on a roll with Zen at the minute, and I remain impressed with the improvements each generation has brought. nVidia is the exception - their current pricing has me investigating the Zen 5 Epycs for a CPU-compute based system again (when I can get two 64-core, 128-thread AVX-512 capable CPUs for the price of one RTX5090, I am sorely tempted).

This strategy will depend on what sort of samples you work on. Small proteins or complexes (and by “small” I mean anything <2MDa) will be very workable on a “consumer” system. CryoSPARC doesn’t work on Blackwell cards yet, but a 4090 or A4500/A5000 Ada, 192GB of RAM (if running AM5) and a 9950X will handle a fair amount of processing, although it might not manage high numbers of particles in 3D Flex or 3DVA. It will really depend on your requirements (and budget) at the time, and what future plans you have.