How to Optimize CryoSPARC for Better Performance with Large Datasets?

Hi all,

I have been working with CryoSPARC for some time, but I recently started handling much larger datasets and have noticed a significant drop in performance. My smaller projects ran smoothly, but with multiple terabytes of raw data per project, processing times have increased drastically and I am running into memory issues more frequently.

I am currently running CryoSPARC on a system with 256 GB of RAM and 4 NVIDIA GPUs (12 GB of VRAM each). This setup has worked well for smaller projects, but it does not seem to scale to the larger datasets: the 2D classification and ab-initio reconstruction steps in particular slow down considerably.
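In case it helps with diagnosis, this is roughly how I have been watching resource usage while a job runs. These are standard NVIDIA/Linux tools (not CryoSPARC-specific), and the refresh intervals are just my choice:

    # Watch GPU utilization and VRAM use, refreshing every second
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv -l 1

    # In a second terminal, watch system RAM and swap
    free -h -s 1

    # If sysstat is installed, check whether jobs are waiting on disk I/O
    iostat -x 5

From what I can tell, the GPUs often sit at low utilization during the slow stretches, which makes me suspect I/O or memory rather than raw GPU power.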

I also checked this thread: https://discuss.cryosparc.com/t/cryosparc-update-and-continuing-to-work-in-a-workspaclooker but did not find a solution there. Does anyone have tips on how I can optimize my setup for better performance with larger datasets? I have already tried adjusting job parameters such as the number of parallel jobs and reducing the particle box size, but these tweaks have not made a significant difference. Would upgrading to GPUs with more memory or adding CPU cores help? Or are there specific CryoSPARC settings I might be overlooking?
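One thing I am not sure I have configured correctly is SSD particle caching. From the cryosparcw documentation, I believe the cache path and quota are set when (re)connecting a worker; this is a sketch of what I would run, with placeholder hostnames, paths, and an assumed quota value:

    # Run from the cryosparc_worker directory on each worker
    # (placeholder hostnames/paths; --ssdquota is in MB and the value is just an example)
    ./bin/cryosparcw connect \
        --worker worker1.example.com \
        --master master.example.com \
        --port 39000 \
        --update \
        --ssdpath /scratch/cryosparc_cache \
        --ssdquota 500000

If anyone can confirm whether this is the right way to point workers at fast local scratch space, that would be helpful.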

Thanks in advance!

Respected community member! :blush:

Welcome to the forum @heban33651

Do you use particle caching? If you do, could you please post the output of these commands:

  • on the CryoSPARC master:
    cryosparcm cli "get_scheduler_targets()"
    
  • on the CryoSPARC worker (which may or may not be the same computer as the master):
    free -h
    uname -a
    cat /sys/kernel/mm/transparent_hugepage/enabled
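
For reference, the transparent_hugepage file lists all available modes with the currently active one in brackets, e.g. always [madvise] never; the always setting can interact badly with large memory allocations on some systems, which is one reason we ask. If it is easier, the worker-side commands can be collected in one pass (worker_info.txt is just a placeholder filename):

    # On the worker: gather RAM, kernel, and THP info in one pass
    { free -h; uname -a; cat /sys/kernel/mm/transparent_hugepage/enabled; } | tee worker_info.txt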