We run CryoSPARC in a cluster setting.
Our cluster is configured as follows:
$ cat cluster_info.json
{
  "name": "dfci-cluster",
  "title": "dfci-cluster",
  "worker_bin_path": "/data/CryoSPARC/cryosparc_worker/bin/cryosparcw",
  "send_cmd_tpl": "{{ command }}",
  "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
  "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
  "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
  "qinfo_cmd_tpl": "sinfo",
  "cache_path": "/data/CryoSPARC/SSD",
  "cache_quota_mb": 7000000,
  "cache_reserve_mb": 100000
}
Everything was fine until today, when I noticed that one of the worker nodes had run out of space on its SSD cache partition. I was really surprised to see this:
$ du -ms /data/CryoSPARC/SSD/
7316775 /data/CryoSPARC/SSD/
As you can see, the SSD cache is using 300+ GB more than its quota.
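In case it helps anyone reproduce the check, here is a minimal sketch of how I'm comparing actual cache usage against the configured `cache_quota_mb`. The paths and quota value come from my `cluster_info.json` above; the standalone check script itself is just my own monitoring idea, not part of CryoSPARC.

```python
import os

def cache_usage_mb(path):
    """Sum file sizes under path, in MB (roughly what `du -ms` reports)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may be deleted mid-scan by an active job
    return total // (1024 * 1024)

QUOTA_MB = 7_000_000  # cache_quota_mb from cluster_info.json

used = cache_usage_mb("/data/CryoSPARC/SSD")
if used > QUOTA_MB:
    print(f"cache over quota by {used - QUOTA_MB} MB")
```

With the numbers above (7,316,775 MB used vs. a 7,000,000 MB quota) this would report an overage of 316,775 MB, i.e. the 300+ GB I'm seeing.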
It’s an active cluster with multiple jobs from multiple users running at the same time. But still…
Any suggestions for fixing this, and for making sure it doesn’t happen again, would be really appreciated.