I’m having trouble with worker SSD caching and quotas since upgrading to CryoSPARC 5.
Prior to upgrading, all of the SSD cache lived in a single directory (I’m on a shared filesystem) whose name contained the hostname of the machine the master was running on.
Since upgrading, it is creating multiple cache directories, each named with the hostname of the worker that is using the cache.
Unfortunately, I’m now running out of disk space. I have about 12T of quota and I’m hitting that limit, but I’m not sure whether it’s because each of my workers is trying to use the full 12T in its own dir, or because jobs weren’t cleaning up after themselves / failed jobs left remnants behind:
cryosparc_cache$ du -hd1
656G ./instance_udc-an36-25:60101
0 ./instance_udc-an34-13:60101
656G ./instance_udc-ba05-38:60101
541G ./instance_udc-an33-38:60101
540G ./instance_udc-an28-1:60101
656G ./instance_udc-an40-17:60101
541G ./instance_udc-an25-24:60101
656G ./instance_udc-an38-1:60101
886G ./instance_udc-ba02-38:60101
541G ./instance_udc-ba07-35:60101
0 ./instance_udc-an34-19:60101
656G ./instance_udc-ba04-38:60101
656G ./instance_udc-an34-25:60101
0 ./instance_udc-an37-31:60101
656G ./instance_udc-an36-1:60101
4.1T ./instance_udc-an33-12c0:60101
541G ./instance_udc-an38-29:60101
13T .
I looked in the docs and added the following to cryosparc_worker/config.sh in the hopes that workers would all share the same dir again:
export CRYOSPARC_SSD_PATH=/scratch/npa2pc/cryosparc_cache/instance_udc-an33-12c0:60101
but I’m still seeing multiple directories created for each worker hostname.
This is a typical lane config:
slurm_lane_gpu_a100_config$ cat cluster_info.json
{
    "name": "slurm-gpu-a100",
    "worker_bin_path": "/standard/takcryoem/cryosparc/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/scratch/npa2pc/cryosparc_cache",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "/opt/slurm/current/bin/sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "/opt/slurm/current/bin/squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "/opt/slurm/current/bin/scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "/opt/slurm/current/bin/sinfo"
}
Help?
Welcome to the forum @cameronf. Have you confirmed that
- there aren’t separate CryoSPARC master installations running on the various udc-an?? and udc-ba?? nodes?
- the /scratch/npa2pc/cryosparc_cache filesystem has significantly faster read performance than the filesystem that hosts CryoSPARC project directories?
Yes - I’ve confirmed there is only one master installation. I’ve also watched it create new directories under cryosparc_cache every time a new job runs, and each one is named with the instance name of the compute node running the job.
I’ve also confirmed that /scratch is significantly faster (and my IT team specifically asked me to use /scratch).
@cameronf Thanks for these checks. Please can you post the output of the commands
grep -v LICENSE /path/to/cryosparc_master/config.sh
grep cryosparc ~cryosparcuser/.bashrc # replace cryosparcuser with Linux user that owns CryoSPARC processes
Heh - glad I noticed that -v; I was like, I don’t think I should post my license in here.
cryosparc_master$ grep -v LICENSE config.sh
# Instance Configuration
# export CRYOSPARC_MASTER_HOSTNAME="udc-ba38-32c0"
export CRYOSPARC_MASTER_HOSTNAME=$(hostname)
export CRYOSPARC_DB_PATH="/home/npa2pc/cryosparc/cryodatabase"
# export CRYOSPARC_DB_PATH="/standard/takcryoem/cryosparc/cryodatabase"
export CRYOSPARC_BASE_PORT=60100
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_MONGO_CACHE_GB=4
# Security
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_FORCE_USER=true
# Cluster Integration
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
# Project Configuration
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
# Development
export CRYOSPARC_DEVELOP=false
# Other
export CRYOSPARC_CLICK_WRAP=true
.bashrc:
export PATH="/sfs/ceph/standard/takcryoem/cryosparc/cryosparc_master/bin":$PATH
This CRYOSPARC_MASTER_HOSTNAME=$(hostname) definition could play a part in incorrectly defining CRYOSPARC_MASTER_HOSTNAME in terms of a worker’s hostname. Until we implement a guard against the incorrect definition in a future CryoSPARC release, one may work around this problem by defining CRYOSPARC_MASTER_HOSTNAME "statically" (as an actual hostname string instead of the output of the hostname command).
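For example, assuming the master is currently running on udc-ba38-32c0 (the hostname that is already present, commented out, in your config.sh), the static definition in cryosparc_master/config.sh would be
export CRYOSPARC_MASTER_HOSTNAME="udc-ba38-32c0"
followed by a cryosparcm restart so the new value takes effect.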
Enabling CRYOSPARC_FORCE_USER increases the risk of inconsistent file ownerships and consequent disruptions.
Interesting - that makes sense, I suppose. The problem is my “master” node changes somewhat frequently because of the compute environment I’m stuck in, so the $(hostname) was a way to deal with that. I’ll figure out how to hardcode it going forward.
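Probably something like rewriting that line to the literal hostname right before starting the master (just a sketch - the config.sh path here is taken from my .bashrc above):
MASTER_CONFIG=/sfs/ceph/standard/takcryoem/cryosparc/cryosparc_master/config.sh
# replace the $(hostname) definition with the actual hostname string of this node
sed -i "s/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME=\"$(hostname)\"/" "$MASTER_CONFIG"
cryosparcm start
That way config.sh ends up holding the actual master hostname string instead of $(hostname).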
Thanks!
Hi @cameronf, we made some changes in the latest v5.0.3 update, so defining CRYOSPARC_MASTER_HOSTNAME statically in cryosparc_master/config.sh should no longer be required. Please let us know if you run into further cache or hostname-related issues after updating!