Worker SSD Caching on CryoSPARC 5

I’m having trouble with worker SSD caching and quotas since upgrading to CryoSPARC 5.

Prior to upgrading, all of the SSD cache lived in a single directory (I’m on a shared filesystem) whose name contained the hostname of the machine the master was running on.

Since upgrading, it creates multiple cache directories, each named after the hostname of the worker using the cache.

Unfortunately I’m now running out of disk space. I have about 12T of quota and I’m hitting that limit. I’m not sure whether it’s because each of my workers is trying to use the full 12T in its own directory, or because jobs weren’t cleaning up after themselves/failed jobs left remnants behind:

cryosparc_cache$ du -hd1 
656G    ./instance_udc-an36-25:60101 
0       ./instance_udc-an34-13:60101 
656G    ./instance_udc-ba05-38:60101 
541G    ./instance_udc-an33-38:60101 
540G    ./instance_udc-an28-1:60101 
656G    ./instance_udc-an40-17:60101 
541G    ./instance_udc-an25-24:60101 
656G    ./instance_udc-an38-1:60101 
886G    ./instance_udc-ba02-38:60101 
541G    ./instance_udc-ba07-35:60101 
0       ./instance_udc-an34-19:60101 
656G    ./instance_udc-ba04-38:60101 
656G    ./instance_udc-an34-25:60101 
0       ./instance_udc-an37-31:60101 
656G    ./instance_udc-an36-1:60101 
4.1T    ./instance_udc-an33-12c0:60101 
541G    ./instance_udc-an38-29:60101 
13T .
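To see which of these per-host directories could be reclaimed, a small helper like the following can enumerate them. This is a hedged sketch, not a CryoSPARC tool: `list_stale_caches` is a hypothetical helper name, and the example paths are the ones from this thread. Inspect the output first and only remove a directory once you are sure no running job is using it.

```shell
# Hypothetical helper: print every instance_* cache dir under CACHE_ROOT
# except the one named KEEP, so they can be inspected (or removed) manually.
list_stale_caches() {
  root=$1
  keep=$2
  for d in "$root"/instance_*; do
    [ -d "$d" ] || continue                       # skip if the glob matched nothing
    [ "$(basename "$d")" = "$keep" ] && continue  # keep the shared directory
    printf '%s\n' "$d"
  done
}

# Example with the paths from this thread (inspect first; only `rm -rf`
# a directory after confirming no jobs are running on that node):
# list_stale_caches /scratch/npa2pc/cryosparc_cache instance_udc-an33-12c0:60101
```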

I looked in the docs and added the following to cryosparc_worker/config.sh in the hopes that workers would all share the same dir again:

export CRYOSPARC_SSD_PATH=/scratch/npa2pc/cryosparc_cache/instance_udc-an33-12c0:60101

but I’m still seeing multiple directories created for each worker hostname.

This is a typical lane config:

slurm_lane_gpu_a100_config$ cat cluster_info.json
{
  "name": "slurm-gpu-a100",
  "worker_bin_path": "/standard/takcryoem/cryosparc/cryosparc_worker/bin/cryosparcw",
  "cache_path": "/scratch/npa2pc/cryosparc_cache",
  "send_cmd_tpl": "{{ command }}",
  "qsub_cmd_tpl": "/opt/slurm/current/bin/sbatch {{ script_path_abs }}",
  "qstat_cmd_tpl": "/opt/slurm/current/bin/squeue -j {{ cluster_job_id }}",
  "qdel_cmd_tpl": "/opt/slurm/current/bin/scancel {{ cluster_job_id }}",
  "qinfo_cmd_tpl": "/opt/slurm/current/bin/sinfo"
}
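As an aside, a quick sanity check on cluster_info.json can catch problems like the curly “smart” quotes that forum copy/paste introduces, which are not valid JSON. A hedged sketch (`validate_json` is a hypothetical helper, not part of CryoSPARC) using Python’s stdlib json module:

```shell
# Hypothetical helper: parse a JSON file with python3 -m json.tool.
# A well-formed file prints "valid"; smart quotes or other syntax
# damage make it print "invalid".
validate_json() {
  if python3 -m json.tool "$1" >/dev/null 2>&1; then
    echo valid
  else
    echo invalid
  fi
}

# Usage:
# validate_json cluster_info.json
```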

Help?

Welcome to the forum @cameronf. Have you confirmed that

  1. There aren’t separate CryoSPARC master installations running on the various udc-an?? and udc-ba?? nodes?
  2. That the /scratch/npa2pc/cryosparc_cache filesystem has significantly faster read performance than the filesystem that hosts CryoSPARC project directories?

Yes - I’ve confirmed there is only one master installation. I’ve also watched it create a new directory under cryosparc_cache every time a new job runs, named with the instance name of the compute node running the job.

I’ve also confirmed that /scratch is significantly faster (and my IT team specifically asked me to use /scratch).

@cameronf Thanks for these checks. Please can you post the output of the commands

grep -v LICENSE /path/to/cryosparc_master/config.sh
grep cryosparc ~cryosparcuser/.bashrc # replace cryosparcuser with Linux user that owns CryoSPARC processes

Heh - glad I noticed that -v; I was like, I don’t think I should post my license in here :wink:

cryosparc_master$ grep -v LICENSE config.sh

# Instance Configuration
# export CRYOSPARC_MASTER_HOSTNAME="udc-ba38-32c0"
export CRYOSPARC_MASTER_HOSTNAME=$(hostname)
export CRYOSPARC_DB_PATH="/home/npa2pc/cryosparc/cryodatabase"
# export CRYOSPARC_DB_PATH="/standard/takcryoem/cryosparc/cryodatabase"
export CRYOSPARC_BASE_PORT=60100
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_MONGO_CACHE_GB=4

# Security
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_FORCE_USER=true

# Cluster Integration
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000

# Project Configuration
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'

# Development
export CRYOSPARC_DEVELOP=false

# Other
export CRYOSPARC_CLICK_WRAP=true

.bashrc:

export PATH="/sfs/ceph/standard/takcryoem/cryosparc/cryosparc_master/bin":$PATH

This definition can result in CRYOSPARC_MASTER_HOSTNAME being incorrectly defined in terms of a worker’s hostname. Until we implement a guard against the incorrect definition in a future CryoSPARC release, one may work around this problem by defining CRYOSPARC_MASTER_HOSTNAME "statically" (as an actual hostname string instead of the output of the hostname command).
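For illustration, a static definition could reuse the hostname already present (commented out) in the config.sh posted above:

```shell
# In cryosparc_master/config.sh, replace the dynamic definition
#   export CRYOSPARC_MASTER_HOSTNAME=$(hostname)
# with a literal hostname string, e.g. the one from this thread:
export CRYOSPARC_MASTER_HOSTNAME="udc-ba38-32c0"
```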

Enabling CRYOSPARC_FORCE_USER increases the risk of inconsistent file ownerships and consequent disruptions.

Interesting - that makes sense, I suppose. The problem is that my "master" node changes somewhat frequently because of the compute environment I’m stuck in, so $(hostname) was a way to deal with that. I’ll figure out how to hardcode it going forward.
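If the master host genuinely changes between starts, one option is to rewrite the static line once, on the master node, before starting CryoSPARC; workers then only ever read a literal string. This is a hedged sketch, not an official CryoSPARC mechanism, and `pin_master_hostname` is a hypothetical helper:

```shell
# Hypothetical helper: pin CRYOSPARC_MASTER_HOSTNAME in a config file to a
# literal hostname, replacing any existing (possibly dynamic) definition.
pin_master_hostname() {
  config=$1
  host=$2
  sed -i "s/^export CRYOSPARC_MASTER_HOSTNAME=.*/export CRYOSPARC_MASTER_HOSTNAME=\"$host\"/" "$config"
}

# Run once on the current master node before `cryosparcm start`, e.g.:
# pin_master_hostname /path/to/cryosparc_master/config.sh "$(hostname)"
```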

Thanks!

Hi @cameronf, we made some changes in the latest v5.0.3 update, so defining CRYOSPARC_MASTER_HOSTNAME statically in cryosparc_master/config.sh should no longer be required. Please let us know if you run into further cache or hostname-related issues after updating!