Automating cleanup of database /tmp directory

Hi CryoSPARC team,

I have a question about disk usage and cleanup of the $database_path/tmp directory in the database (not inside the individual project containers). It looks like this path is passed along to `--ssdpath`. Is there a recommended or supported way to clean up this directory automatically or periodically? Any guidance or best practices would be greatly appreciated, especially for long-running installations where disk usage can quietly grow over time.
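For context, the naive workaround I have been considering is a periodic age-based sweep with `find`, run from cron while no jobs are using the cache. This is just my own sketch, not anything from the CryoSPARC docs; the directory, file names, and the 14-day threshold below are placeholders, and the demo runs against a throwaway temp directory so it is safe to try:

```shell
# Hypothetical age-based cleanup sketch (NOT an official CryoSPARC tool).
# Idea: delete cached files not modified for more than AGE_DAYS days.
# Demonstrated against a throwaway directory, not the real cache path.
CACHE_DIR="$(mktemp -d)"   # stand-in for the real cache dir, e.g. /tmp/cryosparc
AGE_DAYS=14                # placeholder retention window

# simulate one stale cached file and one fresh one (GNU touch syntax)
touch -d '30 days ago' "$CACHE_DIR/old_particle_stack.mrc"
touch "$CACHE_DIR/fresh_particle_stack.mrc"

# remove files older than AGE_DAYS; only safe while no running job uses the cache
find "$CACHE_DIR" -type f -mtime +"$AGE_DAYS" -delete
```

A cron entry along the lines of `0 3 * * 0 find /tmp/cryosparc -type f -mtime +14 -delete` would be the periodic version, but I am wary of racing against active jobs, which is why I am asking whether there is a supported mechanism instead.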

Thanks!

Yue

@YueY2017 To enable us to make suggestions relevant to your setup, please can you post the outputs of these commands:

cryosparcm status | grep -e HOST -e PATH
cryosparcm cli "get_scheduler_targets()"

@wtempel thanks! Here is the output:

/app/cryosparc_master/bin/cryosparcm status | grep -e HOST -e PATH
CRYOSPARC_MASTER_HOSTNAME=${CRYOSPARC_MASTER_HOSTNAME:-localhost}
export CRYOSPARC_DB_PATH=${CRYOSPARC_DATADIR}/cryosparc2_database
export CRYOSPARC_FORCE_HOSTNAME=true

Note: CRYOSPARC_MASTER_HOSTNAME resolves to 'utility-01', and CRYOSPARC_DB_PATH resolves to /hpc/mydata/svc.cryosparc/cryosparc-v2/cryosparc_database

And here is the output for "get_scheduler_targets()":

/app/cryosparc_master/bin/cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/tmp/cryosparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'gpu_a40', 'lane': 'gpu_a40', 'name': 'gpu_a40', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': 'squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p', 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=interactive\n#SBATCH --constraint=a40\n#SBATCH --mem={{ (ram_gb*1000)|int }}MB\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n\navailable_devs=""\nfor devidx in $(seq 0 15);\ndo\n if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then\n if [[ -z "$available_devs" ]] ; then\n available_devs=$devidx\n else\n available_devs=$available_devs,$devidx\n fi\n fi\ndone\nexport CUDA_VISIBLE_DEVICES=$available_devs\n\nexport CRYOSPARC_WORKER_MODULE=cryosparc/4.7.1-motioncor2-1.6.4\nexport CRYOSPARC_MASTER_HOSTNAME=utility-01\n\nmodule load $CRYOSPARC_WORKER_MODULE\n\ncryosparc-worker {{ run_cmd }}\n', 'send_cmd_tpl': 'ssh login {{ command }}', 'title': 'gpu_a40', 'tpl_vars': ['run_cmd', 'cluster_job_id', 'project_uid', 'num_gpu', 'job_uid', 'job_log_path_abs', 'command', 'num_cpu', 'ram_gb'], 'type': 'cluster', 'worker_bin_path': '/app/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/tmp/cryosparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'gpu_any', 'lane': 'gpu_any', 'name': 'gpu_any', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': 'squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p', 'qsub_cmd_tpl': 'sbatch 
{{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=gpu\n#SBATCH --mem={{ (ram_gb*1000)|int }}MB\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n\navailable_devs=""\nfor devidx in $(seq 0 15);\ndo\n if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then\n if [[ -z "$available_devs" ]] ; then\n available_devs=$devidx\n else\n available_devs=$available_devs,$devidx\n fi\n fi\ndone\nexport CUDA_VISIBLE_DEVICES=$available_devs\n\nexport CRYOSPARC_WORKER_MODULE=cryosparc/4.7.1-motioncor2-1.6.4\nexport CRYOSPARC_MASTER_HOSTNAME=utility-01\n\nmodule load $CRYOSPARC_WORKER_MODULE\n\ncryosparc-worker {{ run_cmd }}\n', 'send_cmd_tpl': 'ssh login {{ command }}', 'title': 'gpu_any', 'tpl_vars': ['run_cmd', 'cluster_job_id', 'project_uid', 'num_gpu', 'job_uid', 'job_log_path_abs', 'command', 'num_cpu', 'ram_gb'], 'type': 'cluster', 'worker_bin_path': '/app/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/tmp/cryosparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'gpu_h100-h200', 'lane': 'gpu_h100-h200', 'name': 'gpu_h100-h200', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': 'squeue -j {{ cluster_job_id }} --format=%T | sed -n 2p', 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/bin/bash\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=gpu\n#SBATCH --constraint=[h100|h200]\n#SBATCH --mem={{ (ram_gb*1000)|int }}MB\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n\navailable_devs=""\nfor devidx in $(seq 0 15);\ndo\n if [[ -z $(nvidia-smi -i $devidx 
--query-compute-apps=pid --format=csv,noheader) ]] ; then\n if [[ -z "$available_devs" ]] ; then\n available_devs=$devidx\n else\n available_devs=$available_devs,$devidx\n fi\n fi\ndone\nexport CUDA_VISIBLE_DEVICES=$available_devs\n\nexport CRYOSPARC_WORKER_MODULE=cryosparc/4.7.1-motioncor2-1.6.4\nexport CRYOSPARC_MASTER_HOSTNAME=utility-01\n\nmodule load $CRYOSPARC_WORKER_MODULE\n\ncryosparc-worker {{ run_cmd }}\n', 'send_cmd_tpl': 'ssh login {{ command }}', 'title': 'gpu_h100-h200', 'tpl_vars': ['run_cmd', 'cluster_job_id', 'project_uid', 'num_gpu', 'job_uid', 'job_log_path_abs', 'command', 'num_cpu', 'ram_gb'], 'type': 'cluster', 'worker_bin_path': '/app/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/tmp/cryosparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'hostname': 'utility-01', 'lane': 'default', 'monitor_port': None, 'name': 'utility-01', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'GPU': [], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'svc.cryosparc@utility-01', 'title': 'Worker node utility-01', 'type': 'node', 'worker_bin_path': '/app/cryosparc_worker/bin/cryosparcw'}]

Just to confirm:

  1. Are you referring to the path
    /hpc/mydata/svc.cryosparc/cryosparc-v2/cryosparc_database/tmp/
    
    (which I would not normally expect to exist)?
  2. Or are you referring to the /tmp/cryosparc path that is assigned to the cache_path variable of your scheduler targets? In this case, please can you post the output of the commands
    df -hT /tmp/cryosparc
    du -sh /tmp/cryosparc/*
    ls -al /tmp/cryosparc
    
    of a representative CryoSPARC worker node where you see the need for a manual cleanup?
  3. Or are you referring to any other path?