Hi all!
I have a basic question about the SSD cache lifetime variable (SSD_CACHE_LIFETIME_DAY).
In our instance we submit jobs to the HPC cluster, the SSD is local to each compute node, and different jobs usually end up being scheduled on different nodes.
For our use case we don't need to keep data in the cache for long, but copying the files there still improves job run time considerably.
Now, my question is: what is the behaviour of the cache if I set SSD_CACHE_LIFETIME_DAY=0?
Will the data in the cache be removed when the job finishes? Or will it be removed as soon as another job needs the cache?
Related to that: is there a way to set no lifetime for the cache at all, so that data stored in the cache is no longer locked once no job is using it?
You could create a cluster-job-specific scratch directory and have your cluster's workload manager remove it when the cluster job completes.
For the Slurm case, you could try (I have not tested this yet):

1. define (using $SLURM_JOB_ID), create, and chown to $SLURM_JOB_USER a job-specific scratch path as part of the cluster job Prolog
2. export that scratch path to the cluster job's environment as part of the TaskProlog, as described in this example
3. in the cluster script, above the {{ run_cmd }} statement, assign the scratch path to the CRYOSPARC_SSD_PATH variable
4. remove the Slurm job-specific scratch directory as part of the job Epilog
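A minimal sketch of what steps 1, 2, and 4 might look like, combined into one listing for readability. In practice each section lives in its own Prolog/TaskProlog/Epilog script configured in slurm.conf; the `/tmp/scratch` base path, the `SCRATCH_BASE` variable, and the `JOB_SCRATCH` name are my assumptions, so adjust them for your site:

```shell
#!/bin/bash
# All three scripts derive the same per-job path from $SLURM_JOB_ID
# (the "demo" fallbacks just let the sketch run outside of Slurm).
SCRATCH_BASE=${SCRATCH_BASE:-/tmp/scratch}            # local SSD mount (assumed)
JOB_SCRATCH="${SCRATCH_BASE}/job_${SLURM_JOB_ID:-demo}"

# --- Prolog: runs as root on the compute node before the job starts ---
mkdir -p "$JOB_SCRATCH"
chown "${SLURM_JOB_USER:-$(id -un)}" "$JOB_SCRATCH"   # hand it to the job's user
chmod 700 "$JOB_SCRATCH"

# --- TaskProlog: slurmd parses its stdout, and lines of the form
#     "export NAME=value" are injected into the task's environment ---
echo "export JOB_SCRATCH=${JOB_SCRATCH}"

# --- Epilog: runs after the job completes, freeing the SSD immediately ---
rm -rf "$JOB_SCRATCH"
```

Note that the Prolog and Epilog run once per node as a privileged user, while the TaskProlog runs as the job's user; the `export`-via-stdout mechanism above is the documented way a TaskProlog modifies the task environment.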
Thanks for your suggestion!
In the end I set the CRYOSPARC_SSD_PATH variable to a temporary directory created and deleted by the Slurm Prolog and Epilog.
Thank you very much for your help!
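For anyone finding this thread later, the cluster script change is small. A sketch of the relevant fragment, where `JOB_SCRATCH` is an assumed name for the variable the TaskProlog exports (substitute whatever your TaskProlog actually sets):

```shell
# cluster_script.sh (fragment) -- just above the {{ run_cmd }} line:

# Point the CryoSPARC worker's SSD cache at the per-job scratch
# directory created by the Slurm Prolog.
export CRYOSPARC_SSD_PATH="${JOB_SCRATCH}"

{{ run_cmd }}
```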
Have a nice day!