In v0 we would use “cryosparc configure cache clear” to clear the cache. Is there a corresponding command in v2?
Thanks!
I would also really like to be able to skip caching entirely!
There is really no benefit to caching for datasets smaller than 40% of the system memory, and the benefit is negligible for moderately larger datasets. My “cache” is just another location on the same (6-HDD RAID 10) filesystem, so the copying is just wasted time and space for me. Attempting a reflink copy would be a great alternative, too.
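For what it’s worth, on a filesystem that supports reflinks (XFS formatted with reflink=1, or Btrfs), a copy like the one below shares data blocks instead of duplicating them; the paths are just placeholders, not cryoSPARC’s actual cache layout:
cp --reflink=always /raid/projects/P6/particles.mrc /raid/cryosparc_cache/
(--reflink=auto would fall back to a normal copy on filesystems without reflink support.)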
If my guesses are correct, I just did it:
cd yyyy/instance_xxx:39001
cd P6
rm -rf *
The xxx and yyyy in “yyyy/instance_xxx:39001” seem to be determined by the --worker xxx and --ssdpath yyyy arguments of the cryosparcw start command.
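So for a hypothetical worker registered on host worker1 with --ssdpath /scratch/cryosparc_cache (hostname and path made up for illustration), the cache for project P6 would end up somewhere like:
/scratch/cryosparc_cache/instance_worker1:39001/P6/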
After deleting mine, only one running job died, which was not a surprise. New jobs will remake their cache. The files there are just an extra copy of the .mrc particle files; the original copies are in the project directory. In my case both copies actually stay on the same SSD.
Judging by the fact that the cache’s location is determined by the --ssdpath option, I guess the idea is that we can put projects on our slow HDDs and only use the SSD as a cache during computation. That procedure still makes sense for people with only one SSD in the system; it is actually quite considerate to those of us on modest hardware.
But yes, a switch would be nice (or perhaps a check to see whether the cache directory is on the same physical storage as the original?).
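In the meantime, a quick manual check that two directories sit on the same storage (the paths here are just examples from my setup):
df /raid/projects /raid/cryosparc_cache
stat -c %d /raid/projects /raid/cryosparc_cache
If df reports the same filesystem for both, or stat prints identical device numbers, the “cache” copy gains nothing.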
Ah, there is a switch already (but we probably need to click it for every job, so maybe leaving it on is less work):
Cache particle images on SSD
Whether or not to copy particle images to the local SSD before running. The cache is persistent, so after caching once, particles should be available for subsequent jobs that require the same data. Not using an SSD can dramatically slow down processing.
Thanks for your response! I realize that I could delete the cache files outside of cryoSPARC but, like you said, that could cause some jobs to be killed. I would rather have cryoSPARC clear the cache since it knows which cache files are safe to delete and won’t interfere with any running jobs.
Hi all,
Currently in v2 there is no command to clear the cache intelligently (i.e. without causing running jobs that rely on the cache to fail). This will be added in the future, though it is somewhat complex because each worker node or cluster target has a separate SSD cache. The option to disable caching entirely on a particular target/cluster/project will also be added in the future; for now the SSD cache does need to be disabled for each job individually.
@apunjani Is it safe to manually delete cache files for completed or unqueued/cleared jobs? Also, can I pre-cache with symlinks to skip caching? Thanks.
Yes, it is safe to delete cache files at any time (it’s a read-only cache), and yes, the cache checks whether files exist based only on path/size/modification date, so symlinks should cause it to skip copying. Though it may be easier to just set the SSD Cache parameter to False in each job that you queue up?
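If you do try the symlink route, a rough sketch (the paths, hostname and job directory below are placeholders based on the layout described above; the exact naming inside the cache may differ between versions, so check what a real cached job produces first):
cd /scratch/cryosparc_cache/instance_worker1:39001/P6
ln -s /raid/projects/P6/J123/extract/*.mrc .
Since the cache only compares path, size and modification date, links that match those checks should let it skip the copy.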
Thanks, yes that’s ideal! The parameter was missing or not exposed in some earlier versions of cryoSPARC 2, I didn’t realize it had been re-introduced.
Yes, sorry, it was only added in v2.0.27. I believe in the next release we’re adding a project-level parameter for this as well, which should make it much easier to skip caching.
Hi @DanielAsarnow,
in v2.2 you can now use the following command to set a project-level default for the SSD cache parameter:
cryosparcm cli 'set_project_param_default("PX","compute_use_ssd",False)'
And undo the default like so:
cryosparcm cli 'unset_project_param_default("PX","compute_use_ssd")'
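For example, to turn off SSD caching by default for the P6 project mentioned earlier in this thread (substitute your own project ID for P6):
cryosparcm cli 'set_project_param_default("P6","compute_use_ssd",False)'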
Ali
Nice!
It’s especially good for our dedicated cryoSPARC workstation, which has an SSD cache but also a RAID 10 array that holds some data. I have our lab members use the SSD cache for remote data, to avoid crushing our 10G network, but skip it for data stored locally.