SSD cache waiting to be unlocked & optimized performance

Hi nfrasser,

I have similar trouble when running 2D classification after I updated cryoSPARC from 3.1 to 3.2.

Best,
Chuchu

Hi @nfrasser,

I too am having this problem with 3.2. In my case it basically happens with any job type I try.

Thanks so much,
KIW

Hi @kiwhite @MHB we recently released a patch for v3.2 with a fix for this. Please install it with the following instructions: https://guide.cryosparc.com/setup-configuration-and-management/software-updates#apply-patches
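
For reference, on an instance with internet access the patching workflow in that guide boils down to a couple of commands on the master node. This is only a sketch (it assumes the cryosparcm patch helper is available in your version, and any worker-side patching depends on your setup), so treat the linked guide as authoritative:

# On the master node: download and apply the latest patch, then restart
# (sketch; see the guide above for the authoritative steps)
cryosparcm patch
cryosparcm restart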

@CleoShen this means there is not enough space on the SSD to cache the required particles. You may have to either (A) manually free up space used by other applications, (B) reconfigure your installation to use a larger SSD, or (C) disable SSD caching in the job parameters before queuing it.
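
As a quick check for option (A), you can see how full the cache drive actually is from a shell on the worker node; the mount point below is only an example, so substitute your own cache path:

# Free space on the SSD cache mount (example path; adjust to your installation)
df -h /scratch/cryosparc_cache

# Size of the existing cache contents on that drive
du -sh /scratch/cryosparc_cache/*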

Great, thanks! Will install and see how things work.

Hi nfrasser,

Thank you for your kind reply. I went with option (C) and the SSD cache error was fixed, but after I installed the patch, 2D classification fails with a new error, shown below; do you have any suggestions?

@nfrasser okay great, that did the trick. Thanks so much!

@nfrasser I still have this problem with v3.2.0 for 2D classification.

License is valid.
Launching job on lane default target localhost …
Running job on master node hostname localhost
[CPU: 69.7 MB] Project P17 Job J8 Started
[CPU: 69.7 MB] Master running v3.2.0, worker running v3.2.0
[CPU: 69.7 MB] Running on lane default
[CPU: 69.7 MB] Resources allocated:
[CPU: 69.7 MB] Worker: localhost
[CPU: 69.7 MB] CPU : [0, 1]
[CPU: 69.7 MB] GPU : [0, 1]
[CPU: 69.7 MB] RAM : [0, 1, 2]
[CPU: 69.7 MB] SSD : True
[CPU: 69.7 MB] --------------------------------------------------------------
[CPU: 69.7 MB] Importing job module for job type class_2D…
[CPU: 196.0 MB] Job ready to run
[CPU: 196.0 MB] ***************************************************************
[CPU: 518.2 MB] Using random seed of 777984025
[CPU: 518.3 MB] Loading a ParticleStack with 481295 items…
[CPU: 522.1 MB] SSD cache : cache successfuly synced in_use
[CPU: 522.1 MB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 522.1 MB] SSD cache : requested files are locked for past 400s, checking again in 5s

@donghuachen did you also apply the patch?

@nfrasser I am not sure whether the patch was applied, because the cluster admin did the installation of v3.2.0.

You can tell whether the patch is installed if you see +210511 after the version number in cryoSPARC’s dashboard.
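
If you don't have dashboard access, a rough way to check from a shell on the master node (assuming you can run cryosparcm there) is shown below; note that depending on the version this may not print the patch suffix, so the dashboard remains the surest place to check:

# Prints instance status, including the running version; a patch suffix
# (e.g. +210511) should appear alongside the version once the patch is applied
cryosparcm status | grep -i version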

Hi,

I am currently troubleshooting an issue similar to the one discussed above (in cryoSPARC v2.15). I have to say that cryoSPARC ran very smoothly for a long time (>1 year) and only recently started to show these weird cache sync issues. I therefore suspect a local hardware issue (they pop up from time to time) or a problem related to the DB. Interestingly, we can reproduce the issue by running a NU-refinement, which seems to break caching for the entire project. If we use a fresh project, everything runs fine up to the point we run NU-refinement again.

My question is: is it safe to clear the cache_files collection in the MongoDB database (and does it make sense)? It has already collected ~5 million entries for files that are definitely no longer there. I would of course physically clear the cache SSD as well.

Upgrading cryoSPARC is not an immediate option, as we are planning to migrate to different hardware soon (and it seems like these issues still persist in later versions anyway). I am rather looking for a temporary fix.

Best,
Chris

Hi @CD1

Thanks for your insight.
I’m not sure why Non-Uniform Refinement would be causing these issues; the same code path is used in all jobs when caching particles. Either way, I’ll take a look.

You can definitely do this, as long as you’ve cleared all the files in the actual cache location to avoid duplication. I’d also recommend creating a backup of your database just in case: https://guide.cryosparc.com/setup-configuration-and-management/management-and-monitoring/cryosparcm#cryosparcm-backup
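
For the backup step, something along these lines should work; the destination directory is just an example, and the linked guide covers the full options:

# Back up the cryoSPARC database before touching any collections (example path)
cryosparcm backup --dir=/path/to/backups
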
Run the following commands in a shell on the master node to drop the cache_files collection:

eval $(cryosparcm env)      # load the cryoSPARC environment into the current shell
cryosparcm mongo            # open a mongo shell connected to the cryoSPARC database
> db.cache_files.drop()     // at the mongo prompt, drop the cache_files collection

Once that’s done, restart cryoSPARC (cryosparcm restart) to recreate the cache_files collection and rebuild the indexes.
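
If you want to double-check afterwards, you can reopen the mongo shell and confirm the collection is empty again; a minimal sketch:

eval $(cryosparcm env)
cryosparcm mongo
> db.cache_files.count()    // should report 0, growing again only as new jobs cache particles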

Hi @stephan ,

I’m on v3.3.1+211214 and seem to be having a similar problem…

Do you have any idea what could be going on, or whether this might be a similar bug? Sometimes this has occurred when no other cryoSPARC job was running on the compute node in question.

Thank you!

Hi,

I am experiencing the same issue with v3.3.1.

It would be nice to get some suggestions.