“cache waiting for requested files to become unlocked”
I know this is a common problem, but it would be great if cryoSPARC overrode this error after a while and skipped the SSD cache, because I often submit jobs at night and wake up to find they still have not started.
Also, we are using a cluster with GPU boxes, and we’ve noticed that if four people submit jobs, they usually all slow to a near halt, especially for intensive jobs like NU refinement. Is this an issue with cryoSPARC, or is it the way we have our resources set up? Again, I’ll submit jobs at night but wake up to find them all “running” yet still on iteration 0. Ugh.
There is a bug where this happens endlessly in a specific case, but we’ve fixed it. The fix will be included in the next version of cryoSPARC (coming soon!).
Can you ensure that the workstation these jobs are being assigned to has enough RAM available? If it doesn’t have enough, the system may swap memory to disk, which takes a very long time to complete.
Also, during testing we came across a similar issue where jobs running at the same time slowed down massively, and we noticed that the system RAM was filled by the OS filesystem (page) cache.
We executed sync; echo 1 > /proc/sys/vm/drop_caches (as root) to drop that cache, and all jobs continued processing normally.
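If you suspect the same thing, it is worth checking how much memory the page cache is holding before dropping it. A minimal sketch for a Linux host, reading /proc/meminfo (MemAvailable requires a reasonably recent kernel):

# Quick check of total, available, and page-cache memory (Linux only).
def meminfo_gb(fields=('MemTotal', 'MemAvailable', 'Cached')):
    with open('/proc/meminfo') as f:
        info = dict(line.split(':', 1) for line in f)
    for name in fields:
        kb = int(info[name].split()[0])  # /proc/meminfo reports values in kB
        print(f"{name}: {kb / 1e6:.1f} GB")

meminfo_gb()

If “Cached” accounts for most of the RAM while jobs are stalled, the drop_caches command above is likely to help.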
Note that you will still see the “cache waiting” message if there are multiple jobs trying to cache the same files, but it should now proceed once another job finishes caching. Do not kill cache-waiting jobs while this is happening.
I recently upgraded to v3.0 and am seeing a similar issue. I used a prior stack that I could previously load onto the SSD, and it just hangs as described. I restarted cryoSPARC, and now it thinks the 100 GB stack is 3 TB!
*first run message
[CPU: 81.6 MB] Importing job module for job type hetero_refine…
[CPU: 455.5 MB] Job ready to run
[CPU: 455.5 MB] ***************************************************************
[CPU: 651.7 MB] Using random seed of 329036760
[CPU: 651.7 MB] Loading a ParticleStack with 246037 items…
[CPU: 653.7 MB] SSD cache : cache successfuly synced in_use
[CPU: 653.7 MB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 654.5 MB] SSD cache : requested files are locked for past 97s, checking again in 5s
*second run after restart
[CPU: 81.6 MB] --------------------------------------------------------------
[CPU: 81.6 MB] Importing job module for job type hetero_refine…
[CPU: 457.5 MB] Job ready to run
[CPU: 457.5 MB] ***************************************************************
[CPU: 653.7 MB] Using random seed of 1010616078
[CPU: 653.7 MB] Loading a ParticleStack with 246037 items…
[CPU: 655.8 MB] SSD cache : cache successfuly synced in_use
[CPU: 655.8 MB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 658.9 MB] SSD cache : cache successfuly requested to check 16106 files.
[CPU: 663.1 MB] SSD cache : cache requires 3215182.60MB more on the SSD for files to be downloaded.
[CPU: 663.1 MB] SSD cache : cache does not have enough space for download
[CPU: 663.1 MB] SSD cache : but there are no files that can be deleted.
[CPU: 663.1 MB] SSD cache : This could be because other jobs are running and using files, or because a different program has used up space on the SSD.
[CPU: 663.1 MB] SSD cache : Waiting 30 seconds for space to become available…
Hi @MHB, are you still getting this consistently? If yes, to help me further troubleshoot, please try the following:
Paste this code into a text editor, replacing <PROJECT ID HERE> with the project ID of the cryoSPARC project where you’re seeing this (e.g., P3)
# Note: run this inside the cryoSPARC environment (e.g., in the interactive
# shell started with "cryosparcm icli"), where the `cli` client object used
# below is already defined.
import os
from cryosparc_compute.dataset import Dataset

project_uid = '<PROJECT ID HERE>'
particles_file_path = '<PASTE PATH HERE>'

# Resolve the project directory and load the particle dataset (.cs file)
project_dir = cli.get_project_dir_abs(project_uid)
particles = Dataset().from_file(particles_file_path)

# Unique particle stack files referenced by this dataset
paths = set(particles.data['blob/path'])

# Write the on-disk size of each referenced file, then the total, to a text file
with open('particle-sizes-output.txt', 'w') as f:
    sizes = []
    for path in paths:
        abs_path = os.path.join(project_dir, path)
        size = os.path.getsize(abs_path)
        sizes += [size]
        print(f"{path} {size}", file=f)
    print(f"Total {sum(sizes)}", file=f)
In the cryoSPARC interface, open the parent job (e.g., Extract Particles) from which these particles come (if there are multiple parent jobs, do these steps for all of them)
Go to the “Outputs” tab and look for the particles output group you connected to the hanging job
Copy the path to the particle.blob output from the cryoSPARC interface, as indicated, and paste it into the script in place of <PASTE PATH HERE>
Also, as a feature that has been suggested before, it would be very helpful to have a “clear cache” option. I can clear the cache myself, but with multiple users there is no way to distinguish whose cache I am clearing. It would be helpful to have an automatic cache clear for a workspace after a certain period of inactivity, or a radio button to clear the cache of a given workspace.
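For now, the closest I can get is to total file sizes per top-level directory under the SSD cache path to see roughly what is occupying it. A rough sketch only; the cache path below is a placeholder, and the exact on-disk layout of the cache may differ between cryoSPARC versions:

import os
import sys
from collections import defaultdict

# Placeholder cache location; pass your worker's configured cache path as the first argument.
cache_root = sys.argv[1] if len(sys.argv) > 1 else '/scratch/cryosparc_cache'

usage = defaultdict(int)
for dirpath, _, filenames in os.walk(cache_root):
    # Attribute usage to the first path component below the cache root
    rel = os.path.relpath(dirpath, cache_root)
    top = '(root)' if rel == '.' else rel.split(os.sep)[0]
    for name in filenames:
        try:
            usage[top] += os.path.getsize(os.path.join(dirpath, name))
        except OSError:
            pass  # a running job may remove files while we walk

for top, size in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{top}\t{size / 1e9:.1f} GB")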
I am running into the large particle stack size issue again.
Running v3.1
A particle stack after 2D classification and selection gives the following when trying to use the cache in a helical refinement:
[CPU: 518.3 MB] Using random seed of 1054570268
[CPU: 518.3 MB] Loading a ParticleStack with 34559 items...
[CPU: 518.5 MB] SSD cache : cache successfuly synced in_use
[CPU: 519.3 MB] SSD cache : cache successfuly synced, found 166958.02MB of files on SSD.
[CPU: 519.3 MB] SSD cache : cache successfuly requested to check 6645 files.
[CPU: 521.6 MB] Detected file change due to change in file size.
[CPU: 520.1 MB] SSD cache : cache requires 1867125.75MB more on the SSD for files to be downloaded.
[CPU: 520.1 MB] SSD cache : cache does not have enough space for download
[CPU: 521.1 MB] SSD cache : but there are files that can be deleted, deleting...
[CPU: 521.6 MB] SSD cache : cache does not have enough space for download
[CPU: 520.4 MB] SSD cache : but there are no files that can be deleted.
[CPU: 520.4 MB] SSD cache : This could be because other jobs are running and using files, or because a different program has used up space on the SSD.
i.e., an apparent size of 1.8 TB!
If I re-extract the same particle stack with the same parameters, I now get this:
[CPU: 519.1 MB] Using random seed of 764820665
[CPU: 519.1 MB] Loading a ParticleStack with 34559 items...
[CPU: 519.2 MB] SSD cache : cache successfuly synced in_use
[CPU: 519.3 MB] SSD cache : cache successfuly synced, found 102962.49MB of files on SSD.
[CPU: 519.4 MB] SSD cache : cache successfuly requested to check 6645 files.
[CPU: 521.0 MB] SSD cache : cache requires 11871.38MB more on the SSD for files to be downloaded.
[CPU: 521.0 MB] SSD cache : cache has enough available space.
Now it is only ~11 GB, a reasonable size.
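As a quick sanity check on these numbers (the box size is my assumption, since it isn’t stated here): an uncompressed float32 particle stack should occupy roughly n_particles × box² × 4 bytes, so ~12 GB is consistent with 34,559 particles at about a 300-pixel box, whereas 1.8 TB would imply an absurdly large box:

# Rough expected size of an uncompressed float32 particle stack (MRC headers ignored).
def stack_size_gb(n_particles, box_px, bytes_per_px=4):
    return n_particles * box_px ** 2 * bytes_per_px / 1e9

print(stack_size_gb(34559, 300))   # ~12.4 GB, in line with the 11871 MB the cache reports
print(stack_size_gb(34559, 3700))  # ~1892 GB; a ~3700 px box would be needed to reach 1.8 TB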
I could not get the script you pasted above to run to completion for the requested diagnostic.
Hi @MHB, please send me the following information for further troubleshooting:
How many particles went into the original 2D Classification job you used to filter these out?
Were the particles sourced from outside of cryoSPARC or via an “Import Particle Stack” job?
Were the particles extracted in a job that ran after the update to v3.1?
What is the size on disk of the “Extract from Micrographs” or “Import Particle Stack” job originally used to extract these particles? (Select the job and look in the sidebar.)
Send me the .cs file for the output of the parent job from which the particles were sourced for helical refinement (e.g., the particles_selected output of the Select 2D job). Download it from the job’s “Outputs” tab (see screenshot). Feel free to use a file-sharing service and direct-message me the link.
Hi all, I’m getting the same issue with NU Refinement in v3.1.0 and my job can’t run. I did not have any other jobs running at the same time.
Importing job module for job type nonuniform_refine_new…
[CPU: 534.3 MB] Job ready to run
[CPU: 534.3 MB] ***************************************************************
[CPU: 1.18 GB] Using random seed of 1136935870
[CPU: 1.18 GB] Loading a ParticleStack with 547608 items…
[CPU: 1.19 GB] SSD cache : cache successfuly synced in_use
[CPU: 1.19 GB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 1.19 GB] SSD cache : requested files are locked for past 709s, checking again in 5s