I am having trouble with a 'requested files are locked' error whenever I try to run any job with "Cache particle images on SSD" enabled (on different GPUs). For jobs like 3DVA there is no way I can run without caching, as it would take weeks. I have no idea why this is happening. I have tried the solutions from some now quite old posts about similar (though not identical) errors, rerun jobs without caching and used those outputs, and run the jobs from a new workspace.
Any reasons/solutions/ideas anyone could offer for this would be much appreciated.
All the best and many thanks,
Billy
See below:
[CPU: 1.03 GB] Reading particle stack…
[CPU: 1.06 GB] Windowing particles
[CPU: 1.06 GB] Done.
[CPU: 1.06 GB] SSD cache : cache successfuly synced in_use
[CPU: 1.06 GB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 1.06 GB] SSD cache : requested files are locked for past 266s, checking again in 5s
Have you tried deleting the cache directory? Are the permissions correct for the user running cryoSPARC? The fact that it is finding 0 MB of files suggests it either can't create the cache or can't read what is already there. Check dmesg for any problems with the SSD drive that may have forced a read-only remount, or for filesystem corruption.
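A minimal sketch of those checks (the cache path and the cryosparc_user account name are placeholders; substitute whatever is configured on your worker):

```
# Look for SSD or filesystem errors in the kernel log
dmesg -T | grep -iE 'error|read-only|remount|nvme|sd[a-z]'

# Confirm the cache volume is not mounted read-only
mount | grep local_scratch

# Verify the cryoSPARC account can actually write to and delete from the cache
sudo -u cryosparc_user touch /local_scratch/cryosparc3/tmp/.write_test
sudo -u cryosparc_user rm /local_scratch/cryosparc3/tmp/.write_test
```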
If the particles are being written by another job, even in a different project/workspace or on a different compute node, then they cannot be accessed. This is the intended behaviour: the job waits until that operation has finished and then resumes.
In a related but not identical situation, we are finding that cryoSPARC is not able to clear the cache at the beginning of a job. If the files are deleted manually the job will proceed, but otherwise it gets stuck:
[CPU: 2.85 GB] Loading a ParticleStack with 598297 items…
[CPU: 2.85 GB] SSD cache : cache requires 665525.66MB more on the SSD for files to be downloaded.
[CPU: 2.85 GB] SSD cache : cache may not have enough space for download
[CPU: 2.85 GB] SSD cache : there are older cache entries that can be deleted, deleting…
[CPU: 2.85 GB] SSD cache : cache may not have enough space for download
[CPU: 2.85 GB] SSD cache : There are no files that can be deleted at cache location /local_scratch/cryosparc3/tmp
[CPU: 2.85 GB] SSD cache : This could be because other jobs are running and using files, or because a different program has used up space on the SSD.
[CPU: 2.85 GB] SSD cache : Cache full for past 14222s, checking again in 30 seconds for space to become available…
To be clear, no one was accessing those pre-cached files in any program when this job was started.
Is it possible that some sort of permissions issue is preventing cryoSPARC from deleting these files automatically? Or is this a bug in version 3.3.2?
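One quick way to test the permissions hypothesis, with cryosparc_user standing in for whichever Linux account runs the cryoSPARC worker:

```
# Check who owns the cache directory and the cached files
ls -ld /local_scratch/cryosparc3/tmp
ls -l /local_scratch/cryosparc3/tmp | head

# Try deleting one cached file as the cryoSPARC account; if this fails,
# automatic cache cleanup will fail for the same reason
sudo -u cryosparc_user rm /local_scratch/cryosparc3/tmp/<one_cached_file>
```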
If a message like this appears while there are no jobs running on your instance, your caching system is in an inconsistent state. A recovery procedure is described elsewhere on this forum.
Did you confirm that the volume that holds /local_scratch/cryosparc3/tmp is large enough to hold those 665+ GB, plus any data also residing on that same volume but outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))?
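A quick check of the volume's total capacity and current usage:

```
# Size, used and available space on the filesystem backing the cache
df -h /local_scratch/cryosparc3/tmp
```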
I found cached files in /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1)) and in /local_scratch/cryosparc3/tmp
Outside of those two folders, there is essentially no disk usage.
These cached files are not being deleted by cryoSPARC when it finds there isn't enough free space.
What is the size of the data in /local_scratch/cryosparc3/tmp, not counting data in /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))?
I am concerned that a given cryoSPARC instance cannot be counted on to manage data outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1)).
Were any values specified for --ssdreserve and --ssdquota during connection of the worker node?
Was the manual file deletion performed by the Linux user/account that runs the cryoSPARC instance?
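To answer the size question above, something like the following du commands would work (assuming CRYOSPARC_MASTER_HOSTNAME and CRYOSPARC_MONGO_PORT are set as in the instance's config.sh):

```
# Total usage of the cache directory, instance subdirectory included
du -sh /local_scratch/cryosparc3/tmp

# Usage of the instance subdirectory alone; subtracting this from the
# total gives the size of the data outside it
du -sh "/local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))"
```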
At the time of this issue, I think there was around 300 GB in each of /local_scratch/cryosparc3/tmp and /tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1)).
How can I find the values that were specified for --ssdreserve and --ssdquota for the worker node?
Do you remember whether the 300 GB in /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))/ were included in the 300 GB inside /local_scratch/cryosparc3/tmp/, or whether there was a total of approximately 300 + 300 = 600 GB inside /local_scratch/cryosparc3/tmp/ and its subdirectories?
I wonder how the files outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))/ got there.
The --ssdreserve and --ssdquota connection parameters are shown as Cache Reserve (MB) and Cache Quota (MB), respectively, under Resource Manager > Instance Information.
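They can also be read from the command line; get_scheduler_targets() is part of the documented cryosparcm cli interface and prints the configuration of each worker, including its cache settings:

```
cryosparcm cli "get_scheduler_targets()"
```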
I wonder if previous versions of cryoSPARC used a different cache folder and we just never deleted those folders? The cache-not-clearing issue is recurring for different users on the same cluster (note that the OP, bfisher, is on the same cluster).
There isn't a cache quota set, and the cache reserve is 1000 MB per node. Do these need to be changed?
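If they do turn out to need changing, the documented route is to re-run cryosparcw connect with the --update flag on the worker; a sketch, with the hostnames, port and quota below as placeholder values:

```
# Run from the cryoSPARC worker install directory on the worker node
bin/cryosparcw connect \
    --worker <worker_hostname> \
    --master <master_hostname> \
    --port 39000 \
    --update \
    --ssdquota 800000   # e.g. cap the cache at ~800 GB
```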