Requested files are locked (when caching)

Hi cryosparc community,

I am running into a ‘requested files are locked’ error whenever I run any job with ‘Cache particle images on SSD’ enabled (on different GPUs). For jobs like 3DVA there is no realistic way to run without caching, as it would take weeks. I have no idea why this is happening. I have tried the solutions from some now quite old posts about similar (though not identical) errors, re-running upstream jobs without caching and using their outputs, and running the jobs from a new workspace.

Any reasons/solutions/ideas anyone could offer for this would be much appreciated.

All the best and many thanks,

Billy

See below:

[CPU: 1.03 GB] Reading particle stack…
[CPU: 1.06 GB] Windowing particles
[CPU: 1.06 GB] Done.
[CPU: 1.06 GB] SSD cache : cache successfuly synced in_use
[CPU: 1.06 GB] SSD cache : cache successfuly synced, found 0.00MB of files on SSD.
[CPU: 1.06 GB] SSD cache : requested files are locked for past 266s, checking again in 5s

Have you tried deleting the cache/directory? Are permissions correct for the user running cryoSPARC? The fact that it is finding 0 MB of files… it either can’t create the cache or can’t read what is already there. Check dmesg to see if there have been any problems with the SSD drive which may have forced a remount as read only, or if the filesystem has been corrupted.
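A quick way to test the failure modes suggested above (a permissions problem or a read-only remount) is a short shell check. The cache path here is the one quoted later in this thread; substitute your own, and run it as the account that runs cryoSPARC (not root, since root bypasses permission checks):

```shell
# Placeholder cache path from this thread; adjust to your setup.
CACHE=/local_scratch/cryosparc3/tmp

# 1. Can this user create and delete a file in the cache directory?
if touch "$CACHE/.write_test" 2>/dev/null && rm "$CACHE/.write_test"; then
    echo "cache is writable"
else
    echo "cache is NOT writable"
fi

# 2. Is the filesystem holding the cache still mounted read-write?
findmnt -T "$CACHE" -o TARGET,OPTIONS

# 3. Any kernel-level disk errors or forced read-only remounts?
dmesg | grep -iE 'remount.*read-only|i/o error' | tail -n 20
```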

Hi, I have the same problem. Have you figured it out?

[CPU: 2.76 GB] Using random seed of 1806626834

[CPU: 2.76 GB] Loading a ParticleStack with 3392988 items…

[CPU: 2.76 GB] SSD cache : cache successfuly synced in_use

[CPU: 2.76 GB] SSD cache : cache successfuly synced, found 34287.86MB of files on SSD.

[CPU: 2.76 GB] SSD cache : requested files are locked for past 363s, checking again in 5s

If the particles are being written elsewhere, even from a different project, workspace, or compute node, they cannot be accessed. This is the intended behaviour: the job waits until that operation finishes and then resumes.

In a related but not identical situation, we are finding that cryosparc is not able to clear the cache at the beginning of a job. If files are manually deleted then the job will proceed, but otherwise it gets stuck:

[CPU: 2.85 GB] Loading a ParticleStack with 598297 items…

[CPU: 2.85 GB] SSD cache : cache successfuly synced in_use

[CPU: 2.85 GB] SSD cache : cache successfuly synced, found 31314.39MB of files on SSD.

[CPU: 2.85 GB] SSD cache : cache successfuly requested to check 599 files.

[CPU: 2.85 GB] SSD cache : cache requires 665525.66MB more on the SSD for files to be downloaded.

[CPU: 2.85 GB] SSD cache : cache may not have enough space for download

[CPU: 2.85 GB] SSD cache : there are older cache entries that can be deleted, deleting…

[CPU: 2.85 GB] SSD cache : cache may not have enough space for download

[CPU: 2.85 GB] SSD cache : There are no files that can be deleted at cache location /local_scratch/cryosparc3/tmp

[CPU: 2.85 GB] SSD cache : This could be because other jobs are running and using files, or because a different program has used up space on the SSD.
[CPU: 2.85 GB] SSD cache : Cache full for past 14222s, checking again in 30 seconds for space to become available…

To be clear, no one was accessing those pre-cached files in any program when this job was started.

Is it possible that a permissions issue is preventing cryoSPARC from deleting these files automatically? Or is this a bug in version 3.3.2?

Cheers

Hannah

@bfisher @zyx

If a message like this appears while there are no jobs running on your instance, your caching system is in an inconsistent state. A recovery procedure is described elsewhere on this forum.

Did you confirm that the volume that holds /local_scratch/cryosparc3/tmp is large enough to hold those
665+ GB plus any data also residing on that same volume but outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))?
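One way to check that, assuming the cache path quoted in this thread:

```shell
# Free space on the volume that holds the cache path (path from this thread)
df -h /local_scratch/cryosparc3/tmp
# The "Avail" column must exceed the ~665 GB the job says it still needs to download.
```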

The total SSD on this node is 881 GB.

I found cached files in /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1)) and in /local_scratch/cryosparc3/tmp

Outside of those two folders, there is essentially no disk usage.

These cached files are not being deleted by cryosparc when it finds there isn’t enough free space.

What is the size of the data in /local_scratch/cryosparc3/tmp, not counting data in
/local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))?
I am concerned that a given cryoSPARC instance cannot be counted on to manage data outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1)).
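To measure that, GNU du can exclude the instance subdirectory by pattern (the paths are the ones quoted in this thread; adjust to your setup):

```shell
# Scratch path from this thread; substitute your own.
SCRATCH=/local_scratch/cryosparc3/tmp

# Size of everything under the scratch path EXCEPT the instance_* cache directory
du -sh --exclude='instance_*' "$SCRATCH"

# Total size including the instance cache, for comparison
du -sh "$SCRATCH"
```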

Were any values specified for --ssdreserve and --ssdquota during connection of the worker node?

Was the manual file deletion performed by the Linux user/account that runs the cryoSPARC instance?

Manual deletion was performed by root user.

At the time of this issue, I think there was around 300GB in each of /local_scratch/cryosparc3/tmp and /tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))

How can I find the specification for --ssdreserve and --ssdquota for the worker node?

Do you remember if the 300 GB in
/local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))/
were included in the 300 GB inside
/local_scratch/cryosparc3/tmp/
or was there a total of approx. (300 + 300 = 600) GB inside /local_scratch/cryosparc3/tmp/ and its subdirectories?
I wonder how the files outside /local_scratch/cryosparc3/tmp/instance_${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_MONGO_PORT+1))/ got there.
The --ssdreserve and --ssdquota connection parameters would be shown as Cache Reserve (MB) and Cache Quota (MB), respectively, under Resource Manager/Instance Information.
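As an alternative to the web UI, these values can usually also be read from the command line on the master node. This is a sketch: `get_scheduler_targets()` is a real cryosparcm CLI call, but the key names filtered for below (`cache_reserve_mb`, `cache_quota_mb`) are assumptions based on typical output and may differ in your version:

```shell
# Dump all scheduler targets and pull out the cache reserve/quota settings.
# Key names 'cache_reserve_mb' / 'cache_quota_mb' are assumed, not confirmed.
cryosparcm cli "get_scheduler_targets()" | grep -oE "'cache_(reserve|quota)_mb': [^,}]*"
```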

Hi there,

I wonder if previous versions of cryosparc used a different cache folder and we just never deleted those folders? The issue of the cache not clearing is recurring for different users on the same cluster (note that OP bfisher is on the same cluster).

There isn’t a cache quota set, and the cache reserve is 1000 MB per node. Do these need to be changed?

How do different users access cryoSPARC on this cluster:

  • run separate, user-owned “cryoSPARC instances as cluster jobs” -or-
  • access the UI of a multi-user cryoSPARC instance under a common ${CRYOSPARC_MASTER_HOSTNAME}:${CRYOSPARC_BASE_PORT} URL

?