Hello,
We have CryoSPARC installed on our HPC. After CryoSPARC crashed, the admins had to roll it back to a previous backup, and we had to restart our projects. First, we tried to reattach projects by deleting the lock file, but then we ran into a FileExistsError:
[CPU: 200.4 MB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 52, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
  File "/scratch/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
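The last frame of the traceback is Python's standard-library os.makedirs. When its exist_ok argument is left at the default (False), it raises FileExistsError if the target path already exists. The minimal sketch below reproduces that behavior in a temporary directory; it does not use any CryoSPARC code, and the helper name make_job_dir and the job directory name J2 are purely illustrative.

```python
import os
import tempfile


def make_job_dir(project_dir: str, job_name: str) -> str:
    """Create <project_dir>/<job_name> with a strict os.makedirs call.

    exist_ok defaults to False, so a second call for the same path
    raises FileExistsError instead of silently reusing the directory.
    """
    path = os.path.join(project_dir, job_name)
    os.makedirs(path)  # exist_ok=False by default
    return path


with tempfile.TemporaryDirectory() as project_dir:
    make_job_dir(project_dir, "J2")  # first call succeeds
    try:
        make_job_dir(project_dir, "J2")  # same path again
    except FileExistsError as err:
        # err.filename holds the path that already existed
        print(f"FileExistsError for existing path: {err.filename}")
```

In other words, the error indicates that a directory the job expected to create fresh was already present on disk.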
Later, after starting a new project, I ran into the FileExistsError when running motion correction on imported movies or CTF estimation on imported motion-corrected images.
Another user on the HPC said he can get a job to run without the FileExistsError by submitting dozens of jobs and letting them fail until a job number is reached that did not exist before.
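The workaround above (letting jobs fail until an unused job number comes up) is consistent with new jobs being assigned numbers whose directories already exist on disk, for example if the database was restored to a state older than the project directory. That is an assumption, not a confirmed diagnosis. A minimal sketch for checking it lists the job numbers that already have directories, assuming job directories are named J<number> directly under the project directory (as in the J2 path mentioned in this thread); the function name job_numbers_on_disk is hypothetical.

```python
import os
import re
from typing import List

# Assumption: job directories are named J<number> directly under the
# project directory, e.g. .../projectname/J2.
JOB_DIR_PATTERN = re.compile(r"^J(\d+)$")


def job_numbers_on_disk(project_dir: str) -> List[int]:
    """Return the sorted job numbers whose J<number> directories exist."""
    numbers = []
    for entry in os.scandir(project_dir):
        match = JOB_DIR_PATTERN.match(entry.name)
        # Only count real directories; ignore files and other names.
        if match and entry.is_dir():
            numbers.append(int(match.group(1)))
    return sorted(numbers)
```

For example, job_numbers_on_disk("/scratch/cryosparcuser/username/projectname") would return the existing job numbers; if the highest one is above the job number the interface is about to assign, new jobs would collide with existing directories until the counter passes it.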
Can anyone advise on how we can troubleshoot this error?
What was the state of the database after the backup was restored and before the specific project in which the FileExistsError occurred was reattached? Had the database been rolled back to a backup that was created before that project was created?
It’s not the same directory path. The FileExistsError occurs when continuing to work within a project under two distinct circumstances:
1. A new project with a new name is created (creating a new directory path), and the old project is detached (cs.lock deleted) and attached to the new project.
2. A new project is created that imports the same data from a directory that a previous project had also imported from.
To avoid any confusion: Removal of cs.lock and project detachment are not equivalent; the former is only part of the latter. Manual removal of cs.lock is not recommended outside exceptional and specific circumstances.
Please can you elaborate on the specific steps that were performed in the process?
(A project (directory) is meant to be attached to an instance, not to an existing project. Please see the guide for details).
As the term “import” has different meanings in various CryoSPARC contexts, can you please describe what the data import involved in this context?
CryoSPARC on our HPC crashes so frequently that we are now implementing a twice-daily backup schedule.
When CryoSPARC first crashed, our HPC admin team told us (contrary to the instructions on the CryoSPARC website) that it was safe to manually delete the cs.lock file and attach the projects to new projects, and that they had not seen any negative impact from doing so.
To attach projects, users manually deleted the cs.lock file in previous projects, selected “Attach project” from the “New Project” drop-down menu, and selected the project directory that they wanted to be imported.
(Another reason we did not properly detach projects was that, after the rollback, projects were not visible on the “All Projects” page, so we could not click “…” on a project and select “Detach Project”. We therefore defaulted to what the HPC admin advised.)
When we continued working in these attached projects (for instance, running CTF estimation), we ran into the FileExistsError.
Alternatively, we created a fresh project via “New Project” and ran an “Import” job to import the original movies. Note that the same data had previously been imported, with an import job, into earlier projects that became inaccessible after the rollback. When running motion correction on these movies, we again ran into the FileExistsError.
If you can share the specific symptoms and log messages associated with the crashes, we may be able to suggest modifications to make your CryoSPARC instance more stable. It would be great if your HPC team participated in this discussion.
Would it be possible for you, by inspecting the attributes (such as creation date) and contents of /scratch/cryosparcuser/username/projectname/J2,
to reconstruct the sequence of events (such as database failures, restoration of outdated backups, removal of project locks, etc.) that led to the FileExistsError?
This exercise may help to:
Could it be that projects were not visible because the user accessing that page was not logged in under an admin-level CryoSPARC account, or because restoration of the database failed?
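For the inspection of /scratch/cryosparcuser/username/projectname/J2 suggested earlier, a minimal sketch that lists every file and subdirectory of a job directory ordered by modification time may help build a timeline. It relies only on the Python standard library; the function names are hypothetical. On Linux, os.stat reports modification time (st_mtime) and inode-change time (st_ctime) but generally not the true creation time, and copying or restoring files may reset these timestamps, so treat the result as approximate.

```python
import os
from datetime import datetime
from typing import List, Tuple


def files_by_mtime(job_dir: str) -> List[Tuple[float, str]]:
    """Walk job_dir and return (mtime, relative_path) pairs, oldest first."""
    records = []
    for root, dirs, files in os.walk(job_dir):
        for name in dirs + files:
            full = os.path.join(root, name)
            # lstat: report on symlinks themselves rather than their targets
            st = os.lstat(full)
            records.append((st.st_mtime, os.path.relpath(full, job_dir)))
    records.sort()
    return records


def print_timeline(job_dir: str) -> None:
    """Print one 'timestamp  relative/path' line per entry, oldest first."""
    for mtime, rel in files_by_mtime(job_dir):
        stamp = datetime.fromtimestamp(mtime).isoformat(sep=" ", timespec="seconds")
        print(f"{stamp}  {rel}")
```

Comparing the earliest and latest timestamps in J2 against the dates of crashes, backup restorations, and lock removals may show whether the directory predates the restored database state.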