Moving motion-corrected files to NAS

Hello!

Due to limited space on my SSD, I would like to move motion-corrected files from the ‘J3’ directory to a NAS or HDD to free space. How can I link the files so that cryoSPARC can still read them as if they were never moved?

Currently I’m on the NU-refinement step.

Unfortunately, neither of the guides below seems to cover this case:

  1. Guide: Migrating your CryoSPARC Instance | CryoSPARC Guide

  2. Guide: Data Management in CryoSPARC (v4.0+) | CryoSPARC Guide

Thank you!

Hello wtkscn,

I’ve never done that, but this is what I’d try first: if the CS worker can see your NAS, mount the NAS directory somewhere on the worker and replace the image files inside the J3 directory with symbolic links pointing to the corresponding files on the NAS. You’ll need a short bash line to turn all of them into links with ‘ln -s’.

  • The part I’m not sure about is whether symbolic links cause any trouble for the CS routines. Maybe try first with a small subset of images…

Keep in mind that, in such a scenario, all re-extraction jobs will probably take much longer to complete, no matter if HDD or NAS.
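
A minimal sketch of the move-and-link step described above, written in Python rather than as a bash one-liner so that each copy can be checked before the original is removed. The function name `relocate_and_link` and the NAS path in the usage comment are hypothetical; the size check is an added safeguard, not something CryoSPARC requires.

```python
import shutil
from pathlib import Path


def relocate_and_link(src_dir, dst_dir, pattern="*.mrc"):
    """Copy matching files to dst_dir, verify the size, then replace each original with a symlink."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for src in sorted(src_dir.glob(pattern)):
        if src.is_symlink():
            continue  # already relocated on an earlier run
        dst = dst_dir / src.name
        shutil.copy2(src, dst)
        # Keep the original unless the copy has exactly the same size in bytes.
        if dst.stat().st_size != src.stat().st_size:
            raise IOError(f"size mismatch after copying {src}")
        src.unlink()
        src.symlink_to(dst.resolve())  # the link keeps the original name and location
        linked.append(src)
    return linked


# Hypothetical usage (the NAS mount point is an assumption):
# relocate_and_link("/mnt/SSD/cryosparc-projects/cryoem-project/J3/motioncorrected",
#                   "/mnt/NAS/cryoem-project/J3/motioncorrected")
```

Trying this on a copy of a few micrographs first, as suggested above, is the safer order of operations.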

Thanks for the reply.

I wasn’t sure whether standard Linux links would work here, or whether I’d have to link the files using cryosparcm icli. I’ll test ln -s on a small subset and let you know here.

Also, I don’t think the motion-corrected micrographs are needed at this stage of data processing, so there should be no need for fast access anymore.

All recommendations about this particular problem, and about data management in general, are welcome!

Well, ideally we should be able to keep everything on fast storage until the end of processing: the non-aligned movies, in case you want local motion correction at the end, and the aligned movies, for re-extractions, whether you want to change the pixel size or simply get the XY centering corrected. My biggest challenge is knowing when “end of processing” is…

Symbolic links will work.

Alternatively you can use the Clear action to delete the data but preserve the links between jobs, and then just re-queue the job if you end up needing the data again. This approach may be better for particle stacks than micrographs, as motion correction likely has some nondeterminism.

Hi! Thanks for the suggestions.

So I have run a few tests on a small subset and a full set of micrographs, and:

  1. CryoSPARC could read and run Patch CTF on the small subset of 40 micrographs that were moved from SSD to NAS and linked using ln -s.

  2. CryoSPARC could not read the full set of 11k micrographs that were moved from SSD to NAS and symlinked using ln -s. I had to combine ln -s with xargs to link all 11k files at once. The symlinks are in the correct folder and point to the correct target paths.

It gives this error when I run Patch CTF:

Error occurred while processing J3/motioncorrected/micrograph1.mrc
Traceback (most recent call last):
  File "/home/user/cryosparc/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 188, in read_mrc
    data = read_mrc_data(file_obj, header, start_page, end_page, out)
  File "/home/user/cryosparc/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 104, in read_mrc_data
    data = n.fromfile(file_obj, dtype=dtype, count= num_pages * ny * nx).reshape(num_pages, ny, nx)
ValueError: cannot reshape array of size 23569152 into shape (1,4092,5760)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/ctf_estimation/run.py", line 101, in cryosparc_master.cryosparc_compute.jobs.ctf_estimation.run.run.ctfworker.process
  File "cryosparc_master/cryosparc_compute/jobs/ctf_estimation/run.py", line 104, in cryosparc_master.cryosparc_compute.jobs.ctf_estimation.run.run.ctfworker.process
  File "/home/user/cryosparc/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 190, in read_mrc
    raise ValueError(f'Could not read mrc data from {fname}') from e
ValueError: Could not read mrc data from /mnt/SSD/cryosparc-projects/cryoem-project/J3/motioncorrected/micrograph1.mrc

Marking J3/motioncorrected/micrograph1.mrc as incomplete and continuing…
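
The numbers in the ValueError can be checked by hand. The header asks for 1 × 4092 × 5760 values, but only 23569152 were read, which is 768 values short. That is consistent with the file behind the link being shorter than its header promises (for example, an incomplete copy), although the thread does not confirm the cause. The sketch below redoes that arithmetic and compares a file's size with what its own MRC header implies. It assumes the standard MRC2014 layout (1024-byte header, `nsymbt` at byte offset 92) and little-endian data; `missing_bytes` is a hypothetical helper, not part of CryoSPARC.

```python
import os
import struct

# Bytes per value for common MRC2014 modes: 0=int8, 1=int16, 2=float32, 6=uint16, 12=float16.
BYTES_PER_VALUE = {0: 1, 1: 2, 2: 4, 6: 2, 12: 2}

# Numbers taken from the error message above.
expected_values = 1 * 4092 * 5760
values_read = 23569152
print(expected_values - values_read)  # values missing from the file


def missing_bytes(path):
    """Return how many bytes short the file is of what its header promises (0 if complete)."""
    with open(path, "rb") as fh:
        header = fh.read(1024)
    if len(header) < 1024:
        raise ValueError(f"{path}: shorter than the 1024-byte MRC header")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)  # assumes little-endian
    (nsymbt,) = struct.unpack_from("<i", header, 92)         # extended header length in bytes
    expected = 1024 + nsymbt + nx * ny * nz * BYTES_PER_VALUE[mode]
    # os.path.getsize follows symlinks, so this measures the target on the NAS.
    return max(0, expected - os.path.getsize(path))
```

Running `missing_bytes` on the symlinked micrograph and on a copy known to be good would show whether the data reaching the reader through the link is really truncated.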

When the micrographs were moved back to their original location on the SSD, Patch CTF ran as normal.

  3. When I run “Check for Corrupt Micrographs” it says: Failed to read exposure.

At this point I’m not sure whether the solution lies on the Linux side or the CryoSPARC side.
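
One inexpensive way to look at the Linux side first is to audit the links themselves, independently of CryoSPARC. The sketch below sorts the entries of a directory into regular files, working symlinks, and dangling symlinks. `audit_links` is a hypothetical helper, and it only tells you whether each target exists, not whether its contents are intact.

```python
import fnmatch
from pathlib import Path


def audit_links(directory, pattern="*.mrc"):
    """Sort matching entries into regular files, working symlinks, and dangling symlinks."""
    report = {"regular": [], "linked": [], "dangling": []}
    for entry in sorted(Path(directory).iterdir()):
        if not fnmatch.fnmatch(entry.name, pattern):
            continue
        if not entry.is_symlink():
            report["regular"].append(entry.name)
        elif entry.exists():  # exists() follows the link to its target
            report["linked"].append(entry.name)
        else:
            report["dangling"].append(entry.name)
    return report
```

If every entry ends up under "linked" and the job still fails, the next thing to compare is the size or checksum of each target against the original.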

I solved it by archiving the project, changing the location of the project from SSD to HDD, and unarchiving the project with an updated path.

The motion-corrected micrographs are not needed for the remaining steps, so having them on slower storage doesn’t interfere with data processing.

Thanks for your suggestions!