Moving a particle stack to a separate machine

I have recently gained access to an HPC cluster and would like to run some parts of my data processing on it. Since storage space on the cluster is limited, my plan is to run preprocessing and particle picking on my single-workstation setup, and only upload the particle stacks for 3D refinement. However, I cannot figure out how to import the particles into the CryoSPARC instance on the HPC. I am trying to transfer the exported jobs (I have tried this with an ‘Extract from micrographs’ and a ‘Restack particles’ job) using the commands

rsync -rLtv ~/cryosparc/P1/exports/jobs/J1_extract_micrographs_multi user@cluster:~/P1/imports
rsync -rLtv ~/cryosparc/P1/exports/jobs/J2_restack_particles user@cluster:~/P1/imports

For some reason, this starts to transfer all the micrographs that the particles were extracted from, which is exactly what I am trying to avoid. When stopping rsync and trying to import the job without all the micrographs present, the import fails:

Unable to import job from /app/cryosparc_datadir/P1-title/imports/J1_extract_micrographs_multi into P1:
[IMPORT_JOB]: Unable to find data referenced by cs files. Aborting import of P1 J4 from /app/cryosparc_datadir/CS-paphy-light-state-refinement/imports/J1_extract_micrographs_multi

When trying to import the .mrc files of the particle stack from the Restack particles job instead, it also fails:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
  File "/app/cryosparc_master/cryosparc_compute/jobs/imports/run.py", line 47, in run_import_particles
    far_import_path = os.path.expandvars(params['particle_meta_path'])
  File "/app/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/posixpath.py", line 288, in expandvars
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

I am clearly lost. What am I doing wrong here?
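For the second error, the last lines of the traceback are the informative part: `os.path.expandvars` was handed `params['particle_meta_path']`, and that value was `None`. In other words, the Import Particle Stack job appears to have been launched without a particle metadata path. The sketch below is plain Python, nothing CryoSPARC-specific, and reproduces that failure mode; the `params` dict is a hypothetical stand-in for the job's parameters, not CryoSPARC's actual internal structure.

```python
import os
from typing import Optional


def expandvars_error(value) -> Optional[str]:
    """Return the message os.path.expandvars raises for `value`, or None if it succeeds."""
    try:
        os.path.expandvars(value)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical stand-in for the job parameters: the metadata path was left empty.
    params = {"particle_meta_path": None}
    # A None path fails inside os.fspath, just like the last frame of the traceback.
    print(expandvars_error(params["particle_meta_path"]))
    # A real string path (even one containing a $VAR) expands without error.
    print(expandvars_error("/data/$SOME_VAR/particles.cs"))
```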


The problem is that when you export a job, everything its outputs reference from earlier in the workflow (here, the micrographs the particles were extracted from) is exported too.

The solution is to first get rid of these references by feeding your Extract From Micrographs job (for example) into a Particle Sets Tool job (with the batch size set to an arbitrarily large value) and removing the location slot from the expanded inputs, which removes the micrographs from the outputs.
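To make the effect of removing the location slot concrete: CryoSPARC .cs files store particle metadata as columns named `slot/field`, and the columns that hold file paths (e.g. `blob/path` for the particle stack, `location/micrograph_path` for the source micrograph) are what tie the particles back to files on disk. The sketch below uses an illustrative, hypothetical list of column names (a real .cs file has more) and plain string filtering, not CryoSPARC's own tools, to show that dropping the `location` slot leaves only the reference to the particle stack.

```python
from typing import List

# Illustrative column names for an extracted-particle dataset (hypothetical subset;
# the point is the "slot/field" naming, not the exact field list).
PARTICLE_FIELDS = [
    "uid",
    "blob/path",
    "blob/idx",
    "blob/shape",
    "location/micrograph_uid",
    "location/micrograph_path",
    "location/center_x_frac",
    "location/center_y_frac",
]


def drop_slot(fields: List[str], slot: str) -> List[str]:
    """Return the column names that remain after removing every column of one slot."""
    prefix = slot + "/"
    return [name for name in fields if not name.startswith(prefix)]


def path_fields(fields: List[str]) -> List[str]:
    """Columns whose names mark them as pointing at files on disk."""
    return [name for name in fields if name.endswith("/path") or name.endswith("_path")]


if __name__ == "__main__":
    kept = drop_slot(PARTICLE_FIELDS, "location")
    print("file references before:", path_fields(PARTICLE_FIELDS))
    print("file references after: ", path_fields(kept))
```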

Now when you export that job and transfer it using the first method you described, it should copy across only the particle stacks, not the micrographs.
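Before running the transfer again, it may help to check what `rsync -L` will actually copy. `-L` (`--copy-links`) makes rsync copy the file a symlink points to rather than the link itself; the micrographs being transferred from an export folder suggests that the folder references upstream data through symlinks, though that is an inference from the behaviour described above, not something confirmed here. The sketch below (plain Python; `external_targets` and `tree_size` are hypothetical helper names) lists every symlink inside an export folder whose target lies outside it, with the size that would travel along.

```python
import os
import sys
from pathlib import Path
from typing import Dict, Tuple


def tree_size(path: Path) -> int:
    """Size in bytes of a file, or of all regular files under a directory."""
    if path.is_file():
        return path.stat().st_size
    total = 0
    for dirpath, _, filenames in os.walk(path):
        for name in filenames:
            candidate = Path(dirpath) / name
            if candidate.is_file():
                total += candidate.stat().st_size
    return total


def external_targets(export_dir: str) -> Dict[str, Tuple[str, int]]:
    """Map each symlink in export_dir that points outside it to (target, size in bytes)."""
    root = Path(export_dir).resolve()
    found: Dict[str, Tuple[str, int]] = {}
    # os.walk does not follow symlinked directories by default, so they show up in dirnames.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            target = link.resolve()
            if target == root or root in target.parents:
                continue  # points inside the export folder; not extra data
            found[str(link.relative_to(root))] = (str(target), tree_size(target))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    links = external_targets(sys.argv[1])
    for rel, (target, size) in sorted(links.items()):
        print(f"{size / 1e9:8.2f} GB  {rel} -> {target}")
    print(f"total pulled in by -L: {sum(s for _, s in links.values()) / 1e9:.2f} GB")
```

Run it as `python check_export.py ~/cryosparc/P1/exports/jobs/J2_restack_particles` (script name hypothetical); after the Particle Sets Tool step, only the particle stack files should remain in the list.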

Hope that helps!

Cheers
Oli


This is exactly what I was looking for, thanks a lot!