Resume a cryoSPARC job

Hi,
I’m very new to cryoSPARC.
While running 2D Classification I found that if a job crashes for any reason, such as running out of memory, there is no option to continue it from the GUI.
Is there any option or workaround to resume a failed/crashed job in cryoSPARC?

Thanks


You can use Clear Job and then re-start the job (from the beginning). There is no way to resume a job, as far as I know.
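
If you'd rather do this from a script than from the GUI, something along these lines should work with the cryosparc-tools Python package. This is a rough sketch, not tested here; the connection details, project/job IDs, and lane name are placeholders for your own setup:

```python
# Rough sketch using cryosparc-tools: clear a failed job and queue it again.
# All credentials and IDs below are placeholders.
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(
    license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",  # your cryoSPARC license ID
    host="localhost",                                # cryoSPARC master hostname
    base_port=39000,
    email="user@example.com",
    password="password",
)

job = cs.find_job("P3", "J42")  # the failed 2D classification job
job.clear()                     # wipes outputs; inputs and parameters are kept
job.queue(lane="cluster")       # re-runs the job from the first iteration
```

Note that this still restarts from iteration 0; it just saves the clicking.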

Best,
RJ

Yes, that’s an option. But sometimes a job crashes with an out-of-memory error in the last iterations.
In such cases, an option to resume the job while requesting more memory on the cluster would be very helpful. It’s a pity there is no option like that.

Hi @diffracteD,
Unfortunately, at the moment we do not have a way to resume failed 2D Classification or Refinement jobs in cryoSPARC. However, this is possible in cryoSPARC Live (https://cryosparc.com/docs/live). We hope to add similar functionality to cryoSPARC soon.

If you are okay with starting from raw movies, you can import them into cryoSPARC Live, have the Live GPU workers handle preprocessing (motion correction, CTF estimation, picking, extraction), and then use the Streaming 2D Classification job in Live, which, if it fails, can continue from the last saved state.


Hi @spunjani,

Any word on getting this type of functionality into cryoSPARC proper? I’ve had long jobs (e.g. non-uniform refinement with very large particles) crash 20+ hours in due to cluster instability a few times and have had to restart from scratch; it would be useful to be able to restart from the most recent iteration’s output, at the very least.

Thank you!


Hello @spunjani

Is there any progress on implementing this function? I’m working with supersampled EER data and trying to run RBM correction, but it is difficult to get the time I need on my cluster for the job to complete (up to 6 days on this data). It would be really great to have this functionality implemented (i.e. resuming jobs from where they failed).

Agreed. In your specific case you can split the list of micrographs into 10 batches using exposure sets, then queue each batch one after the other with the same hyperparameters; the total processing time should be about the same, but with checkpoints and better cluster sharing.
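
To make the batching idea concrete, here is a rough Python sketch of splitting an exposure list into 10 contiguous batches. In practice you would do the actual split inside cryoSPARC with the Exposure Sets Tool; the file names and counts below are made up:

```python
# Illustrative only: shows the batching idea, not a cryoSPARC API.
def chunk(items, n_batches):
    """Split a list into n_batches roughly equal, contiguous batches."""
    size = -(-len(items) // n_batches)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

micrographs = [f"micrograph_{i:05d}.tif" for i in range(5000)]  # hypothetical
batches = chunk(micrographs, 10)

# Each batch would then get its own motion correction job, queued one after
# another with identical hyperparameters, giving per-batch checkpoints.
for b, batch in enumerate(batches):
    print(f"batch {b}: {len(batch)} exposures")
```

The total compute is about the same, but if one batch fails you only redo that batch.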


I ran the hyperparameter search in one job and the dose weighting in another to reduce the time, but motion correction is still taking a very long time. What you suggest seems like a good idea! I had already split the particle set in half and refined each half independently, but each half is still ~500k particles.