Feature Request: Automatic Pausing for CryoSPARC Live Jobs

Hello Structura Friends,

We’ve been using CryoSPARC Live with our Slurm cluster for a while, and overall it works rather well for our purposes (barring the occasional bug).

We have noticed, however, that the CryoSPARC Live worker tends to busy-wait while holding its GPUs if our microscope operators aren’t conscientious about pausing their live sessions. In other words, we occasionally find long-running live sessions that have been allocated one or more GPUs they aren’t using, and these sessions block other jobs on our cluster from using the otherwise idle GPUs.

Would it be possible to make it so that if no new exposures are discovered within a configurable time interval (e.g., 1 hour), the CryoSPARC master automatically kills the worker jobs? It could then restart them when new images are detected.
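
As a rough illustration of the requested behavior, the decision logic could look something like the Python sketch below. This is not based on CryoSPARC internals: `IdleWatchdog`, `Action`, and `tick` are hypothetical names, the one-hour default only mirrors the example interval above, and the step of actually cancelling or resubmitting the Slurm jobs is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """What the (hypothetical) master should do after one polling cycle."""
    NONE = "none"
    PAUSE = "pause"    # release the workers and their GPUs
    RESUME = "resume"  # requeue the workers


@dataclass
class IdleWatchdog:
    # Assumed configurable interval; 3600 s matches the "1 hour" example.
    idle_timeout_s: float = 3600.0
    # Timestamp (seconds) of the most recent poll that found exposures.
    last_exposure_time: float = 0.0
    workers_running: bool = True

    def tick(self, now: float, new_exposures: int) -> Action:
        """Decide what to do given the number of exposures found this poll."""
        if new_exposures > 0:
            self.last_exposure_time = now
            if not self.workers_running:
                self.workers_running = True
                return Action.RESUME
            return Action.NONE
        idle_for = now - self.last_exposure_time
        if self.workers_running and idle_for >= self.idle_timeout_s:
            self.workers_running = False
            return Action.PAUSE
        return Action.NONE


# Example: one hour without exposures pauses; new exposures resume.
wd = IdleWatchdog(idle_timeout_s=3600.0, last_exposure_time=0.0)
print(wd.tick(now=1800.0, new_exposures=0))  # Action.NONE
print(wd.tick(now=3600.0, new_exposures=0))  # Action.PAUSE
print(wd.tick(now=4200.0, new_exposures=5))  # Action.RESUME
```

The point of keeping the decision separate from the scheduler calls is that the same logic would apply whether the workers are stopped through Slurm or some other mechanism.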

This would allow us to achieve much better resource utilization on our cluster, which is becoming a higher priority for us as server components increase in price due to the AI boom / macroeconomic conditions.

Let me know what you think!

–John

Thanks @jpellman for your post. Have you already considered related functionality in CryoSPARC v5, which is currently in beta testing?

Hi @wtempel,

Sorry for the noise. I was unaware that CryoSPARC v5 already implements this functionality; I missed it in the release notes. This looks to be exactly what we want.

–John

3 posts were split to a new topic: CryoSPARC Live default profile

I’m not sure if the new auto-pause and delayed worker startup features do this, but it would also be ideal if workers automatically restarted after they’ve been automatically paused (i.e., if they responded directly to the queue of unprocessed images in some way). We occasionally have network interruptions that stall image transfers from our camera computers, and it would be nice if CryoSPARC could detect the new images once they arrive and requeue workers without manual intervention, since these transfer issues sometimes occur late at night or early in the morning.
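
To sketch what "responding to the queue of unprocessed images" might mean, here is a small Python example under stated assumptions. It does not reflect how CryoSPARC tracks exposures: `list_exposures`, `backlog`, and `should_requeue` are hypothetical helpers, the file suffixes are only guesses at typical movie formats, and the set of already-processed names is assumed to come from somewhere else.

```python
from pathlib import Path


def list_exposures(exposure_dir: Path,
                   suffixes=(".tif", ".tiff", ".mrc", ".eer")) -> set[str]:
    """Names of files in exposure_dir that look like exposures (assumed suffixes)."""
    return {
        p.name
        for p in exposure_dir.iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    }


def backlog(on_disk: set[str], processed: set[str]) -> list[str]:
    """Exposures present on disk that have not been processed yet."""
    return sorted(on_disk - processed)


def should_requeue(workers_paused: bool, pending: list[str]) -> bool:
    """Requeue only if workers are currently paused and there is work waiting."""
    return workers_paused and len(pending) > 0


# Example: two files arrive late after a stalled transfer.
on_disk = {"exp_001.tif", "exp_002.tif", "exp_003.tif"}
processed = {"exp_001.tif"}
pending = backlog(on_disk, processed)
print(pending)                        # ['exp_002.tif', 'exp_003.tif']
print(should_requeue(True, pending))  # True
```

Driving the restart from the backlog rather than from a timer means that images arriving hours after a network interruption would still trigger a requeue.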

I wouldn’t prioritize what I’m describing over making auto-pause stable (assuming that the behavior I’m specifying isn’t already implemented); auto-pause is already a huge game-changer.