Restart failed jobs with cryosparc-tools

DanielAsarnow · March 16, 2023, 5:52pm

Does anyone know if there’s a way to restart all jobs that failed in, say, the last 6 hours using cryosparc-tools?

I would like to update CryoSPARC and then conveniently requeue the failed jobs, rather than using maintenance mode and waiting for all of them to finish.

nfrasser · March 16, 2023, 8:40pm

Hi @DanielAsarnow, here’s a sample Python script using cryosparc-tools which should be able to do this:

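from datetime import datetime, timedelta, timezone
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(...)
date_format = '%a, %d %b %Y %H:%M:%S %Z'
cutoff = datetime.now(timezone.utc) - timedelta(hours=6)

for job_doc in cs.cli.get_jobs_by_status('failed'):
    # Skip deleted jobs and jobs with no recorded failure time
    if job_doc['deleted'] or not job_doc.get('failed_at'):
        continue

    # strptime returns a naive datetime even with %Z, so attach UTC explicitly
    # (assumes failed_at is reported in GMT/UTC)
    failed_at = datetime.strptime(job_doc['failed_at'], date_format).replace(tzinfo=timezone.utc)
    if failed_at < cutoff:  # failed more than 6 hours ago
        continue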

    job = cs.find_job(job_doc['project_uid'], job_doc['uid'])
    job.clear()
    job.queue(
        lane=job_doc['resources_allocated']['lane'],
        hostname=job_doc['resources_allocated']['hostname'],
        gpus=job_doc['resources_allocated']['slots'].get('GPU', []),
    )

Let me know if that works out.

Edit: Correct date parsing and filtering
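
The six-hour filter in the script above can be checked on its own, without a CryoSPARC instance. Below is a minimal sketch, assuming `failed_at` is an HTTP-style date string in GMT, as the format string implies. `failed_within` is a hypothetical helper name for illustration, not part of cryosparc-tools, and the `%a`/`%b` directives assume an English locale.

```python
from datetime import datetime, timedelta, timezone

# Same format string as in the script above
DATE_FORMAT = '%a, %d %b %Y %H:%M:%S %Z'


def failed_within(failed_at, hours, now=None):
    """Return True if the failed_at string falls within the last `hours` hours.

    Assumes failed_at is expressed in GMT/UTC. strptime returns a naive
    datetime even when %Z matches, so UTC is attached explicitly.
    """
    now = now or datetime.now(timezone.utc)
    parsed = datetime.strptime(failed_at, DATE_FORMAT).replace(tzinfo=timezone.utc)
    return timedelta(0) <= now - parsed <= timedelta(hours=hours)
```

Passing a fixed `now` makes it possible to try the filter against a few sample strings before clearing and requeuing real jobs.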