Restarting failed jobs leaves them queued indefinitely, unlike Clear then Queue (v4.1)

If I right-click and Restart a failed job, the job appears to be stuck in the queue indefinitely. The Event Log simply says “Job is queued”. The same job, if first Cleared and then Queued (to the lane) using the right-click menu, starts as usual.

We had removed and re-added a lane with the same name. Could this be related?

Hi @Navid ,

What job type are you referring to here? Did you queue it to a lane or directly to specific GPU(s)?

- Suhail

Hi Suhail,

The original job was queued to our HPC cluster lane ‘lilac’. Whether the job (of any type) completed or failed, if I right-click → Restart, it sits on “Queued ‘lilac’” forever without any progress, i.e. it is never actually submitted to the cluster lane. If I clear the job and then requeue it directly to the lane ‘lilac’, the job runs. See video:

https://youtu.be/KWv1-q1iBlM

Best regards,
Navid

Hi @Navid ,

Thanks for the clarification and screencast! We’ll investigate and keep you updated.

- Suhail

This has been fixed in the v4.1.1 update, thank you!
