Job directory is not empty error after strange queueing behaviour

Hello all,

We’ve had a strange incident we hope someone might be able to help with. In one of our projects, CryoSPARC appeared to randomly re-queue hundreds of old jobs (years old, untouched) that had already been run, completely overwhelming our system. This seemed to occur shortly after a small workspace clean-up had been run and a new job queued.

To clear the backlog, all the queued jobs were marked as completed in the database, which restored many of them to normal. However, some of the randomly queued jobs had apparently also tried to launch and entered a failure state. After these too were reset to completed, their output data is no longer visible in the UI. Instead, the jobs carry a message such as:

Job directory /mnt/ome/data07/cryosparc/XXX/J3688 is not empty, found: /mnt/ome/data07/cryosparc/XXX/J3688/job.log

The underlying directories still contain the original data and the output tabs still look correct, but the UI seems locked in this state and we can’t see the outputs in the event log window.
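For context, the status reset was along these lines (a rough sketch only: the project and job UIDs below are placeholders, not our real ones, and the commands are echoed as a dry run rather than executed):

```shell
#!/bin/sh
# Dry-run sketch of marking stuck jobs as completed via the
# cryosparcm CLI. Remove the leading "echo" to actually run the
# commands on a live instance. PROJECT and the job list are
# placeholders for illustration.
PROJECT="P12"
for JOB in J3688 J3689 J3690; do
  echo cryosparcm cli "set_job_status('${PROJECT}', '${JOB}', 'completed')"
done
```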

Does anyone have any suggestions for a solution? We have already tried detaching and re-attaching the project.

Best regards,

Charlie

@charliebe2 On the given CryoSPARC instance, are there any automations or other mechanisms in place that manage CryoSPARC jobs and/or data bypassing the web app?

Were these jobs queued to a node or cluster-type scheduler lane?

There are no automations or other mechanisms in place that manage jobs, and no bypassing of the web app. The jobs are queued to a node.