No heartbeat error, but refinement continues

Hi @dnamkr,

The heartbeat error happens when a running compute job hasn’t responded for 30 seconds. This usually indicates that the job has crashed so badly that it couldn’t even report an error traceback. In your case, if all of your hardware resources are consumed by a different process (Relion, for instance), then the cryoSPARC job won’t be able to do anything for 30 seconds, and to the cryoSPARC scheduler it looks as if the job has completely failed.
So nothing is actually going wrong here: this is the expected behaviour when the system is fully loaded by another process.
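
To make the mechanism concrete, here is a minimal sketch of a heartbeat timeout check. This is not cryoSPARC's actual code: the class `HeartbeatMonitor`, its methods, and the job ID `"J1"` are hypothetical names for illustration, and only the 30-second window comes from the explanation above.

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # the 30-second window described above


class HeartbeatMonitor:
    """Hypothetical scheduler-side tracker; an assumption, not cryoSPARC's API."""

    def __init__(self, timeout_s=HEARTBEAT_TIMEOUT_S, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock      # injectable so the logic can be tested without waiting
        self.last_seen = {}     # job ID -> time of the most recent heartbeat

    def record_heartbeat(self, job_id):
        """Called each time a job reports that it is still alive."""
        self.last_seen[job_id] = self.clock()

    def is_presumed_failed(self, job_id):
        """True if the job has been silent for longer than the timeout."""
        last = self.last_seen.get(job_id)
        if last is None:
            return False  # no heartbeat recorded yet, so nothing to time out
        return self.clock() - last > self.timeout_s


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    monitor.record_heartbeat("J1")
    print(monitor.is_presumed_failed("J1"))
```

Note that a check like this cannot tell a crashed job from one that is merely starved of CPU by another process: both just look silent. That is why the error can appear even though the job resumes once resources free up.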

Ali
