Hello,
I’ve seen this ‘no heartbeat’ error a few times now with no other error preceding it in the events log.
I checked some other posts and went to the job.log file, where this was the last line after a long series of ‘sending heartbeat’ lines. I haven’t checked the log files of the earlier failed jobs; this one is from the most recent blobpicker job.
/projappl/project_#######/usrappl/username/cryoSPARC/cryosparc_worker/bin/cryosparcw: line 150: 2404207 Segmentation fault python -c "import cryosparc_compute.run as run; run.run()" "$@" b
I would’ve pasted the whole log file, but I restarted the job instead of cloning it, and I had only copied the last line to look it up, so the log file is blank for now. Before this line there were just a few ‘deprecation warning’ messages about variable names and a lot of ‘sending heartbeat’ lines.
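If the restarted job fails again, a rough way to keep everything in job.log except the heartbeat noise might look like the sketch below. It assumes the heartbeat lines contain the text ‘sending heartbeat’. The function name `non_heartbeat_lines` and the command-line usage are my own placeholders, not anything from cryoSPARC.

```python
import sys
from pathlib import Path


def non_heartbeat_lines(log_text: str, marker: str = "sending heartbeat") -> list[str]:
    """Return the non-empty log lines that do not contain the heartbeat marker.

    The marker text is an assumption based on how the lines looked in job.log;
    the comparison is case-insensitive.
    """
    needle = marker.lower()
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and needle not in line.lower()
    ]


if __name__ == "__main__":
    # The path to job.log is given on the command line; no location is assumed.
    if len(sys.argv) > 1:
        text = Path(sys.argv[1]).read_text(errors="replace")
        # Print only the last 20 remaining lines, where the crash message should be.
        for line in non_heartbeat_lines(text)[-20:]:
            print(line)
```

Saved as, say, `filter_log.py` (a made-up name), it would be run as `python filter_log.py job.log` before restarting the job, so the warnings and the final error line are kept for posting.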
Does this line say anything about what the problem could be? I’ve increased the heartbeat to 360 seconds and restarted the job to see if that helps, but I feel like the problem could be elsewhere. I’d requested 2x the necessary RAM and a time limit of 1 day; the job ran for about 2.5 hours.
I’ll add the output of the restarted job whether it fails or succeeds.