Terminating "no heartbeat" jobs

As discussed previously, cryoSPARC can raise a false alarm (i.e. a "no heartbeat" error) while the job continues as if nothing happened. Unfortunately, after that error users can no longer kill the job from the GUI (the kill button is greyed out). Killing such a job then requires root access (at least on our systems), which users generally do not have.

I wonder if it would be possible to leave the kill function accessible for jobs that are in fact active.

Peter

The reason this is difficult is that there isn’t currently a way for the webapp to know when a job process actually dies, which is why we use the heartbeat. If the job stalls for more than 120 seconds (in the latest cryoSPARC), the webapp can’t tell the difference between a stall and the job having terminated for some reason. If we were to keep the kill button active after that, clicking it would kill whatever process is running with the PID of the original job. Since the system can reuse PIDs once the job has actually terminated, we could end up terminating an unknown, arbitrary process.
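To make the two issues concrete, here is a rough sketch of a heartbeat timeout check and of one common way to guard a kill against PID reuse. This is not how cryoSPARC is implemented. Only the 120-second figure comes from this thread; the function names are hypothetical, and the guard assumes a Linux host where `/proc/<pid>/stat` is readable and where the job's start time was recorded at launch.

```python
import os
import signal
import time

# 120 s is the timeout mentioned in the thread; all names below are hypothetical.
HEARTBEAT_TIMEOUT_S = 120


def heartbeat_is_stale(last_heartbeat, now=None, timeout=HEARTBEAT_TIMEOUT_S):
    """Return True when no heartbeat has arrived within `timeout` seconds.

    A stale heartbeat cannot distinguish a stalled job from a dead one.
    """
    if now is None:
        now = time.time()
    return (now - last_heartbeat) > timeout


def process_start_ticks(pid):
    """Return the start time of `pid` in clock ticks since boot (Linux only).

    Returns None if no such process exists.
    """
    try:
        with open(f"/proc/{pid}/stat", "rb") as f:
            stat = f.read().decode("ascii", "replace")
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split only the text after the last ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); starttime is field 22, i.e. index 19.
    return int(fields[19])


def kill_if_same_process(pid, recorded_start, sig=signal.SIGTERM):
    """Send `sig` only if `pid` still has the start time recorded at launch."""
    current = process_start_ticks(pid)
    if current is None or current != recorded_start:
        return False  # process is gone, or the PID now belongs to something else
    os.kill(pid, sig)
    return True
```

A launcher would store `process_start_ticks(pid)` next to the PID when the job starts and pass both values to `kill_if_same_process` later. A short window remains between the check and the signal, so this narrows the PID reuse risk rather than removing it.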

The next major cryoSPARC version will address the problem in a different way.

By the way, what was your reasoning for installing/running cryoSPARC as root?

Ali

Thanks for the explanation, Ali.

BTW, is there a correct way to remove a job that has not started yet from the queue? It seems that every time I do that, the job runs anyway, in the original order, but I cannot see the results in the browser.

The correct way would be to open the experiment that is queued, go to the Launch page, and click on Clear. That should remove it from the queue. I believe there was a fix in a recent version that does not allow users to delete experiments by clicking the red x on the experiments page if the experiment is queued, but if you’re still able to get orphaned queued jobs, maybe it’s not properly fixed.

Thanks for clarifying it! Unfortunately, the bug is still there. “Clear” removes the job from the queue page. However, cryoSPARC still runs it in the original order. While that job is running, the total number of currently running jobs is displayed correctly at the top of the cryoSPARC page, but the ghost job is missing from the queue page, so there is no way to view the results of that job…

(version 0.6.2)

Peter