I recently started using CryoSPARC on a new machine, and it has repeatedly crashed the system. The machine becomes completely unresponsive and can't be accessed via SSH or the web GUI.
It seems to happen during 2D classification jobs (usually with large numbers of classes and particles). On a previous machine, 2D classification jobs would fail if the number of classes was too large for the GPU's memory, but they never crashed the entire system.
Has anyone else experienced hard crashes? Is there any way to make sure that jobs won't crash the system (even if that means they fail instead)?
Yes, I'm having the same issue: a new Linux box and a hard crash of the entire machine during blob picking (so not using either the CPU or the GPU at any appreciable level).
I managed to get through the motion correction and CTF estimation steps without issue.
Unfortunately, I haven’t had any response so far and I have yet to figure out a solution.
Best regards, Tom
If you notice any patterns, let me know. I haven't found any consistent cause yet, and there isn't a single job that I can't eventually complete.
My best guess so far is that it's a glitch: the job should fail because it hits some resource limit (or perhaps because someone else SSHes into the box and uses some RAM, etc.), but there isn't an adequate safeguard to stop this from confusing the kernel.
I also get a few CUDA synchronisation-type errors, so it might not even be the same problem.
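One way to test the resource-limit theory is to log memory usage to disk at a fixed interval, so the last entries written before a hard crash survive the reboot. Below is a minimal sketch, assuming a Linux machine (it reads `/proc/meminfo`); the log path, the 5-second interval, and the function names are hypothetical choices, not anything CryoSPARC provides. It only covers system RAM and swap, not GPU memory.

```python
import os
import time

LOG_PATH = "mem_watch.log"  # hypothetical path; keep it on a persistent disk
INTERVAL_S = 5              # sampling interval in seconds (assumption)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value_in_kB} dict."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def format_sample(info, timestamp):
    """Build one log line from the fields most relevant to memory exhaustion."""
    fields = ("MemTotal", "MemAvailable", "SwapFree")
    values = " ".join(f"{f}={info.get(f, -1)}kB" for f in fields)  # -1 = missing
    return f"{timestamp} {values}"


def watch():
    """Append one sample every INTERVAL_S seconds until interrupted."""
    while True:
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
        line = format_sample(info, time.strftime("%Y-%m-%d %H:%M:%S"))
        with open(LOG_PATH, "a") as log:
            log.write(line + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the line to disk before a possible crash
        time.sleep(INTERVAL_S)
```

If saved as a hypothetical `mem_watch.py`, it could be started in the background with `nohup python3 -c "from mem_watch import watch; watch()" &` before launching a job. The `fsync` after every line is deliberate: without it, the most recent samples may still be sitting in the page cache when the machine locks up.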
As my new box is at home, I’m pretty sure nobody else is trying to use it at the same time.
I found it strange that the CPU and GPU were barely being used during blob picking when my jobs consistently failed, but they never failed on the same image/movie, so I'm pretty sure it wasn't a 'bad data' issue.
Still trying to figure out any pattern that might be present…
Best regards, Tom