CryoSPARC v4.2.1 crashes in different jobs

OleUns · May 30, 2023, 7:30am

Hi,

Since we updated cryoSPARC to v4.2.1, two of our workstations (Intel Core i7, 64 GB RAM, 2x NVIDIA RTX 2080 Ti; and Intel Core i9, 128 GB RAM, 2x NVIDIA Quadro RTX 5000; both running Ubuntu 22.04 LTS) occasionally crash while running different job types (2D classification, 3D classification, ab-initio reconstruction and all types of refinement). After a crash, the cryoSPARC web interface is no longer available, and the cryoSPARC master can only be restarted after the temporary file /tmp/cryosparc-supervisor-long_number.sock is deleted.

CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-d8b6747a381ef263346118f825d16ff7.sock refused connection

Updating cryoSPARC to v4.2.1+230427 did not solve the problem. I also tried downgrading one workstation to v4.1.2, which did not stop the crashes either. Next, I reinstalled both the cryoSPARC master and worker, which again did not resolve the issue.

So far we could not identify a clear pattern but it appears that cryoSPARC is more likely to crash when two GPUs are used and bigger particle sets are processed. Otherwise, the issue is independent of the box size used (220 or 300 makes no difference) and also happens when 2D classification with 800000 particles, box size 220 and 100 classes is performed. Usually cryoSPARC does not crash right away but rather in the middle of processing.

I would be happy if someone could help me/us to figure out the problem and find a solution.

Best,
Ole

wtempel · May 30, 2023, 8:16pm

Welcome to the forum @OleUns

This file should be manually deleted only under exceptional circumstances and after ensuring that no processes belonging to the CryoSPARC instance are still running:

ps -eouser,pid,ppid,cmd | grep -e cryosparc -e mongo

Otherwise, deletion of the file may leave the CryoSPARC instance in an undefined state.
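
To make that check harder to skip, the process listing can gate the deletion. The following is only a minimal sketch: the function names are made up for illustration and are not part of CryoSPARC, and the socket path in the usage comment is a placeholder. It reads a ps listing on stdin and removes the socket file only if no cryosparc or mongo process shows up in it.

```shell
# Hypothetical helpers (not part of CryoSPARC).

# Keep only CryoSPARC/MongoDB lines from a ps listing on stdin,
# dropping the grep helper that would otherwise match itself.
filter_cryosparc_procs() {
    grep -e cryosparc -e mongo | grep -v -e 'grep -e cryosparc' || true
}

# Remove the socket file given as $1 only if the ps listing on stdin
# contains no CryoSPARC/MongoDB process. Returns 1 and leaves the file
# in place otherwise.
remove_sock_if_idle() {
    sock="$1"
    procs="$(filter_cryosparc_procs)"
    if [ -n "$procs" ]; then
        echo "processes still running, not removing $sock:"
        echo "$procs"
        return 1
    fi
    rm -f -- "$sock"
    echo "removed $sock"
}

# Usage (replace the placeholder with the actual socket file name):
# ps -eouser,pid,ppid,cmd | remove_sock_if_idle /tmp/cryosparc-supervisor-<hash>.sock
```

If the function refuses, the listed processes should be dealt with first; the sketch deliberately does not kill anything on its own.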

Are other applications running on those computers affected when CryoSPARC crashes?
After you have recovered from such a crash, you may want to:

compare the combined RAM usages (shown in the Event Log) of jobs that were running at the time of the crash to available system RAM
inspect the jobs’ Event and Job (Metadata|Log) logs for errors
inspect the log files inside /path/to/cryosparc_master/run/ for errors
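
For the last point, a quick first pass over the log directory could look like the sketch below. The function name and the search terms are assumptions chosen for illustration, not an official list of CryoSPARC error strings, and the path in the usage comment is the placeholder from above.

```shell
# Hypothetical helper (not part of CryoSPARC).
# Print every line in the *.log files of the given directory that matches
# one of the search terms, prefixed with file name and line number.
scan_logs() {
    log_dir="$1"
    grep -H -n -i -E 'error|traceback|killed' "$log_dir"/*.log 2>/dev/null || true
}

# Usage:
# scan_logs /path/to/cryosparc_master/run
# free -h   # total and available system RAM, for comparison with the jobs' reported usage
```

The match is case-insensitive and intentionally broad, so expect some harmless hits; it is meant to narrow down where to read, not to replace reading the logs.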