Since we updated cryoSPARC to v4.2.1, two of our workstations (Intel Core i7, 64 GB RAM, 2x NVIDIA RTX 2080 Ti; and Intel Core i9, 128 GB RAM, 2x NVIDIA Quadro RTX 5000; both running Ubuntu 22.04 LTS) occasionally crash while running various job types (2D classification, 3D classification, ab-initio reconstruction and all types of refinement). After a crash, the cryoSPARC web interface is no longer available, and the cryoSPARC master can only be restarted after the temporary file /tmp/cryosparc-supervisor-long_number.sock is deleted.
CryoSPARC is running.
unix:///tmp/cryosparc-supervisor-d8b6747a381ef263346118f825d16ff7.sock refused connection
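To check whether a leftover socket file is what blocks the restart, something like the following could be used. It is only a minimal sketch: the helper names is_stale_socket and find_stale_sockets are made up for illustration, and the glob pattern is an assumption based on the path in the message above. It tries to connect to each matching socket file and lists those that refuse the connection, without deleting anything.

```python
import glob
import socket


def is_stale_socket(path, timeout=1.0):
    """Return True if the Unix socket file at `path` refuses connections."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
    except ConnectionRefusedError:
        # The file exists but no process is listening behind it.
        return True
    except OSError:
        # Any other error (missing file, permissions, timeout) is treated
        # as "unknown", so the file is not reported as stale.
        return False
    finally:
        s.close()
    return False


def find_stale_sockets(pattern="/tmp/cryosparc-supervisor-*.sock"):
    """List socket files matching `pattern` (assumed naming) that look stale."""
    return sorted(p for p in glob.glob(pattern) if is_stale_socket(p))


if __name__ == "__main__":
    for path in find_stale_sockets():
        print("stale socket file:", path)
```

The script only reports candidates; removing a reported file and restarting the master is still a manual step.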
Updating cryoSPARC to v4.2.1+230427 did not solve the problem. I also tried downgrading one workstation to v4.1.2, which did not resolve the crashes. Next, I reinstalled both the cryoSPARC master and worker, which again did not resolve the issue.
So far we have not been able to identify a clear pattern, but cryoSPARC appears more likely to crash when two GPUs are used and larger particle sets are processed. Otherwise, the issue is independent of the box size (220 or 300 makes no difference); it also happens during 2D classification with 800,000 particles, box size 220 and 100 classes. cryoSPARC usually does not crash right away but in the middle of processing.
I would be happy if someone could help me/us to figure out the problem and find a solution.