On CryoSPARC v5.0.1, we are seeing several Homogeneous Refinement jobs fail with numpy.linalg.LinAlgError: Singular matrix. The full traceback is:
Traceback (most recent call last):
  File "cli/run.py", line 105, in cli.run.run_job
  File "cli/run.py", line 210, in cli.run.run_job_function
  File "compute/jobs/refine/run.py", line 604, in compute.jobs.refine.run.run_homo_refine
  File "compute/jobs/refine/run.py", line 605, in compute.jobs.refine.run.run_homo_refine
  File "compute/jobs/ctf_refinement/run.py", line 436, in compute.jobs.ctf_refinement.run.full_ctf_refine
  File "/opt/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numpy/linalg/linalg.py", line 409, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/cryosparc/cryosparc_worker/.pixi/envs/worker/lib/python3.12/site-packages/numpy/linalg/linalg.py", line 112, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix
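For context, this is the exception that numpy.linalg.solve raises whenever the coefficient matrix it is given is exactly singular. The snippet below is only a minimal sketch of that general NumPy behaviour. It is not CryoSPARC's code, and the helper name try_solve and the 2x2 matrices are hypothetical values chosen for illustration.

```python
import numpy as np


def try_solve(a, b):
    """Solve a @ x = b, returning (solution, None) or (None, error message)."""
    try:
        return np.linalg.solve(a, b), None
    except np.linalg.LinAlgError as exc:
        return None, str(exc)


# Hypothetical example systems, not taken from any CryoSPARC job.
well_posed = np.array([[2.0, 1.0], [1.0, 3.0]])      # full rank
rank_deficient = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row = 2 * first row
rhs = np.array([1.0, 2.0])

ok_solution, ok_error = try_solve(well_posed, rhs)
bad_solution, bad_error = try_solve(rank_deficient, rhs)

print("well-posed system error:", ok_error)
print("rank-deficient system error:", bad_error)
```

The traceback shows the solve call being reached from full_ctf_refine, so the system that turned out to be singular was built during the CTF refinement step of the refinement job. What made it singular is not visible from the traceback.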
The failing jobs all run downstream of an Import Beam Shift job, which imports beam shift information from EPU, followed by an Exposure Group Utilities job. The Exposure Group Utilities job is set up with the "cluster&split" action, "Correspond particles to exposures" and "enforce consistency of exposure group IDs" enabled, 57 clusters, kmeans clustering, and outputs split by exposure groups.
I believe only the particles from the "Exposure Group Utilities" job are included in the downstream jobs. The same Homogeneous Refinement jobs sometimes complete successfully when rerun, so the crash appears to be intermittent rather than consistently reproducible.
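One thing that could be checked is whether any of the 57 exposure groups ended up with very few particles. This is an assumption on my part and not a confirmed cause of the error. The sketch below counts particles per exposure group in a NumPy structured array. It assumes the particle .cs file can be read with np.load and that the group ID is stored in a field named ctf/exp_group_id. The helper name particles_per_group, the threshold, and the demo array are hypothetical.

```python
import numpy as np


def particles_per_group(particles, field="ctf/exp_group_id"):
    """Return {exposure group ID: particle count} for a structured array."""
    ids, counts = np.unique(particles[field], return_counts=True)
    return dict(zip(ids.tolist(), counts.tolist()))


# Synthetic stand-in for a particle dataset (hypothetical values).
# For real data, something like np.load("path/to/particles.cs") is assumed
# to return a structured array with the same field.
demo = np.zeros(6, dtype=[("uid", "<u8"), ("ctf/exp_group_id", "<u4")])
demo["ctf/exp_group_id"] = [1, 1, 1, 2, 2, 3]

counts = particles_per_group(demo)

# Flag groups below an arbitrary, hypothetical threshold.
min_particles = 2
small_groups = sorted(g for g, n in counts.items() if n < min_particles)

print("particles per group:", counts)
print("groups below threshold:", small_groups)
```

If some groups turn out to be very small, rerunning Exposure Group Utilities with fewer clusters would be one way to test whether group size is related to the failure.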
It looks like there were past reports of a similar crash with "numpy.linalg.LinAlgError: Singular matrix", but I don't know whether those were directly resolved or whether people simply changed how they were running the jobs.
Any suggestions here for what we should try to avoid this issue?