Crash of reference-based motion correction

I am trying to run a reference-based motion correction job, but I got this error:
====== Job process terminated abnormally.
The job details are as follows:

Inputs: particles (400 px) and volume from non-uniform refinement, plus the micrographs output by Patch Motion Correction.

Raw movies are EER files collected on a Falcon 4i detector.

My compute settings are as follows (the default settings did not work at all, so I changed them to these):
Number of GPUs: 4
gpu_oversubscription_gb: 2
In-mem_cache_sz: 20

The cluster I am using is:
64 cores
8 x RTX 2080 Ti GPUs
rtx2080ti/gpu_cc=7.5

The job produced some trajectories, but then crashed at 26% of movie processing and terminated abnormally.

Here are the last lines of the log file:
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
DIE: particle pairwise distance matrix is singular, check for duplicate particles (sgetrf failed)
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
TIFFReadDirectory: Unknown field with tag 65002 (0xfdea) encountered
========= main process now complete at 2023-12-18 17:34:36.077460.
========= monitor process now complete at 2023-12-18 17:34:38.117264.

Any suggestions?
Many thanks!

The error line:

DIE: particle pairwise distance matrix is singular, check for duplicate particles (sgetrf failed)

indicates that two (or more) particles are too close together. Check for this by running a Remove Duplicates job. Too many particles on a single micrograph will also make RBMC crash, so you can split your input stack, use the parameters calculated on the whole set, run RBMC in two separate runs, and then recombine the outputs.
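If you want to check for close pairs yourself, or split a stack so that each run sees fewer particles per micrograph, a minimal Python sketch could look like this. It assumes the particle records have already been exported as plain (uid, micrograph_id, x_px, y_px) tuples; that record layout, the function names, and the distance threshold are all hypothetical and not part of cryoSPARC's API. The split is round-robin within each micrograph, so every subset gets roughly 1/n of each micrograph's particles.

```python
import math
from collections import defaultdict
from itertools import combinations


def find_close_pairs(particles, min_dist_px):
    """Return (micrograph_id, uid_a, uid_b) for particle pairs closer than min_dist_px.

    particles: iterable of hypothetical (uid, micrograph_id, x_px, y_px) tuples.
    Only particles on the same micrograph are compared.
    """
    by_mic = defaultdict(list)
    for uid, mic, x, y in particles:
        by_mic[mic].append((uid, x, y))

    pairs = []
    for mic, items in by_mic.items():
        # O(n^2) per micrograph; fine for a few hundred particles each.
        for (u1, x1, y1), (u2, x2, y2) in combinations(items, 2):
            if math.hypot(x1 - x2, y1 - y2) < min_dist_px:
                pairs.append((mic, u1, u2))
    return pairs


def split_per_micrograph(particles, n_subsets=2):
    """Round-robin split within each micrograph, capping per-micrograph counts per subset."""
    subsets = [[] for _ in range(n_subsets)]
    seen = defaultdict(int)  # particles already assigned, per micrograph
    for rec in particles:
        mic = rec[1]
        subsets[seen[mic] % n_subsets].append(rec)
        seen[mic] += 1
    return subsets


if __name__ == "__main__":
    demo = [(1, "m1", 0, 0), (2, "m1", 5, 0), (3, "m1", 300, 300), (4, "m2", 5, 0)]
    print(find_close_pairs(demo, min_dist_px=10))
    print([[r[0] for r in s] for s in split_per_micrograph(demo)])
```

Each subset would then go through RBMC as a separate run, and the outputs would be recombined afterwards.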

Set the GPU oversubscription threshold to a value larger than the GPU's VRAM (11 GB on an RTX 2080 Ti), so that GPUs are not oversubscribed. Depending on how much system RAM you have, increase the in-memory cache size. Note also that if you use EER upsampling (micrographs are 8K, with particles Fourier-cropped to 4K), RBMC will automatically output “shiny” particles scaled to 8K.
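The arithmetic behind these points, as a small Python sketch. It assumes the 400 px box was extracted at 4K scale with a Fourier-crop factor of 2 (EER upsampling of 2 on the 4096 x 4096 Falcon 4i sensor gives 8K micrographs), float32 pixels, and the rule of thumb above that the oversubscription threshold should simply exceed the card's VRAM. The function names are illustrative only.

```python
RTX_2080_TI_VRAM_GB = 11  # 11 GB of GDDR6 per RTX 2080 Ti


def shiny_box_px(extracted_box_px, fourier_crop_factor):
    """Box size after rescaling particles back to the upsampled (8K) pixel size."""
    return extracted_box_px * fourier_crop_factor


def image_bytes_float32(box_px):
    """Memory for one square float32 particle image (4 bytes per pixel)."""
    return box_px * box_px * 4


def threshold_avoids_oversubscription(threshold_gb, vram_gb=RTX_2080_TI_VRAM_GB):
    """Rule of thumb from the reply: threshold above VRAM means no GPU sharing."""
    return threshold_gb > vram_gb


if __name__ == "__main__":
    box_8k = shiny_box_px(400, 2)
    print(box_8k, image_bytes_float32(400), image_bytes_float32(box_8k))
    print(threshold_avoids_oversubscription(2))
```

Under these assumptions, a 2 GB threshold is well below 11 GB, and if upsampling was used the 8K “shiny” particles would need 800 px boxes, four times the memory per image of the 400 px input.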


@rbs_sci is correct: the “singular pairwise distance matrix” message means that some of your particles are too close together. A surprising number of users have run into this, and it’s on our radar to fix in the future.
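Why duplicates break the factorization can be shown with a toy example. The exact matrix RBMC builds from particle positions is not described here, so this sketch assumes a squared-exponential kernel over pairwise distances (a common smoothness prior) with a hypothetical length scale. Two particles at the same position give two identical rows, so LU factorization hits an exactly zero pivot, which is the condition LAPACK's sgetrf reports with INFO > 0. Very close but not identical particles give a nearly singular matrix, which in single precision can also end in a zero pivot.

```python
import math


def pairwise_kernel(coords, length_scale):
    """Hypothetical squared-exponential kernel over pairwise particle distances."""
    return [
        [math.exp(-math.dist(p, q) ** 2 / (2.0 * length_scale ** 2)) for q in coords]
        for p in coords
    ]


def lu_info(matrix):
    """LU factorization with partial pivoting.

    Returns 0 on success, or the 1-based index of the first exactly-zero pivot,
    mirroring the INFO convention of LAPACK's sgetrf (which, unlike this sketch,
    finishes the factorization before reporting).
    """
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for k in range(n):
        # Partial pivoting: pick the row with the largest entry in column k.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            return k + 1
        a[k], a[p] = a[p], a[k]
        for r in range(k + 1, n):
            m = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= m * a[k][c]
    return 0


if __name__ == "__main__":
    ok = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
    dup = ok + [(100.0, 0.0)]  # same position as the second particle
    print(lu_info(pairwise_kernel(ok, 50.0)))
    print(lu_info(pairwise_kernel(dup, 50.0)))
```

With the duplicate particle appended, lu_info returns a nonzero pivot index instead of 0.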

Thanks for the reply. Glad to hear that you are working on this issue. I tried both splitting the input stack and removing duplicates, but neither worked. Re-picking the particles with a lower minimum separation distance does not make sense for me, because I would then have to redo all of the downstream processing. I hope you can fix the problem in the next update.