2D Classification - ValueError: Detected NaN values

Hi!
I have seen a similar error discussed in the forum, but only for other types of jobs.
When running 2D classification in v4.5.3 on recently extracted particles, the job fails with the “NaN values” error shown below. Re-extracting didn’t help, and the source micrographs have been used before without problems.
Is there a way to perhaps identify the exact problematic particle and remove it from the stack?
By the way, I have tried turning many parameters on and off, but the error persists and occurs at random iterations.
I really appreciate any help you can provide.
Andre.


(base) alba@layne:~$ cryosparcm eventlog P22 J91 | tail -n 30
[Sun, 03 Nov 2024 17:59:29 GMT] [CPU RAM used: 4238 MB] Start of Iteration 16
[Sun, 03 Nov 2024 17:59:29 GMT] [CPU RAM used: 4238 MB] – DEV 0 THR 0 NUM 2500 TOTAL 111.17213 ELAPSED 221.75251 –
[Sun, 03 Nov 2024 18:03:13 GMT] [CPU RAM used: 4262 MB] Finished engine iteration 16 in 224.425s
[Sun, 03 Nov 2024 18:03:13 GMT] [CPU RAM used: 4263 MB] – Effective number of classes per image: min 1.00 | 25-pct 22.41 | median 47.24 | 75-pct 66.61 | max 96.65
[Sun, 03 Nov 2024 18:03:13 GMT] [CPU RAM used: 4263 MB] – Probability of best class per image: min 0.01 | 25-pct 0.03 | median 0.06 | 75-pct 0.12 | max 1.00
[Sun, 03 Nov 2024 18:03:13 GMT] [CPU RAM used: 4263 MB] Solving 2D densities…
[Sun, 03 Nov 2024 18:03:13 GMT] [CPU RAM used: 4263 MB] Solved class 100/100 in 0.03s
[Sun, 03 Nov 2024 18:03:15 GMT] 2D classes for iteration 16
[Sun, 03 Nov 2024 18:03:15 GMT] Noise Model for iteration 16
[Sun, 03 Nov 2024 18:03:15 GMT] Effective number of assigned classes for iteration 16
[Sun, 03 Nov 2024 18:03:16 GMT] Probability of best class for iteration 16
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4317 MB] Done Full Iteration 16 took 226.579s for 10000 images
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Outputting results…
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Output particles to J91/J91_016_particles.cs
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Output class averages to J91/J91_016_class_averages.cs, J91/J91_016_class_averages.mrc
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Clearing previous iteration…
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Deleting last_output_file_path_abs: /media/alba/Seagate_Expansion_Drive/Process/J91/J91_015_particles.cs
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Deleting last_output_file_path_abs: /media/alba/Seagate_Expansion_Drive/Process/J91/J91_015_class_averages.cs
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Deleting last_output_file_path_abs: /media/alba/Seagate_Expansion_Drive/Process/J91/J91_015_class_averages.mrc
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4353 MB] Removed output results for P22 J91
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4357 MB] Start of Iteration 17
[Sun, 03 Nov 2024 18:03:16 GMT] [CPU RAM used: 4357 MB] – DEV 0 THR 1 NUM 2500 TOTAL 108.13949 ELAPSED 108.52214 –
[Sun, 03 Nov 2024 18:05:07 GMT] [CPU RAM used: 4362 MB] Traceback (most recent call last):
  File "/home/alba/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2294, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 134, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 135, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/jobs/class2D/newrun.py", line 640, in cryosparc_master.cryosparc_compute.jobs.class2D.newrun.class2D_engine_run.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1518, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.compute_error
ValueError: Detected NaN values in newengine.compute_error. 396900 NaNs in total, 1 particles with NaNs.

Some particle MRC files might be corrupt, either in the particle cache (if Cache particle images on SSD was enabled for the job) or inside the project directory.
You may want to run a Check for Corrupt Particles job with Check for NaN values enabled, then connect the particles it outputs to a 2D classification job. Does the Check for Corrupt Particles job detect any corrupt particles? Note that detected corrupt particles produce a warning in the Check for Corrupt Particles job’s event log, not an outright failure of that job.
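
For anyone curious what a NaN check on a particle stack amounts to, here is a minimal sketch in plain Python. It is not CryoSPARC's implementation, and the function name `find_nan_particles` is hypothetical. It assumes a little-endian MRC stack in mode 2 (32-bit float) with the standard 1024-byte header, read on a little-endian machine. As a side note, 396900 = 630 × 630, so the error above would be consistent with one image being entirely NaN if the box size is 630 px; the box size is not stated in the thread, so that is only a guess.

```python
import math
import struct
from array import array


def find_nan_particles(path):
    """Return the indices of images in an MRC stack that contain any NaN.

    Assumptions (not taken from CryoSPARC): little-endian file and machine,
    mode 2 (float32) data, standard 1024-byte MRC header.
    """
    with open(path, "rb") as f:
        header = f.read(1024)
        if len(header) < 1024:
            raise ValueError("file is too short to hold an MRC header")

        # Header words 1-4: columns, rows, sections (images), data mode.
        nx, ny, nz, mode = struct.unpack("<4i", header[:16])
        # Word 24 (byte offset 92): size of the extended header in bytes.
        (nsymbt,) = struct.unpack("<i", header[92:96])
        if mode != 2:
            raise ValueError(f"this sketch only handles mode 2, got mode {mode}")

        # Image data starts after the main and extended headers.
        f.seek(1024 + nsymbt)
        bad = []
        for index in range(nz):
            image = array("f")
            # Raises EOFError if the file is truncated mid-image.
            image.fromfile(f, nx * ny)
            if any(math.isnan(value) for value in image):
                bad.append(index)
    return bad
```

The returned indices are positions within a single stack file; mapping them back to rows in the particle `.cs` file is not shown here. On real stacks a pure-Python loop like this will be slow, and reading the data with NumPy (for example through the `mrcfile` package) would be the more practical route.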

Many thanks for this, @wtempel, and forgive my inattention, which kept me from noticing the obviously named Check for Corrupt Particles job. It identified three corrupted data files, which corresponded to actually valid micrographs with good particles but had zero picks. I can now proceed with the processing pipeline. Best wishes.