Error during 2D classification stalls the job

When I run 2D classification, the job stalls after running for some time with the error “detected NaN values in new engine.compute_error. 595350000 NaNs in total, 500 particles with NaNs” (see attached screenshots).


What could be the cause, and how can I fix it? I appreciate your help.

You can run the “Check for corrupt particles” job.

Thank you. I am doing it.

That many NaNs would appear to indicate some other underlying issue. I’d run Memtest for 24 hours (or until it fails) as well.

I am running the jobs on an HPC cluster. Do you still suggest running Memtest?

Yes. In that case I’d contact the system administrator and ask them to check.

Solved:
I ran the ‘Check for corrupt particles’ job with ‘Check for NaN values’ enabled and then used the resulting particles for 2D classification. The classification ran successfully.
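As an illustration of what a NaN check on a particle stack involves, here is a minimal sketch. It is not CryoSPARC's implementation: it assumes the particle images have already been loaded into a NumPy array of shape (n_particles, ny, nx), and the function name `find_nan_particles` is hypothetical. It computes figures analogous to those in the error message (total NaN pixels and the particles containing them) and keeps only the NaN-free particles.

```python
import numpy as np


def find_nan_particles(stack: np.ndarray):
    """Hypothetical helper: report NaNs in a stack of shape (n_particles, ny, nx).

    Returns (total_nans, bad_indices, clean_stack).
    """
    nan_mask = np.isnan(stack)                    # True wherever a pixel is NaN
    total_nans = int(nan_mask.sum())              # total NaN pixels in the whole stack
    # Flatten each particle image and flag it if any of its pixels is NaN
    per_particle = nan_mask.reshape(stack.shape[0], -1).any(axis=1)
    bad_indices = np.flatnonzero(per_particle)    # indices of corrupt particles
    clean_stack = stack[~per_particle]            # keep only NaN-free particles
    return total_nans, bad_indices, clean_stack


# Usage with a small synthetic stack (made-up data, not from this thread)
stack = np.zeros((5, 4, 4), dtype=np.float32)
stack[1, 0, 0] = np.nan   # one bad pixel in particle 1
stack[3] = np.nan         # particle 3 is entirely NaN
total, bad, clean = find_nan_particles(stack)
print(total, bad.tolist(), clean.shape)  # 17 [1, 3] (3, 4, 4)
```

Note that a single NaN pixel is enough to flag a particle, so the particle count and the total NaN count can differ by orders of magnitude, as they do in the error message above.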

Excellent. But nearly 600 million NaNs bears investigating. Did it only remove 500 particles?

I think 500 is just the number reported in the error message. The job removed around 10,000 particles.