Bogus NaN particle warnings in refinement

Hi, I’m seeing what appear to be bogus corrupt particle warnings during NU-refine with 4.5.1, e.g.:

Traceback (most recent call last):
  File "/home/exx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2294, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 134, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 135, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1136, in cryosparc_master.cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 421, in cryosparc_master.cryosparc_compute.engine.engine.EngineThread.compute_error
ValueError: Detected NaN values in engine.compute_error. 33976503 NaNs in total, 209 particles with NaNs.

When I run Check for corrupt particles on the input stack, no NaN-containing particles are detected.
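
One quick sanity check on the error line above is to divide the total NaN count by the number of affected particles. The snippet below is only a sketch: the regular expression is written against the message text as quoted in this thread, and the helper name `nans_per_particle` is made up for illustration.

```python
import re

# Pattern for the summary line that ends the tracebacks quoted in this thread.
NAN_LINE = re.compile(
    r"Detected NaN values in (?P<where>[\w.]+)\. "
    r"(?P<nans>\d+) NaNs in total, (?P<particles>\d+) particles with NaNs\."
)

def nans_per_particle(message):
    """Return (location, total NaNs, particles, NaNs per particle), or None."""
    m = NAN_LINE.search(message)
    if m is None:
        return None
    nans = int(m["nans"])
    particles = int(m["particles"])
    return m["where"], nans, particles, nans / particles

msg = ("ValueError: Detected NaN values in engine.compute_error. "
       "33976503 NaNs in total, 209 particles with NaNs.")
print(nans_per_particle(msg))
```

For the message above this gives exactly 162567 NaNs per particle (33976503 / 209 with no remainder), so every affected particle appears to carry the same number of NaNs. That may hint at something systematic in the computation rather than a handful of bad pixels in individual images, but it is only a hint.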

This only seems to happen with large particles (box sizes above roughly 900 px); the same particle stack downsampled to a smaller box size does not give any issues. When I run homogeneous reconstruction on the same stack, it doesn’t give any warnings, but the FSC & map slices look weird:

Cheers
Oli

Hey @olibclarke –

This looks similar to another bug we’re trying to track down related to per-particle scale minimization. Do these failing jobs have per-particle scale minimization on? If so, could you try re-running them with it off and see if they complete without issue?

Thanks!
Valentin

I think they might have been - will check and get back to you!

It did have scale minimization on, but repeating with it off (& scales reset to 1.0) produced the same error.

Ok good to know, thanks!

@olibclarke – a couple more questions/requests when you get the chance! Could you please try:

  1. Randomly subsampling the set of 900px particles to a (much) smaller set (maybe ~5K particles) and re-running the homogeneous reconstruction on this set.

  2. Using a fixed mask (not from the same upstream NU-Refine) within homogeneous reconstruction.

Let us know if these two changes produce different results.
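
The random subsampling in step 1 amounts to drawing a fixed number of particles without replacement. The sketch below shows only that idea on a plain list of identifiers; the function name, the seed and the stack size of 200 000 are placeholders, not values from this thread, and it is not how the subset would actually be produced inside CryoSPARC.

```python
import random

def random_subset(particle_ids, n, seed=0):
    """Return n particle identifiers drawn without replacement."""
    ids = list(particle_ids)
    if n >= len(ids):
        return ids  # nothing to subsample
    rng = random.Random(seed)  # fixed seed so the same subset can be redrawn
    return rng.sample(ids, n)

all_ids = range(200_000)  # placeholder stack size, not from the thread
subset = random_subset(all_ids, 5_000)
print(len(subset), len(set(subset)))
```

Fixing the seed keeps the subset reproducible, which helps when comparing a run with the original mask against a run with a fixed mask on the same particles.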

Thanks again,
V

Hi @vperetroukhin, just to clarify, this initial error was obtained from NU-refine, not homogeneous reconstruction. Do these suggestions still pertain to this case? I will try a smaller subset for homogeneous reconstruction and see if the empty map slices go away.

Understood, yep. Based on the reconstruction plots you shared originally, we’re now suspicious that perhaps the half-maps are fine but mask generation is somehow to blame (since the no mask/spherical FSCs are not NaNs). Getting at this via homo reconstruct is just faster, which is why I asked for that first.

Gotcha - will test, thanks!

I think you might be right that mask generation is an issue - the FSC mask generated during homogeneous reconstruction visually looks ok, but looking at the header it has some quite negative values:

Very strange! Can you send us this mask mrc file? Sending you a DM shortly.
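
The header statistics mentioned above can also be read without opening the map in a viewer. This is a minimal sketch that assumes the MRC2014 header layout (dimensions and mode in the first 16 bytes, `dmin`/`dmax`/`dmean` at byte offset 76) and little-endian byte order. The function names and the [0, 1] range check are illustrative assumptions, and the demo uses a synthetic header rather than a real mask.

```python
import struct

def parse_mrc_stats(header):
    """Extract box size, data mode and density statistics from an MRC header.

    Assumes the MRC2014 layout and little-endian byte order.
    """
    if len(header) < 1024:
        raise ValueError("an MRC header is 1024 bytes long")
    nx, ny, nz, mode = struct.unpack_from("<4i", header, 0)
    dmin, dmax, dmean = struct.unpack_from("<3f", header, 76)
    return {"shape": (nx, ny, nz), "mode": mode,
            "dmin": dmin, "dmax": dmax, "dmean": dmean}

def read_mrc_stats(path):
    """Read only the first 1024 bytes of the file and parse them."""
    with open(path, "rb") as fh:
        return parse_mrc_stats(fh.read(1024))

def looks_like_mask(stats, tol=1e-3):
    """Assumption: a soft mask should stay within [0, 1] up to a small tolerance."""
    return stats["dmin"] >= -tol and stats["dmax"] <= 1.0 + tol

# Synthetic header standing in for a real mask file (values are made up).
demo = bytearray(1024)
struct.pack_into("<4i", demo, 0, 64, 64, 64, 2)
struct.pack_into("<3f", demo, 76, -0.5, 1.0, 0.25)
stats = parse_mrc_stats(bytes(demo))
print(stats, looks_like_mask(stats))
```

Header statistics are only as reliable as the program that wrote them, so scanning the voxel data itself is the more direct check if the header looks suspicious.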

Has this been resolved? I have a similar issue with v4.6.0 when doing local refinement. My box size is much smaller (384 px), but I enabled minimize over per-particle scale.

Traceback (most recent call last):
  File "/home/klin_csparc/Cryosparc3/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2304, in run_with_except_hook
    run_old(*args, **kw)
  File "/home/klin_csparc/Cryosparc3/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2730, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2809, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1534, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.compute_error
ValueError: Detected NaN values in newengine.compute_error. 3314 NaNs in total, 1 particles with NaNs.

Not solved yet, no.

Minimizing over per-particle scale is the problem. Turn it off and it should be OK.

Hello,

I am encountering the same issue with Heterogeneous refinement jobs in CryoSPARC v4.6.2. My particles have a box size of 400 px, downsampled to 128 px. The Check for corrupt particles job detected no corrupt particles. I successfully ran an Ab initio job with this particle set, but when I try to proceed with Heterogeneous refinement I get the "NaN particles detected" error. Heterogeneous refinement has no option to enable or disable per-particle scale minimization, which is the advice given in some posts related to this issue.

I would appreciate it if someone could help me.

Many thanks in advance!

A.

Traceback (most recent call last):
  File "/home/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2304, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 136, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 137, in cryosparc_master.cryosparc_compute.gpu.gpucore.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1146, in cryosparc_master.cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 431, in cryosparc_master.cryosparc_compute.engine.engine.EngineThread.compute_error
ValueError: Detected NaN values in engine.compute_error. 571536000 NaNs in total, 500 particles with NaNs.

Hi @Ammenoel – how many classes were used here, and how many particles belonged to each when it crashed? We have one potential fix in v5 that may help here; it has to do with NaNs in the noise model when class volumes are nearly empty.

Hello @vperetroukhin,

Thank you for the reply. I will try to describe my pipeline as best I can.

Starting particle stack: 421 677 ptc, extraction box 400 px downsampled to 128 px.

I did Ab initio (Job 380) with 6 classes. This job ran without any errors.

Then I took all particles from Job 380 and ran Heterogeneous refinement (Job 381) with 7 initial volumes: the six from Job 380 plus one from a previous Ab initio job. Two of these volumes were good and five were junk. I set the batch size to 3000 and the refinement box size to 64 vox. However, at the step where 21 000 particles had been used, I got an error reporting 500 particles with NaNs. When I later reran this job as Job 389 (I had overwritten the original Job 381 with the Hetero refinement for class 1 only, see below), it crashed with the same error during iteration 0, before the number of particles assigned per class was available.

Hoping that those NaN particles might be only in some classes of the Job 380 Ab initio job, I did the same Heterogeneous refinement (same input volumes, box size and batchsize 3000) for each class of Job 380 separately:

Job 381 – particles from Job 380 class 1, total 58 183 ptc; ran successfully, final assignments per class: 0 – 8 533 ptc, 1 – 6 641 ptc, 2 – 5 895 ptc, 3 – 10 184 ptc, 4 – 6 082 ptc, 5 – 6 350 ptc, 6 – 14 490 ptc.

Job 384 – particles from Job 380 class 2, total 58 104 ptc; ran successfully, final assignments per class: 0 – 8 969 ptc, 1 – 6 854 ptc, 2 – 5 488 ptc, 3 – 6 151 ptc, 4 – 10 363 ptc, 5 – 6 771 ptc, 6 – 13 781 ptc.

Job 385 – particles from Job 380 class 3, total 114 715 ptc; crashed during iteration 4 with an error reporting 92 particles with NaNs; total used particles 10 184, assignments per class: 0 – 1 987 ptc, 1 – 369 ptc, 2 – 1 030 ptc, 3 – 4 327 ptc, 4 – 979 ptc, 5 – 815 ptc, 6 – 677 ptc.

Job 386 – particles from Job 380 class 4, total 63 265 ptc; crashed during iteration 1 with an error reporting 500 particles with NaNs; total used particles 21 000, assignments per class: 0 – 4 445 ptc, 1 – 1 444 ptc, 2 – 2 200 ptc, 3 – 2 246 ptc, 4 – 2 257 ptc, 5 – 5 700 ptc, 6 – 2 708 ptc.

Job 387 – particles from Job 380 class 5, total 75 560 ptc; crashed during iteration 10 with an error reporting 500 particles with NaNs; total used particles 21 000, assignments per class: 0 – 4 277 ptc, 1 – 1 032 ptc, 2 – 916 ptc, 3 – 1 034 ptc, 4 – 934 ptc, 5 – 995 ptc, 6 – 11 812 ptc.

Job 388 – particles from Job 380 class 0, total 51 850 ptc; ran successfully, final assignments per class: 0 – 6 222 ptc, 1 – 5 434 ptc, 2 – 9 503 ptc, 3 – 5 551 ptc, 4 – 5 210 ptc, 5 – 5 783 ptc, 6 – 14 147 ptc.

Just out of curiosity, I extracted the particles assigned to the good volumes in the Hetero refinements that ran successfully (Jobs 381, 384, 388) and in the good Ab initio class (Job 380 class 3), with box size 400 px and no Fourier cropping. This gave 156 120 ptc, and Non-uniform refinement ran successfully with them. Hoping the problem was now solved, I wanted to purify these particles further by Hetero refinement (Job 395, same settings as stated above). However, this job crashed again during iteration 4 with an error reporting 500 particles with NaNs. Total used particles 21 000, assignments per class: 0 – 8 688 ptc, 1 – 2 722 ptc, 2 – 1 381 ptc, 3 – 1 987 ptc, 4 – 1 959 ptc, 5 – 1 918 ptc, 6 – 2 345 ptc.
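
To make the question about nearly empty classes easier to answer, the per-class counts of the four crashed jobs can be tabulated. The counts are copied from the job descriptions above; the helper `smallest_class` is just an illustrative name.

```python
# Per-class particle counts at the time of the crash, as listed above.
crashed = {
    "Job 385": [1987, 369, 1030, 4327, 979, 815, 677],
    "Job 386": [4445, 1444, 2200, 2246, 2257, 5700, 2708],
    "Job 387": [4277, 1032, 916, 1034, 934, 995, 11812],
    "Job 395": [8688, 2722, 1381, 1987, 1959, 1918, 2345],
}

def smallest_class(counts):
    """Return (class index, particle count, fraction of total) for the smallest class."""
    total = sum(counts)
    idx = min(range(len(counts)), key=counts.__getitem__)
    return idx, counts[idx], counts[idx] / total

for job, counts in crashed.items():
    idx, n, frac = smallest_class(counts)
    print(f"{job}: total {sum(counts)}, smallest class {idx} with {n} ptc ({frac:.1%})")
```

On these numbers the smallest class at the time of each crash still holds a few percent of the particles used (for example 369 of 10 184 in Job 385), so none of the classes is literally empty.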

Please let me know should you need more information.

Cheers,

A.

@Ammenoel – thank you for the info! Nothing jumps out immediately at me unfortunately. I think it would be useful to have a job log from one of the failed hetero refinement jobs. I’ll DM you with additional details.