Dear all,
I’m running a consensus NU-refinement job on a large dataset of 960k particles in a 560 px box. I’m on CryoSPARC v3.3.1+220315, running on a cluster with NVIDIA V100 (32 GB) GPUs. I collected two datasets of identical grids on two very similar Krios setups with slightly different nominal pixel sizes (0.86 Å/px and 0.85 Å/px, i.e. <2% difference). I forced both to the same pixel size for processing and am now refining them together with anisotropic magnification correction ON to compensate for the residual difference. Even without this correction, we’re at sub-3 Å resolution by FSC.
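For context, the pixel-size mismatch really is small; a quick back-of-envelope check with the values above:

```python
# Relative difference between the two nominal pixel sizes quoted above.
apix_a = 0.86  # Å/px, dataset 1
apix_b = 0.85  # Å/px, dataset 2
rel_diff = abs(apix_a - apix_b) / apix_a
print(f"relative pixel-size difference: {rel_diff:.2%}")  # ~1.16%, i.e. <2%
```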
When running the final NU-refinement, the job fails around iteration 11 with the following error:
[CPU: 62.01 GB] Traceback (most recent call last):
File "/projappl/project_2004941/usrappl/kumpula/cryoSPARC/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1837, in run_with_except_hook
run_old(*args, **kw)
File "/projappl/project_2004941/usrappl/kumpula/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 2443, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 2467, in cryosparc_compute.engine.newengine.process.work
File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 543, in cryosparc_compute.engine.newengine.EngineThread.preprocess_image_data
File "cryosparc_worker/cryosparc_compute/engine/newgfourier.py", line 121, in cryosparc_compute.engine.newgfourier.rfft2_on_gpu_inplace
File "/projappl/project_2004941/usrappl/kumpula/cryoSPARC/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 353, in cufftExecR2C
cufftCheckStatus(status)
File "/projappl/project_2004941/usrappl/kumpula/cryoSPARC/cryosparc_worker/cryosparc_compute/skcuda_internal/cufft.py", line 124, in cufftCheckStatus
raise e
cryosparc_compute.skcuda_internal.cufft.cufftInvalidPlan
On rerunning with the same parameters, the same error appears at an earlier iteration, so the failure point seems somewhat sporadic.
Even though the traceback does not state it directly, I suspected a GPU memory issue, so I tried forcing the GPU batch size to 300; this made the run fail even earlier with the same error message. When I run without magnification correction, the job completes without issues.
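To sanity-check the memory hypothesis, here is a rough estimate (my own assumption about buffer layout, not CryoSPARC internals) of what a batched 2D R2C FFT at these job parameters should need:

```python
# Rough estimate (assumed float32 input / complex64 output, not CryoSPARC
# internals) of GPU memory for one batched 2D R2C FFT at the job parameters.
box = 560          # px, box size from the job
batch = 300        # the forced GPU batch size
bytes_real = 4     # float32 input image
bytes_cplx = 8     # complex64 output spectrum

input_bytes = batch * box * box * bytes_real
# R2C output keeps only the non-redundant half-spectrum: box x (box//2 + 1)
output_bytes = batch * box * (box // 2 + 1) * bytes_cplx
# cuFFT typically needs a work area on the order of the transform size
work_bytes = output_bytes

total_gib = (input_bytes + output_bytes + work_bytes) / 2**30
print(f"~{total_gib:.1f} GiB per batch (vs 32 GiB on a V100)")
```

This comes out to only about 1 GiB, so if memory is the culprit, it would have to be overall pressure or fragmentation at the moment the cuFFT plan is created (consistent with cufftInvalidPlan) rather than the FFT buffers themselves.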
All help appreciated!