GPU acceleration for local cross validation?


Loving the speedup in FSC calculation in 3.3. The real bottleneck for us though seems to be the local cross validation step in NU-refine and local refinement with NU-regularization - any chance of similar enhancements there…? It becomes painfully slow at large box sizes (600px+), so any speedup would be welcome!

I guess now at least one way to speed it up for local refinement would be to use Volume Alignment Utilities to recenter and re-extract sub-particles in a smaller box…
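To put a rough number on why a smaller box should help, here is a back-of-the-envelope sketch. It only counts voxels in a cubic volume; the assumption that local cross validation cost grows roughly with voxel count is mine and is not confirmed for cryoSPARC's implementation. The function name and the 600 to 300 example are purely illustrative.

```python
def voxel_ratio(old_box: int, new_box: int) -> float:
    """Factor by which the voxel count of a cubic volume shrinks
    when the box edge goes from old_box to new_box pixels."""
    return (old_box / new_box) ** 3


# Hypothetical example: re-extracting a 600 px box as 300 px
print(voxel_ratio(600, 300))  # 8.0
```

So halving the box edge leaves about an eighth of the voxels to process, whatever the actual per-voxel cost turns out to be.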



Hey @olibclarke,

Thanks for the request. The local CV procedure is GPU accelerated, but I’ve added it to our tracker as there could be some optimizations done to improve speed with larger box sizes.



In local refinement, though, this doesn't actually seem to utilize the GPU much?

[CPU: 68.77 GB]    Processed 301680.000 images in 692.051s.

[CPU: 74.20 GB]    Computing FSCs... 

[CPU: 74.20 GB]    Using full box size 882, downsampled box size 450, with low memory mode disabled.

[CPU: 74.20 GB]    Computing FFTs on GPU.

[CPU: 76.76 GB]      Done in 9.230s

[CPU: 76.76 GB]    Using Filter Radius 99.518 (8.189A) | Previous: 67.914 (12.000A)

[CPU: 97.23 GB]    Non-uniform regularization with compute option: GPU

[CPU: 97.23 GB]    Running local cross validation for A ...

watch nvidia-smi --id=0
Thu Mar 10 16:01:27 2022
| NVIDIA-SMI 510.54       Driver Version: 510.54       CUDA Version: 11.6     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA A40          On   | 00000000:01:00.0 Off |                    0 |
|  0%   56C    P0    86W / 300W |   8157MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      3244      C   python                           8151MiB |
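For anyone who wants to log utilization over the course of a job rather than watching the table, here is a minimal sketch in Python. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu` CSV output; the helper names (`parse_gpu_sample`, `sample_gpu`, `log_gpu`) are made up for illustration and are not part of cryoSPARC.

```python
import subprocess
import time

# Fields requested from nvidia-smi, in the order they are parsed below
QUERY = "utilization.gpu,memory.used"


def parse_gpu_sample(line: str) -> tuple[int, int]:
    """Parse one CSV line such as '0, 8157' into (util_percent, mem_mib)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def sample_gpu(gpu_id: int = 0) -> tuple[int, int]:
    """Query nvidia-smi once for the given GPU index."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_id}", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_sample(out.strip().splitlines()[0])


def log_gpu(gpu_id: int = 0, n_samples: int = 5, interval_s: float = 2.0) -> None:
    """Print a few utilization samples, one every interval_s seconds."""
    for _ in range(n_samples):
        util, mem = sample_gpu(gpu_id)
        print(f"GPU {gpu_id}: util {util}%  memory {mem} MiB")
        time.sleep(interval_s)
```

Calling `log_gpu(0)` while the job sits in "Running local cross validation" should show whether utilization stays at 0% for the whole step or only between bursts.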


Any update on this? Currently using v3.3.1.

I'm currently running NU-refinement and local refinement (with NU) at a 440pix box size, and it's taking a lot of time: e.g. a local refinement with 288k symmetry-expanded particles takes around 12 hours.

But the worst part is that I can only run a single refinement at a time; otherwise they get stuck in the cross validation step and none of them progresses.

In earlier versions I could run multiple local jobs with 512pix box size without issues.

Thanks in advance

Hi @LTP,

Thanks for reporting this, I’ve logged the issue. To help us dig into this further:

  • When running two jobs at once, do they get stuck at the cross-validation step only in local refinement jobs, or in standard non-uniform refinement jobs too?


Hi @mmclean

I get the impression that it is mostly during local refinement, but that may be because I don’t run many standard NU refinements in parallel.

To give an idea, here are two very similar local refinement jobs (same mask, one just had about 3000 fewer particles):

Run alone with 287424 symmetry-expanded ptcls (440pix box size): 17 iterations took ~15 hours

Run in parallel with another local refinement, 279990 symmetry-expanded ptcls (440pix box size): 12 iterations took ~23 hours

For reference, NU refinement with 143712 (non-expanded) ptcls at 440pix box size, run alone, took ~4 hours.

On average, the minimum amount of time I get for local cross validation steps is around 10 minutes each (A and B), so total 20 min per validation step.
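To make the comparison easier, here is the same data as time per iteration, in a small sketch. It uses only the approximate figures above and assumes one validation step (A plus B, about 20 minutes at minimum) per iteration, which is my reading of the log rather than something confirmed.

```python
# (wall-clock hours, iterations) from the two local refinement runs above
runs = {
    "local, alone": (15.0, 17),
    "local, in parallel": (23.0, 12),
}
cv_minutes = 20.0  # minimum for cross validation A + B, assumed once per iteration


def minutes_per_iteration(hours: float, iterations: int) -> float:
    """Average wall-clock minutes spent on each refinement iteration."""
    return hours * 60.0 / iterations


for name, (hours, iters) in runs.items():
    per_iter = minutes_per_iteration(hours, iters)
    share = cv_minutes / per_iter
    print(f"{name}: {per_iter:.0f} min/iteration, CV at least {share:.0%} of that")
```

That works out to roughly 53 minutes per iteration when run alone and 115 minutes when run in parallel, so each iteration takes a bit more than twice as long with a second job on the machine.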

Hope this helps.