Advice for refinement of large boxes (960 pixels)

jimhbean · June 28, 2023, 12:08am

Howdy
I am trying to run a Homogeneous Refinement job using un-binned particles from a previous 4x-binned homogeneous refinement job. The binned job completed in 5 minutes using a box size of 240 pix. The un-binned job uses a 960 pix box and appears to be stuck on “computing FFTs using CPU”. It’s been running for almost 6 hours and hasn’t hit iteration 1 yet.

I’m aware I may just need to wait it out, but I was wondering if anyone knew of ways to optimize the refinement to reduce or spread out the computational load? Or if there are any other tricks to speed things up?

A secondary question: does anyone know if one can use multiple GPUs in homogeneous refinement? I don’t see a ‘GPUs to parallelise’ option, and thus am only using 1 of the 7 available.

Thanks all for your time
James.

rbs_sci · June 28, 2023, 2:33am

The 3D refinement options are all restricted to a single GPU, as far as I know.

And no, there is basically no way to speed up big box sizes.

You could try a lower binning (maybe 1.5x?) as 640 pixel boxes should still run fairly quickly.

960 pixel boxes will need a lot of VRAM too. What are you using?

edit: Also check system RAM utilisation; I recently had a case where several big-box (840 pixel) non-uniform refinements running at the same time all just got stuck and didn’t go anywhere for days. It turned out that they had swallowed all the system RAM and silently failed (dmesg logs were full of errors regarding lack of RAM).
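For a rough sense of scale, here is a back-of-the-envelope sketch of how much memory a single cubic volume takes at the box sizes mentioned in this thread. It is only arithmetic on the voxel count; it assumes 4-byte float32 real-space voxels and 8-byte complex64 Fourier-space voxels, and it does not model CryoSPARC's actual allocations (a refinement holds several volumes plus FFT work buffers, so real usage is a multiple of these numbers).

```python
def volume_gib(box: int, bytes_per_voxel: int = 4) -> float:
    """Size in GiB of one cubic volume with `box` voxels per side."""
    return box ** 3 * bytes_per_voxel / 2 ** 30


if __name__ == "__main__":
    # Box sizes taken from the thread; dtype sizes are assumptions.
    for box in (240, 640, 840, 960):
        real = volume_gib(box, 4)   # float32, assumed
        cplx = volume_gib(box, 8)   # complex64, assumed
        print(f"{box:4d} px: float32 {real:5.2f} GiB, complex64 {cplx:5.2f} GiB")
```

Going from 240 to 960 pixels multiplies the voxel count by 4^3 = 64, so every per-volume cost in RAM, VRAM and FFT work grows by at least that factor.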

mmclean · June 28, 2023, 2:48pm

Hi @jimhbean,

Thanks for the report. If the job is still running, could you let us know if the system is swapping (e.g. via the commands htop or vmstat)? This may explain the slowdown – the FSC procedure does many FFTs and if host memory is limited, swapping would severely impact performance.
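If htop or vmstat are not convenient, a minimal sketch like the one below reads the same counters directly. It assumes a Linux host where /proc/meminfo exists; the function names are made up for illustration. Note that it only reports how much swap is currently occupied, not whether the system is actively swapping right now; for that, the si/so columns of `vmstat 1` are the better indicator.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info: dict) -> int:
    """Swap currently occupied, in kB (0 if no swap is configured)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # Linux only
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        print(f"MemAvailable: {info.get('MemAvailable', 0) / 2**20:.1f} GiB")
        print(f"Swap in use:  {swap_used_kb(info) / 2**20:.1f} GiB")
```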

Generally the best way to speed up a large-volume refinement is to reduce the box size to as small as possible without the expected FSC resolution exceeding the Nyquist resolution. If the 240px box had a resolution hitting Nyquist, you could get a rough estimate of the resolution that an unbinned refinement would reach by running a “Homogeneous Reconstruction Only” job using the particles from the 240px refinement, but overriding the particles.blob low-level slot with the un-binned particles. This will reconstruct using the fitted alignments, but unbinned particles, and the final FSC from this would give you a decent idea of the resolution you can expect. The reconstruction-only job uses significantly less host memory than a full refinement, so (hopefully) it may complete without swapping…

After that, I’d recommend binning particles to have a Nyquist equal to the FSC resolution, and re-running refinement.
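The box size that corresponds to a given target resolution can be worked out by hand; the sketch below just does that arithmetic. It relies on two standard relations: Nyquist resolution is twice the pixel size, and Fourier-cropping keeps the physical field of view (box × pixel size) fixed. The 0.5 Å pixel size in the example is a made-up value (the thread does not state one), and rounding up to an even box is an assumption for convenience, not a CryoSPARC requirement I can confirm.

```python
import math


def nyquist_resolution(pixel_size_A: float) -> float:
    """Nyquist resolution in Angstrom for a given pixel size."""
    return 2.0 * pixel_size_A


def binned_box(box: int, pixel_size_A: float, target_res_A: float) -> int:
    """Smallest even box whose Nyquist is at or finer than target_res_A.

    Assumes binning by Fourier cropping, so box * pixel size is unchanged.
    """
    binned_pixel_A = target_res_A / 2.0          # pixel size with Nyquist == target
    new_box = math.ceil(box * pixel_size_A / binned_pixel_A)
    new_box += new_box % 2                       # round up to an even size (assumption)
    return min(new_box, box)                     # never exceed the unbinned box


if __name__ == "__main__":
    # Hypothetical numbers: 960 px box at 0.5 A/px, FSC estimate of 3.0 A.
    print(binned_box(960, 0.5, 3.0))
```

In practice it may be worth targeting a slightly finer resolution than the FSC estimate, so that the refined map is not capped exactly at Nyquist.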

Hope this helps,
Michael

jimhbean · June 30, 2023, 9:25am

Howdy

I have run the jobs at lower binning factors but we really are shooting for an un-binned resolution. The card I have been running on is a 2080 Ti, which has 12 GB VRAM.

I tried reducing the box size to 900 pix and reducing the GPU batch size to the minimum to try and reduce the load on the system, but it kept crashing regardless of what I did. I think it may have actually been system RAM (I think this system doesn’t have a huge amount). According to the CryoSPARC box size limitation page, a 12 GB card like the 2080 should be able to handle the box size.

We are going to migrate the big box jobs to a different system which has some extra beans; so far as I can tell this system just can’t hack it! Thanks for your time and advice though.
James.

jimhbean · June 30, 2023, 9:33am

hi @mmclean
Sorry, the job is no longer running, but it quickly became apparent that slowness was the least of our problems! Instead, the job crashed repeatedly as it was running out of RAM.

Oh cool! That’s an interesting method. I’m curious: if you are running the 240px box size reconstruction-only job but overriding with un-binned particles, does this still produce a 240 px box reconstruction while reporting an un-binned FSC? Am I understanding this correctly? Neat trick if so!

As I mentioned above, it looks like we will migrate the job to a beefier system. Thanks so much for your help though.
Cheers,
James.

rbs_sci · June 30, 2023, 10:07am

2080Ti is 11GB, which in RELION is enough for 1024 pixel boxes (just - if anything else is hitting the GPU it’ll fail) but CryoSPARC seems to need a little extra breathing room.

I’ve successfully run up to 1000 pixel boxes on 11GB 1080Tis with 128GB of system RAM, but on a machine with 64GB, big boxes would always crash.

Definitely seems like migrating to a system with more RAM/VRAM would solve your problem.