Make the ResLog job optionally multi-GPU?

Guillaume · October 11, 2024, 6:34pm

Hello,

I am running a ResLog job with a massive dataset of ~2.4 M particles, and it is probably going to time out (our Slurm cluster has a max time of 72 hours for the partition that CryoSPARC submits its jobs to).

If I understand this job correctly, it could benefit from being multi-GPU since the reconstructions with different numbers of particles are done separately, right? If not, never mind.

For now, I will try to run it with fewer steps or with a random subset of the particles (if this first attempt times out).

Thanks!

wtempel · October 16, 2024, 4:05pm

@Guillaume May we ask:

What are the box and pixel sizes of the particles?
How long does a Homogeneous Reconstruction Only job take for the same ≈2.4 M particles?

Guillaume · October 16, 2024, 4:38pm

Yes, I forgot to specify this, sorry. This is a huge box: 648 pixels (pixel size ~0.5 Å/px).

I did not try to run ResLog with a smaller reconstruction box size because I wasn’t sure how it would affect the resolution estimation.

“Homogeneous reconstruction only” of this entire set took a bit more than 20 hours. Homogeneous refinement took a bit more than 25 hours (I expected a bigger difference in run time between these two jobs).
I can’t verify whether these two jobs and the ResLog job ran on the same node, but they got to run on either a 3080 Ti or an A5000.

I realized that I ran my first attempt at the ResLog job with settings that caused many reconstructions: 1000 particles in the initial sample size, multiplier 1.5, error bars on.
I ran it again with multiplier 4 and no error bars, and it completed in a bit more than 55 hours. So in this job I have 7 points (only 6 being useful for the fit, since the 7th is very close to the 6th), and this is probably still more than enough to constrain the linear fit.
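
As a rough illustration of why the multiplier matters so much, here is a minimal sketch that enumerates sample sizes under an assumed scheme: a geometric series starting at the initial sample size, with the full particle set as the last point. This is only a guess at how the job picks its sizes, not CryoSPARC's actual implementation, and `reslog_sample_sizes` is a hypothetical helper.

```python
def reslog_sample_sizes(total, initial, multiplier):
    """Assumed scheme: initial, initial*m, initial*m^2, ... then the full set."""
    sizes = []
    n = float(initial)
    while n < total:
        sizes.append(int(round(n)))
        n *= multiplier
    sizes.append(total)  # assumption: the full particle set is the last point
    return sizes


TOTAL = 2_400_000  # approximate particle count from this thread

first_attempt = reslog_sample_sizes(TOTAL, 1000, 1.5)
second_attempt = reslog_sample_sizes(TOTAL, 1000, 4)
print(len(first_attempt), len(second_attempt))
```

Under this assumption, multiplier 4 gives 7 sizes, consistent with the 7 points mentioned above, while multiplier 1.5 gives 21 sizes before counting any repeats for error bars.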

So, new suggestion: it could be simpler to have a parameter asking how many data points we want (with a reasonable default value, maybe 4 or 5 points?), and subsequently automatically calculate the multiplier and initial sample size that result in this many points equally spaced on the log scale and starting from the total number of particles for the highest point.
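
A minimal sketch of what this calculation could look like. The number of points and the total alone do not fix the multiplier, so this assumes one extra input, a hypothetical `smallest` sample size for the lowest point. The function name and its parameters are illustrative only, not existing ResLog options.

```python
def log_spaced_sizes(total, n_points, smallest=1000):
    """Return (multiplier, sizes) with n_points equally spaced on a log scale.

    The highest point is exactly `total`; `smallest` is an assumed lower bound.
    """
    if n_points < 2:
        raise ValueError("need at least 2 points to fit a line")
    multiplier = (total / smallest) ** (1.0 / (n_points - 1))
    # Work downwards from the full set so the top point is exactly `total`.
    sizes = [round(total / multiplier ** k) for k in range(n_points - 1, -1, -1)]
    return multiplier, sizes


multiplier, sizes = log_spaced_sizes(2_400_000, 5)
print(f"multiplier = {multiplier:.2f}")
print(sizes)
```

With 5 points between 1000 and ~2.4 M particles, the multiplier comes out close to 7, so far fewer reconstructions are needed than with a multiplier of 1.5.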

Guillaume · October 16, 2024, 4:47pm

Another thing: it would be nice if the ResLog job reported the slope and intercept of the fit line. As it is now, one needs to download the text file with the resolution and sample size values and fit the line oneself to get the slope and intercept. It's not very difficult to do, but it is an extra step, and getting these numbers is the primary reason to run a ResLog job.
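
For reference, the manual fit can be sketched as below. It assumes the usual ResLog convention of inverse resolution (1/Å) against log10 of the particle count; adjust the axes if the job uses a different one. The layout of the downloaded text file is not covered here, so the values are passed in as plain lists, and the numbers in the example are synthetic, generated from a known line rather than taken from a real job.

```python
import math


def fit_reslog(n_particles, resolutions_angstrom):
    """Least-squares line through (log10(N), 1/resolution); returns (slope, intercept)."""
    xs = [math.log10(n) for n in n_particles]
    ys = [1.0 / r for r in resolutions_angstrom]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example: resolutions generated from a known line, not real data.
sizes = [1000, 10_000, 100_000, 1_000_000]
true_slope, true_intercept = 0.05, 0.1
resolutions = [1.0 / (true_slope * math.log10(n) + true_intercept) for n in sizes]

slope, intercept = fit_reslog(sizes, resolutions)
print(f"slope = {slope:.4f} per decade, intercept = {intercept:.4f} 1/A")
```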

kelder · October 21, 2024, 4:40pm

Hi @Guillaume,

Thank you for the suggestions, these feature requests have been recorded!

Katherine

Guillaume · December 8, 2024, 10:16am

The performance issues I reported here no longer seem relevant.

The ResLog job described in my last post above took 55 hours 30 min when we were on CryoSPARC 4.5.
Now, after upgrading to CryoSPARC 4.6 and applying some performance optimizations on our cluster (all related to I/O), the same job (cloned) took 2 hours 49 min. This is a massive improvement.

In both cases I subtracted the time it took to copy the particles to the cache. The comparison would not be fair otherwise, since for the first job the particles were already in cache and it took only 40 s to check it, while for the second they needed to be copied over again.

@daniel.s.d.larsson has more examples of the improvements we got on our cluster after these optimizations.

That said, the suggestion to report the slope and intercept of the linear fit in the job log still stands.