Motion Correction Benchmarking: Duration and GPU Load

Hi, I ran motion correction on approximately 8,000 micrographs (around 4.3 TB in size) with a raw pixel size of 0.41 Å. Each movie consists of 70 frames. During motion correction I binned the micrographs by a factor of 2. The job ran on 2 Nvidia GPUs (CUDA 12.2), each with 24 GB of memory, and took 15 hours to complete. Given that our computational infrastructure is relatively new, I'd like to benchmark it: is a 15-hour duration typical for such a run? Additionally, I noticed that the processing load is not evenly distributed between the two GPUs. Is that typical?

That timing is not entirely unreasonable, but depends on various factors you don’t mention…

The uneven GPU load is likely due to waiting for micrographs to be read: with network storage, I find reading the raw micrograph movies takes noticeably longer than from local storage (and the type of local storage matters too: single HDD, HDD RAID array, SSD, etc.). Were the micrographs on local storage (and what type), or on network storage?
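If in doubt, it is easy to get a rough read-speed number for whatever holds the movies; a minimal Python sketch (the movie path is a placeholder, and a file that is already in the OS page cache will give an optimistic figure):

import sys, time

# Rough sequential-read throughput check for the storage holding the movies.
# Pass the path of one raw movie file that has not been read recently,
# otherwise the OS page cache will inflate the number.
path = sys.argv[1]             # placeholder: one raw movie file
chunk = 8 * 1024 * 1024        # 8 MiB reads
nbytes = 0
start = time.time()
with open(path, "rb") as f:
    while True:
        buf = f.read(chunk)
        if not buf:
            break
        nbytes += len(buf)
elapsed = time.time() - start
print(f"{nbytes / 1e6:.0f} MB in {elapsed:.2f} s -> {nbytes / (1e6 * elapsed):.0f} MB/s")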

You don’t mention which detector, and whether or not super resolution was used… sensor size plays a role in how many patches the algorithm defaults to, so it does have an impact on speed. K3 super-res data takes a bit more time to process than K2 counting-mode data, for example, even if the file format is the same.

You also don’t mention which GPU. “24GB” covers a lot of ground, all the way from Pascal-era GPUs like the Tesla P40 through to Quadro M6000, Titan RTX, RTX 3090, RTX 4090, A5000 and A5500 or A4500Ada… all of which have very different levels of compute power, despite being 24GB cards. :wink:

Thank you for highlighting the details.
Micrographs are stored locally on an SSD drive.
The camera used is the K3, and yes, the data is collected using SuperRes.
The GPUs are GeForce RTX 4090.

That does seem a bit slow, especially considering the micrographs are on a local SSD. Assuming you have a decent CPU, there’s not much more you can do in terms of computational resources; a good CPU and GPU with a local SSD is already close to optimal. However, you might want to revisit your data collection practices. In most cases, saving super-resolution images doesn’t help much with processing. Also, 70 frames can be excessive; 40 frames are often sufficient. That said, since you’re using a 0.41 Å pixel size, you’re probably aiming for very high resolution. If that’s not strictly necessary, increasing the pixel size to 0.7–0.8 Å could easily get you sub-2.5 Å resolution. Each of these changes could roughly double your processing throughput and reduce your storage requirements.
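For a rough sense of what such changes buy, here is a back-of-the-envelope sketch in Python (illustrative only: it assumes the same imaged area per movie, so raw data volume scales with pixel count times frame count; the 0.8 Å and 40-frame numbers simply follow the suggestions above):

# Back-of-the-envelope comparison of collection settings (illustrative only).
# Assumes the same imaged area per movie, so raw pixel count scales with
# (reference_pixel_size / pixel_size)^2, and data volume scales linearly
# with the number of frames.
ref_pix, ref_frames = 0.41, 70   # current: 0.41 Å super-res, 70 frames

settings = [
    ("0.41 Å super-res, 70 frames (current)", 0.41, 70),
    ("0.82 Å counting,  70 frames",           0.82, 70),
    ("0.80 Å counting,  40 frames",           0.80, 40),
]

for label, pix, frames in settings:
    nyquist = 2 * pix   # best resolution the sampling allows
    rel_size = (ref_pix / pix) ** 2 * (frames / ref_frames)
    print(f"{label}: Nyquist {nyquist:.2f} Å, raw data ~{rel_size:.2f}x of current")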

By the way @amerani, how did you figure out that the GPU load was not even? That is actually very useful information that I would also like to monitor.

nvidia-smi should be available. You can run it in dmon mode and either write to stdout (if you have a quick eye) or to a file for later analysis.

nvidia-smi dmon -d <sampling interval in seconds> -s <metrics to sample> -i <GPU ID> -f <output filename> --format=csv

Defaults for what gets sampled are puc (power, utilisation, clocks) but utilisation spams four columns we’re not interested in.
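For example, logging both GPUs every 5 seconds to a file with the default metrics might look like this (check nvidia-smi dmon -h on your system, as option support can vary a little between driver versions):

nvidia-smi dmon -d 5 -i 0,1 -f gpu_load.csv --format=csv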

I found a realtime plotter someone had put together years ago, but I’ve never actually used it as I’m usually happy to eyeball dmon output directly.
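If you do want more than an eyeball check, a few lines of Python can summarise a saved dmon log; a minimal sketch, assuming the first "#" line names the columns and that per-GPU "gpu" and "sm" (utilisation, %) columns are present — check against your own dmon output and adjust the parsing if the layout differs:

from collections import defaultdict
from statistics import mean

# Summarise a saved `nvidia-smi dmon` log (file name matches the example above).
log_file = "gpu_load.csv"

header = None
samples = defaultdict(list)    # GPU id -> list of "sm" utilisation values

with open(log_file) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            if header is None:
                # first "#" line names the columns, whether comma- or space-separated
                header = [c.strip() for c in line.lstrip("#").replace(",", " ").split()]
            continue
        fields = [c.strip() for c in line.replace(",", " ").split()]
        row = dict(zip(header, fields))
        try:
            samples[row["gpu"]].append(float(row["sm"]))
        except (KeyError, ValueError):
            continue               # units line or malformed row

for gpu, utils in sorted(samples.items()):
    idle = 100 * sum(u == 0 for u in utils) / len(utils)
    print(f"GPU {gpu}: mean utilisation {mean(utils):.0f}%, idle in {idle:.0f}% of samples")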

Or you can use something like nvtop.
