I’ve seen this topic come up a few times on the forum, but I still can’t resolve my problem. I am running my RBMC job on a single node with 10 GPU cards (specs below). I am the only person using the node. Job details are below:
Number of movies: 7200
Frames per movie: 54; dimensions: 4092 x 5760
Collected on a K3 in .tiff format
Particle input: 2.5M particles (yes, even after extensive classification)
Box size: 320 px (particle diameter corresponds to ~70 px)
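A quick back-of-envelope on those numbers (my own rough Python estimate, assuming the frames are held uncompressed as float32; CryoSPARC's actual memory accounting is certainly more involved):

# Rough size of one K3 movie if held uncompressed in float32
# (my assumption, not CryoSPARC's actual memory model).
frames, ny, nx = 54, 5760, 4092
movie_gb = frames * ny * nx * 4 / 1e9   # 4 bytes per float32 pixel
print(f"~{movie_gb:.1f} GB per movie")  # ~5.1 GB
# Average particle load per movie
print(f"~{2_500_000 / 7200:.0f} particles per movie on average")  # ~347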
With default settings I get the error below. So I tried increasing the oversubscription threshold to 500GB and the cache to 20GB. The job with 1 GPU runs extremely slowly; I calculated it would take several months to complete.
Does it make sense here to split up the particles into many batches, process them separately, and recombine later? I was thinking that more particles give better estimates, as is the case in RELION. I suppose I could split the movies into batches instead.
Any help is appreciated.
Thanks,
Jesse
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce GTX1080 Ti On | 00000000:04:00.0 Off | N/A |
| 36% 47C P2 57W / 250W | 8964MiB / 11264MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
The error I am receiving:
self.handle = gpufft.gpufft_get_plan(
RuntimeError: cuda failure (driver API): cuMemAlloc(&plan_cache.plans[idx].workspace, plan_cache.plans[idx].worksz)
-> CUDA_ERROR_OUT_OF_MEMORY out of memory
What GPUs are being used? If they’re 24GB cards, set the oversubscription limit to 30GB; if 48GB, set it to 50GB. If you have enough system RAM, you can safely set the RBMC run to use all GPUs, although scaling is non-linear (the best I’ve seen in testing is a job going from 15 days on one GPU to 2 days across ten GPUs).
RBMC out of (CUDA) memory errors are almost always (at least in the tests I’ve done) due either to box size (as yours is 320 px, that’s not the case here) or to too many particles per micrograph. You can split the stack into, say, four (remove duplicates is the quickest way I’ve found) and run RBMC on the individual stack groups.
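If you want to do the split offline rather than through the remove duplicates job, a rough sketch along these lines spreads each micrograph's particles evenly across four subsets. I'm assuming the exported particles.cs file is a plain NumPy structured array with a location/micrograph_uid field, which is how it looks in the CryoSPARC versions I've used - check before relying on it:

import numpy as np

# Sketch only: split a particle stack into four subsets so each micrograph's
# particles are spread evenly across them (round-robin within a micrograph).
particles = np.load("particles.cs")  # .cs files are NumPy structured arrays

n_subsets = 4
subset_id = np.empty(len(particles), dtype=int)
for uid in np.unique(particles["location/micrograph_uid"]):
    idx = np.flatnonzero(particles["location/micrograph_uid"] == uid)
    subset_id[idx] = np.arange(len(idx)) % n_subsets

for s in range(n_subsets):
    print(f"subset {s}: {(subset_id == s).sum()} particles")
    # Getting each subset back into CryoSPARC needs a matching .csg group file
    # or cryosparc-tools; I've left that step out here.

You would then run RBMC on each subset separately and recombine afterwards.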
Even with 1 GPU, a 500GB oversubscription limit, and a 20GB cache, it still crashed. I tried splitting into subsets of micrographs with only 1000 mics per subset; even that crashed. I would have to split it into maybe 30 subsets for this to run.
Also, yesterday it ran through about 40% of the micrographs when I tried on 4 GPUs, but today it crashed after only 6% of the micrographs when I ran it on 1 GPU. I don’t understand why it was able to process 40% of them yesterday and then suddenly crashes.
This is a bit frustrating. I guess I will give up and head to RELION for polishing, then re-import into CryoSPARC.
It’s not mics per subset that is important, it’s particles per micrograph. Hit a micrograph with more particles than can fit, and it’ll crash. All it takes is one or two micrographs with more particles than that critical threshold.
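A quick way to check how skewed your per-micrograph counts are (an offline NumPy sketch against the exported particles.cs; the location/micrograph_uid field name is what I see in the CryoSPARC versions I've used):

import numpy as np

# How skewed are the per-micrograph particle counts?
particles = np.load("particles.cs")  # .cs files are NumPy structured arrays
uids, counts = np.unique(particles["location/micrograph_uid"], return_counts=True)

print(f"median particles/micrograph: {np.median(counts):.0f}")
print(f"max particles/micrograph:    {counts.max()}")
threshold = 600  # hypothetical cutoff - tune to whatever your card tolerates
print(f"micrographs above {threshold}: {(counts > threshold).sum()}")

If the max is far above the median, it only takes those few micrographs to bring the job down.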
The nvidia-smi output you provided lists only a single 1080 Ti. Are the other cards the same? 8GB cards were deprecated recently (although they’ll still work, some functions are quite limited), and 11GB cards are really on the edge of having enough memory with the new pathways in CryoSPARC 4.4 - things which used to be OK now crash with out-of-memory errors, for example.
If you’re going the RELION route, good luck! I think Bayesian polishing still has an edge (just) compared to RBMC, but it is so painfully slow for us (although I think that is an EER thing) and also wants a ton of memory - just system RAM, not VRAM.
Sorry that you’re having memory issues with RBMC. I’m not sure that it will actually help you overcome the current problem, but if you’re interested, this post goes over the factors that affect the memory usage of RBMC: Memory issues with Reference Based Motion Correction? - #3 by hsnyder
I’ll also note that we’re planning to issue an update to RBMC in a future release of CryoSPARC that will help to reduce memory (RAM and VRAM) demands.
To update: I pulled the micrographs and particles into a “manually curate exposures” job and removed any micrographs that had far more particles than the rest. Obviously this is suboptimal since those are probably good particles, but it did result in the RBMC job running!
In the end it did not improve my map though :(, possibly because of the particles I had to remove. I am now also trying to reduce the box size slightly and repeat without removing the particles.
You can use the Remove Duplicate Particles job to subset particles on micrographs where there are too many per micrograph. You might have to play with the distance between particles to get appropriately sized subsets such that RBMC can run on your GPU. If you use this method, you would just run separate RBMC jobs, supplying the same dose weights and hyperparameters so that the polishing is done identically across all jobs.
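To get a feel for the separation distance before running the job, you can preview offline how many picks per micrograph survive a given minimum spacing. This is a rough greedy sketch, not the actual Remove Duplicate Particles algorithm, and the location/center_*_frac and location/micrograph_shape field names are just what I see in recent .cs files:

import numpy as np

# Rough, greedy preview of how many particles per micrograph survive a given
# minimum separation. Not the actual Remove Duplicate Particles algorithm.
particles = np.load("particles.cs")  # .cs files are NumPy structured arrays
min_sep_px = 100  # hypothetical value; convert from Angstroms with your pixel size

def survivors(xy, min_sep):
    """Greedily keep picks that are at least min_sep pixels apart."""
    kept = []
    for p in xy:
        if all(np.hypot(*(p - q)) >= min_sep for q in kept):
            kept.append(p)
    return len(kept)

shape = particles["location/micrograph_shape"][0]  # assuming (ny, nx) order
for uid in np.unique(particles["location/micrograph_uid"])[:20]:  # preview a few mics
    sel = particles["location/micrograph_uid"] == uid
    xy = np.stack([particles["location/center_x_frac"][sel] * shape[1],
                   particles["location/center_y_frac"][sel] * shape[0]], axis=1)
    print(uid, sel.sum(), "->", survivors(xy, min_sep_px))

Increase min_sep_px until the surviving counts per micrograph fit your GPU, then use a comparable distance in the actual job.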