Bizarre job scheduling issues with Reference Based Motion Correction

cbeck · April 30, 2024, 3:46am

Hi Yang,

The workstation has 16 CPU cores and 128 GB of RAM. Thank you for the comment about each RAM slot representing 8 GB; I wasn’t aware of that before. With 3 GPUs, RBMC is using 16 RAM slots (16 × 8 GB = 128 GB), which is all of the RAM. I didn’t know that RBMC required that much; other jobs use only a fraction of that amount.

After reading your comment, I went back to watch the RBMC tutorial video. Towards the end, the developer mentions that by default, the job sets aside 80 GB of RAM, which is also a large amount. Additionally, I found a post from 2017 describing a similar issue where refinement jobs would queue but not run:

CryoSPARC’s scheduler checks both GPU and RAM availability when launching jobs, and refinement jobs require the system to have at least 24GB RAM before they get launched.
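To make the slot arithmetic concrete, here is a minimal Python sketch of the kind of check described in that quote. It assumes the scheduler simply rounds a job's RAM request up to whole 8 GB slots and compares that against the slots not already reserved by running jobs. The names `slots_needed` and `can_launch` are hypothetical, and this is an illustration of the reasoning, not CryoSPARC's actual scheduler code.

```python
# Hypothetical model of a RAM-slot check; NOT CryoSPARC's actual scheduler code.
SLOT_GB = 8          # each RAM slot represents 8 GB
TOTAL_RAM_GB = 128   # RAM on this workstation


def slots_needed(ram_gb: int) -> int:
    """Round a RAM request up to a whole number of 8 GB slots."""
    return -(-ram_gb // SLOT_GB)  # ceiling division on integers


def can_launch(requested_gb: int, reserved_slots: int) -> bool:
    """Return True if enough unreserved slots remain for the request."""
    total_slots = TOTAL_RAM_GB // SLOT_GB
    free_slots = total_slots - reserved_slots
    return slots_needed(requested_gb) <= free_slots


rbmc_slots = 16  # slots observed for RBMC running with 3 GPUs
print(TOTAL_RAM_GB // SLOT_GB)      # total slots: 16
print(slots_needed(24))             # a 24 GB refinement needs 3 slots
print(can_launch(24, rbmc_slots))   # False: RBMC leaves 0 free slots
print(can_launch(24, 0))            # True: plenty of room once RBMC finishes
```

Under these assumptions, the refinement stays queued for exactly as long as RBMC holds all 16 slots.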

Taken together, this appears to explain why the refinement jobs specifically would queue but not run: RBMC was already reserving nearly all of my RAM, so there wasn’t enough left for the refinement jobs to launch. Thank you for your help, Yang!