RBMC Empirical dose weights stalling

Flow · December 12, 2023, 10:30pm

Hey,

I'm trying to run RBMC on a ribosome dataset. Nothing special about the dataset.
The hyperparameter search works perfectly, but when it reaches the dose weighting step it just stalls forever.
It processes the first 5-6 micrographs (within a few minutes) and then does nothing (I let it run for up to two hours), without any error message.

This is how it looks; the progress bar just doesn't move:

```
[2023-12-12 22:54:06.05] [CPU: 2.27 GB]
    STARTING: COMPUTE EMPIRICAL DOSE WEIGHTS
[2023-12-12 22:54:06.06] [CPU: 2.27 GB]
Using hyperparameters:
Spatial prior strength: 3.6279e-03
Spatial correlation distance: 500
Acceleration prior strength: 2.6015e-02

[2023-12-12 22:54:06.06] [CPU: 2.27 GB]
Using all FCs for doseweighting

[2023-12-12 22:54:06.15] [CPU: 2.28 GB]
Working with 320 movies containing 20046 particles

[2023-12-12 22:54:07.55] [CPU: 3.08 GB]
Movies processed:
[▇-------------------------------------------------------------------------------] 6/320 (2%)
```

At first I thought it was because the particles come from 2 sets of micrographs with different doses, so I split everything into 2 jobs and ran just the hyperparameter search step on each micrograph set. This again worked. Then I launched the RBMC jobs with the previously calculated hyperparameters and, again, dose weighting starts and then nothing happens.

Any idea why?

Thanks in advance

wtempel · December 12, 2023, 10:43pm

Could you please post a screenshot of htop, taken while an RBMC job is stalled in this way?
You may also want to:

- ensure that there is physical memory (as opposed to only swap space) available on the host at that time. What does the command `free -g` show?
- see if the suggestion in the thread "CryoSPARC 4.4 significantly increases 'minimum' target specification for processing systems" (post #5 by sarulthasan) applies to your case.
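As a minimal sketch, assuming a Linux host, the physical-memory check can also be scripted so it runs non-interactively (for example, from inside a batch job). It reads `MemAvailable` from `/proc/meminfo`, which the kernel reports in kB and which does not count swap. The function names `mem_available_gb` and `check_min_mem` and any threshold you pass are hypothetical placeholders, not CryoSPARC settings or requirements.

```shell
#!/bin/sh
# Hypothetical helper: print available physical RAM in whole GiB.
# MemAvailable in /proc/meminfo is given in kB and excludes swap.
mem_available_gb() {
  meminfo="${1:-/proc/meminfo}"
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}

# Hypothetical check: succeed only if at least $1 GiB of RAM is available.
# An optional second argument points at an alternative meminfo file.
check_min_mem() {
  min_gb="$1"
  avail=$(mem_available_gb "$2")
  if [ "$avail" -lt "$min_gb" ]; then
    echo "WARNING: only ${avail} GiB RAM available (wanted ${min_gb} GiB)"
    return 1
  fi
  echo "OK: ${avail} GiB RAM available"
}
```

For example, `check_min_mem 64` checks the current host against an illustrative 64 GiB threshold; `free -g` remains the quicker interactive equivalent.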

Flow · December 12, 2023, 10:51pm

Thank you for your answer.
However, I don't think I can access this information easily, since I am submitting these jobs to the university's high-performance computing cluster, not running them on a local machine.

Do you think changing the GPU oversubscription memory threshold (GB) and In-memory cache size parameters would help?

Flow · December 14, 2023, 9:17pm

OK, just an update.

The NVIDIA drivers were updated on our RTX8000 GPU node, and the job works on that node. It still does not work on the A100 GPU node (the one I was originally using), whose NVIDIA drivers were already up to date.

I am still not sure why it doesn't work on the A100s, but at least it works on the other node.

wtempel · December 19, 2023, 5:33pm

You may want to ask your IT support about:

RAM on the A100 node. What is the output of the command
```
free -g
```
The current setting for transparent_hugepage
```
cat /sys/kernel/mm/transparent_hugepage/enabled
```
They may want to try setting this to never.
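A minimal sketch, assuming a Linux host: the sysfs file lists every transparent_hugepage mode and marks the active one in brackets (for example `always [madvise] never`), so the current mode can be extracted in a script. The function name `thp_mode` is a hypothetical placeholder; actually changing the setting requires root and is shown only as a comment.

```shell
#!/bin/sh
# Hypothetical helper: print the active transparent_hugepage mode.
# The kernel marks the active mode in brackets,
# e.g. "always [madvise] never" -> madvise.
thp_mode() {
  thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$thp_file"
}

# To switch to "never" until the next reboot, an administrator (root) could run:
#   echo never > /sys/kernel/mm/transparent_hugepage/enabled
# A persistent change is distribution-specific, e.g. the kernel boot
# parameter transparent_hugepage=never.
```

Running `thp_mode` on the A100 node should print `always`, `madvise`, or `never`.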

You may also try raising the oversubscription threshold to a value that prevents oversubscription on the A100. Are the A100s of the 40GB or the 80GB variety?
