Is it required to disable transparent hugepages? If not, is there a way of configuring CryoSPARC to stop displaying warnings incessantly about it?
As of CryoSPARC version 4.6.2, if transparent_hugepage/enabled is currently set to [always], changing the setting to [madvise] is highly recommended. The warning about an active [always] setting cannot be disabled.
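For anyone who wants to check which mode a node is currently in, here is a minimal Python sketch. It assumes the standard Linux sysfs path /sys/kernel/mm/transparent_hugepage/enabled, where the active mode is the token in square brackets. The function names are made up for illustration and are not part of CryoSPARC.

```python
from pathlib import Path

# Standard sysfs location of the kernel's THP mode on Linux.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode, i.e. the token wrapped in square brackets."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED) -> str:
    """Read the sysfs file and return 'always', 'madvise' or 'never'."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    if THP_ENABLED.exists():
        print(current_thp_mode())
    else:
        print("THP sysfs entry not found (non-Linux host or THP not built in)")
```

Running this on every worker node as well as the master should show whether any host is still on [always].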
[madvise] needs to be on worker nodes, not just the server host? I thought this was for mongodb performance only? Is cryosparc utilizing MADV_NOHUGEPAGE?
@DavidHoover yes it is needed on the worker nodes. By our observations, the main piece of software that this affects is actually Numpy, not MongoDB.
What would happen if we changed export NUMPY_MADVISE_HUGEPAGE=0 to export NUMPY_MADVISE_HUGEPAGE=1 in cryosparcw? Would this disable any warnings about THP being always? Is there really that much of a performance degradation when THP is enabled?
The visibility of the THP warning in CryoSPARC is unaffected by the NUMPY_MADVISE_HUGEPAGE setting. The warning is expected to be shown as long as transparent_hugepage.enabled is set to always.
Yes, on affected systems and/or workloads, the THP-related performance degradation is significant.
Can you point out which job types or situations show degraded performance? Is there a way we could test it locally on our nodes?
Non-uniform refinement gives an immediately obvious measure. THP enabled/madvise = 120,000 seconds for a run. THP disabled = 8,000 seconds for the same run. That was the most extreme example I saw personally.
Thanks for this datapoint. Do you recall the CryoSPARC version for this comparison? In CryoSPARC v4.6.2 and v4.7.0, a significant delay with the [madvise] kernel setting would be unexpected.
Pretty sure it was before you changed the defaults, so 4.5.3 or earlier. Definitely not 4.6.2 or 4.7.0
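On the earlier question of testing locally: besides timing the same job under different THP settings, it may help to check whether a running job's memory is actually backed by transparent hugepages. Below is a rough Python sketch that reads the AnonHugePages field the kernel reports in /proc/meminfo (system-wide) and /proc/<pid>/smaps_rollup (per process, on reasonably recent kernels). The helper names are hypothetical, and reading another user's process may need elevated permissions.

```python
import re
from pathlib import Path

# Matches lines such as "AnonHugePages:      4096 kB" in /proc text files.
_ANON_HUGE = re.compile(r"^AnonHugePages:\s+(\d+)\s+kB", re.MULTILINE)


def anon_hugepages_kb(text: str) -> int:
    """Sum all AnonHugePages entries (in kB) found in the given /proc text."""
    return sum(int(kb) for kb in _ANON_HUGE.findall(text))


def system_anon_hugepages_kb() -> int:
    """THP-backed anonymous memory for the whole node, from /proc/meminfo."""
    return anon_hugepages_kb(Path("/proc/meminfo").read_text())


def process_anon_hugepages_kb(pid: int) -> int:
    """THP-backed anonymous memory of one process, from smaps_rollup."""
    return anon_hugepages_kb(Path(f"/proc/{pid}/smaps_rollup").read_text())
```

Calling process_anon_hugepages_kb on the PID of a job while it runs, once per THP setting, gives a number to put next to the wall-clock time.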
Is this related to numpy/numpy issue #27483, "BUG: numpy.zeros misses out on hugepages"? If so, then the issue might be that np.zeros doesn’t actually use hugepages, while other parts of numpy do. Are there any plans to bump the worker’s version of numpy from 1.22.4 to 2.2.0 in a future CryoSPARC release so that we could see if that makes a difference?
Also of note: the numpy developers don’t seem to think that madvise actually does anything.
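One way to probe the np.zeros question on a given node is to time two allocation paths that end in the same write. This is only a sketch: the function names are invented here, and whether any gap shows up at all, and whether it comes from hugepages, depends on the numpy version, the kernel, and the THP mode. The default array size of 1 GiB is an arbitrary choice.

```python
import time
from typing import Callable, Dict


def best_time(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) of `repeats` calls to fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare_allocation_paths(n_bytes: int = 1 << 30) -> Dict[str, float]:
    """Time np.zeros vs np.empty, each followed by an identical full write."""
    import numpy as np  # imported here so best_time() is usable without numpy

    def zeros_then_write():
        a = np.zeros(n_bytes, dtype=np.uint8)
        a.fill(1)  # touch every page so the kernel has to back it
        return a

    def empty_then_write():
        a = np.empty(n_bytes, dtype=np.uint8)
        a.fill(1)  # same write, so only the allocation path differs
        return a

    return {
        "zeros": best_time(zeros_then_write),
        "empty": best_time(empty_then_write),
    }
```

Running compare_allocation_paths() once with NUMPY_MADVISE_HUGEPAGE=0 and once with NUMPY_MADVISE_HUGEPAGE=1, under each kernel THP mode, would show whether the two paths behave differently on a particular system.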