Hi,
I’m running a homogeneous refinement in cryoSPARC and the job keeps getting killed for running out of memory. I suspect something may not be configured correctly, but I am not the admin on this cluster. I’ve given the job 96 GB of RAM, a single GPU, and 16 CPU cores (I don’t have the details of the processors at the moment). Below is the last segment of the event log, in case that helps.
:1: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
========= sending heartbeat at 2025-01-24 14:18:27.295828
:1: RuntimeWarning: invalid value encountered in true_divide
========= sending heartbeat at 2025-01-24 14:18:37.340825
========= sending heartbeat at 2025-01-24 14:18:47.357681
========= sending heartbeat at 2025-01-24 14:18:57.375894
/blue/rmckenna/apps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:656: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
x = n.linalg.lstsq(w.reshape((-1,1))*A, w*b)[0]
========= sending heartbeat at 2025-01-24 14:19:07.393951
========= sending heartbeat at 2025-01-24 14:19:17.405793
/blue/rmckenna/apps/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:571: RuntimeWarning: divide by zero encountered in log
logabs = n.log(n.abs(fM))
========= sending heartbeat at 2025-01-24 14:19:27.420333
/blue/rmckenna/apps/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:44: RuntimeWarning: invalid value encountered in sqrt
cradwn = n.sqrt(cradwn)
========= sending heartbeat at 2025-01-24 14:19:37.435207
========= sending heartbeat at 2025-01-24 14:19:47.452905
========= sending heartbeat at 2025-01-24 14:19:57.468302
========= sending heartbeat at 2025-01-24 14:20:07.487353
========= sending heartbeat at 2025-01-24 14:20:17.504962
========= sending heartbeat at 2025-01-24 14:20:27.645527
========= sending heartbeat at 2025-01-24 14:20:37.663808
========= sending heartbeat at 2025-01-24 14:20:47.677319
/blue/rmckenna/apps/cryosparc/cryosparc_worker/bin/cryosparcw: line 153: 404114 Killed python -c "import cryosparc_compute.run as run; run.run()" "$@"
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=56603382.batch. Some of your processes may have been killed by the cgroup out-of-memory
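
Since the kill comes from the SLURM cgroup limit rather than from cryoSPARC itself, one thing I can check without admin rights is how much memory the job was actually granted. Below is a minimal sketch for that, not anything provided by cryoSPARC. It assumes the standard SLURM job environment variables (SLURM_MEM_PER_NODE, SLURM_MEM_PER_CPU, SLURM_CPUS_ON_NODE, with memory reported in MB) are visible to the job; the function name granted_memory_gb is just a hypothetical helper.

```python
import os


def granted_memory_gb(env=None):
    """Return the memory SLURM granted this job on the node, in GB.

    Returns None when no SLURM memory variables are present.
    Assumes SLURM reports memory values in megabytes.
    """
    env = os.environ if env is None else env

    # Set when the job was submitted with --mem (per-node limit, in MB).
    per_node = env.get("SLURM_MEM_PER_NODE")
    if per_node:
        return int(per_node) / 1024

    # Set when the job was submitted with --mem-per-cpu (in MB);
    # multiply by the CPUs allocated on this node to get the total.
    per_cpu = env.get("SLURM_MEM_PER_CPU")
    cpus = env.get("SLURM_CPUS_ON_NODE")
    if per_cpu and cpus:
        return int(per_cpu) * int(cpus) / 1024

    return None


if __name__ == "__main__":
    gb = granted_memory_gb()
    if gb is None:
        print("No SLURM memory variables found (not inside a SLURM job?)")
    else:
        print(f"SLURM memory grant on this node: {gb:.1f} GB")
```

If this reports much less than 96 GB from inside the job, the memory request in the cluster submission script is probably not being passed through as I intended.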