NU Refinement memory issues

Thanks very much for your reply. As far as I know, the job should have full and exclusive access to the node’s RAM in this case: Slurm allocates it 240 GB of system RAM, and under these conditions the node will not accept any other jobs. I am also not aware of the job’s access to VRAM being constrained in any way; if the Slurm or cryoSPARC configuration is doing so, I don’t know about it.

The slurm.err file contains this line only:

slurmstepd-linux1006: error: Detected 1 oom-kill event(s) in step 7500202.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

The largest amount of RAM I can see reported as used in the event log is 194.26 GB. That is large, but still well short of the 240 GB of system RAM available.
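One way to cross-check the 194.26 GB figure against what the kernel actually enforced is to read the job's cgroup counters from inside the running job (the cgroup is removed when the job ends). The following is a minimal sketch, assuming the node uses the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup and that Slurm places the job in its own cgroup. The helper names are hypothetical, and memory.peak only exists on Linux 5.19 or later. cgroup accounting also counts page cache, so these numbers may not match the event log.

```python
from pathlib import Path

# Assumption: cgroup v2 (unified hierarchy) mounted at /sys/fs/cgroup.
CGROUP_ROOT = Path("/sys/fs/cgroup")
GIB = 1024 ** 3  # bytes per GiB


def own_cgroup_dir(proc_cgroup="/proc/self/cgroup", root=CGROUP_ROOT):
    """Return this process's cgroup v2 directory, or None if not found."""
    proc_cgroup = Path(proc_cgroup)
    if not proc_cgroup.exists():
        return None
    for line in proc_cgroup.read_text().splitlines():
        # cgroup v2 entries look like "0::/some/path/job_<id>/step_batch"
        hierarchy, _, path = line.split(":", 2)
        if hierarchy == "0":
            return Path(root) / path.lstrip("/")
    return None


def read_bytes(cg_dir, name):
    """Read a byte counter such as memory.max; None if missing or 'max' (no limit)."""
    f = Path(cg_dir) / name
    if not f.exists():
        return None
    text = f.read_text().strip()
    return None if text == "max" else int(text)


def oom_kill_count(cg_dir):
    """Return the oom_kill counter from memory.events, or None if the file is missing."""
    f = Path(cg_dir) / "memory.events"
    if not f.exists():
        return None
    for line in f.read_text().splitlines():
        key, value = line.split()
        if key == "oom_kill":
            return int(value)
    return 0


def memory_summary(cg_dir):
    """Limit, peak and current usage in GiB, plus the number of OOM kills."""
    def to_gib(n):
        return None if n is None else n / GIB

    return {
        "limit_gib": to_gib(read_bytes(cg_dir, "memory.max")),
        "peak_gib": to_gib(read_bytes(cg_dir, "memory.peak")),  # Linux >= 5.19 only
        "current_gib": to_gib(read_bytes(cg_dir, "memory.current")),
        "oom_kills": oom_kill_count(cg_dir),
    }
```

Calling memory_summary(own_cgroup_dir()) near the end of the job script, after the worker process has exited, would show whether the cgroup peak reached the limit even though the event log stopped at 194.26 GB.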

These are the lines in job.log following the last successful cycle of refinement (with many heartbeat lines removed):

:1: RuntimeWarning: invalid value encountered in true_divide
:1: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: NumPy 1.20.0 Release Notes
:1: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
Deprecated in NumPy 1.20; for more details and guidance: NumPy 1.20.0 Release Notes
/mnt/beegfs/software/structural_biology/release/cryosparc/seiradake/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:660: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
x = n.linalg.lstsq(w.reshape((-1,1))*A, w*b)[0]
/mnt/beegfs/software/structural_biology/release/cryosparc/seiradake/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:571: RuntimeWarning: divide by zero encountered in log
logabs = n.log(n.abs(fM))
/mnt/beegfs/software/structural_biology/release/cryosparc/seiradake/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:44: RuntimeWarning: invalid value encountered in sqrt
cradwn = n.sqrt(cradwn)
:1: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
:1: RuntimeWarning: invalid value encountered in true_divide