3DFlex train job - RAM unavailable


I am trying to run the new 3D Flex jobs for my project. The first two jobs in the sequence (data prep and mesh prep) ran successfully. The train job, however, will not run because of a “RAM not available” error. My Linux box has 20 CPU cores, 64 GB of RAM and 2 GPUs.

I have tried to reduce the resources needed by cutting the number of particles in the data prep job (from 470000 to 100000) and by reducing the box sizes from crop: 512, train: 256 to crop: 336, train: 128. In all cases the job still does not run.

The job says that it needs 4 CPU cores, 64 GB RAM and 1 GPU, so I do not understand why the error is occurring. Any help would be appreciated.

Thanks for your help in advance.

Hi @AnokhiShah,

Thanks for reporting. 3DFlex requires a lot of RAM because 1) it stores all particles in RAM during training and reconstruction and 2) the model itself requires a lot of RAM to store and train. We don’t currently have precise bounds or estimates for how much RAM the method will use, so the 64 GB requirement is more of a placeholder. Depending on many factors (box size, number of voxels in the mask, number of particles, pixel size, mesh params, number of mesh cells, etc.), the actual usage for a given dataset may be a lot less or a lot more than this number.
Likewise, we do not have tight bounds on GPU memory usage, though we have run experiments on cards with as little as 16 GB of GPU RAM.
Eventually we hope to better characterize memory usage, but for now you may have to find another node to run on or a different solution.
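
To get a rough feel for point 1) above, here is a minimal back-of-the-envelope sketch in Python. It is not how 3DFlex accounts for memory; it only computes the size of a raw particle stack, assuming each particle is a square image stored as 4-byte floats (that storage format is an assumption). The function name `particle_stack_bytes` is made up for illustration. The model's own memory from point 2) is not included, so treat the result as a loose lower bound at best.

```python
def particle_stack_bytes(num_particles: int, box_size: int, bytes_per_pixel: int = 4) -> int:
    """Size in bytes of num_particles square images of box_size x box_size pixels.

    Assumes 4-byte (float32) pixels by default; this is an assumption,
    not a statement about how 3DFlex actually stores particles.
    """
    return num_particles * box_size * box_size * bytes_per_pixel


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


if __name__ == "__main__":
    # Particle counts and box sizes taken from the question above.
    cases = [
        ("470000 particles, train box 256", 470_000, 256),
        ("100000 particles, train box 128", 100_000, 128),
        ("100000 particles, crop box 336", 100_000, 336),
    ]
    for label, n, box in cases:
        print(f"{label}: {to_gb(particle_stack_bytes(n, box)):.1f} GB")
```

Under these assumptions, 100000 particles at a 128-pixel box come to roughly 6.6 GB, while the same particles at the 336-pixel crop size come to roughly 45 GB, so which box size ends up in RAM matters a great deal. This estimate says nothing about why the scheduler reports RAM as unavailable; it only illustrates how the particle stack scales with particle count and box size.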

