Memory error only when using the SSD

Dear all, I am facing a serious problem with version 2 of cryoSPARC, and we do not know how to solve it.
We previously had cryoSPARC 1 installed on single GPU nodes of our cluster, and everything worked. The new version runs through the batch system: I submit a job and it runs on whichever GPU node is available at that moment. Our GPU nodes have 4 GTX 1080 cards, with 11 GB of RAM. The problem is that when I enable SSD usage, the job crashes soon after it starts, saying that it has run out of memory. The same job runs fine when the SSD is not enabled, but it clearly takes much longer. Monitoring GPU usage with the SSD enabled, I see that it rarely exceeds half of the RAM before crashing. Parallelizing the job over more than one GPU does not solve the issue.

I would like to ask the cryoSPARC developers directly whether they have any suggestions for solving this issue.

Thank you for reporting this issue.

Could you share the error message that you get when the job crashes?

Ali H.

Thank you for your help. I am pasting the entire page:

Launching job on lane merlin5 target merlin5 …

License is valid.

Launching job on cluster merlin5

MemoryError: cuMemHostAlloc failed: out of memory

Hi @marino-j
Thanks again for reporting this.
It appears that it is not the GPU memory that is running out, but the CPU memory: the error occurs when attempting to allocate memory on the host (CPU). Given that you are running this on a cluster, the most likely cause is that the cryoSPARC job process temporarily needs more RAM than is specified in the cluster submission script (16000MB). This happens only when also caching particles from the SSD (which, for some reason, requires more RAM as well…). Because your cluster is set up with strict memory limits on running processes, the process dies at this point.
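
If it helps to confirm which host memory limit a batch job actually runs under, a small script along these lines could be run inside a submitted job. This is only a sketch: it assumes a Linux node, and it assumes the batch system enforces memory through an address-space rlimit or a cgroup, which may not be how your cluster does it. The function names are made up for illustration, and only the process's own cgroup is checked, not its parent cgroups.

```python
import os
import resource  # Unix-only standard library module


def cgroup_limit_file(proc_cgroup_text, root="/sys/fs/cgroup"):
    """Return the memory-limit file for this process's cgroup, or None.

    proc_cgroup_text is the content of /proc/self/cgroup, whose lines
    have the form "hierarchy-id:controllers:path".
    """
    v2_candidate = None
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        _hier, controllers, path = parts
        rel = path.lstrip("/")
        if "memory" in controllers.split(","):
            # cgroup v1: the memory controller has its own hierarchy.
            return os.path.join(root, "memory", rel, "memory.limit_in_bytes")
        if controllers == "":
            # cgroup v2: a single unified hierarchy.
            v2_candidate = os.path.join(root, rel, "memory.max")
    return v2_candidate


def read_limit(path):
    """Return the limit in bytes, or None if missing or unlimited ("max")."""
    if path is None:
        return None
    try:
        with open(path) as fh:
            text = fh.read().strip()
    except OSError:
        return None
    return int(text) if text.isdigit() else None


def host_memory_limits():
    """Collect the host memory limits visible to the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    try:
        with open("/proc/self/cgroup") as fh:
            cgroup_text = fh.read()
    except OSError:
        cgroup_text = ""
    return {
        "rlimit_as_bytes": None if soft == resource.RLIM_INFINITY else soft,
        "cgroup_bytes": read_limit(cgroup_limit_file(cgroup_text)),
    }


if __name__ == "__main__":
    for name, value in host_memory_limits().items():
        shown = "unlimited or not visible" if value is None else f"{value / 1e6:.0f} MB"
        print(f"{name}: {shown}")
```

If the reported value is close to the memory requested in the submission script, that would be consistent with the job being killed by the cluster's limit rather than by the GPU.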

In the upcoming version we have increased the default memory request for class2D to 24000MB, which should solve the problem. In the meantime, you can hardcode a larger value for the memory request in the cluster submission script.
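
As a rough illustration of hardcoding the memory request, the sketch below rewrites the memory line of a submission script. It assumes a Slurm-style `#SBATCH --mem=` directive, which is an assumption since the thread does not say which scheduler the cluster uses. The helper name and the example script are hypothetical and are not taken from cryoSPARC. In practice, editing the one line by hand in the script is equivalent.

```python
import re

# Assumption: a Slurm-style directive such as "#SBATCH --mem=16000M".
MEM_DIRECTIVE = re.compile(r"^(#SBATCH\s+--mem=)\S+", re.MULTILINE)


def hardcode_memory_request(script_text, megabytes):
    """Replace the value of every '#SBATCH --mem=' line with a fixed size in MB."""
    new_text, count = MEM_DIRECTIVE.subn(
        lambda m: f"{m.group(1)}{megabytes}M", script_text
    )
    if count == 0:
        raise ValueError("no '#SBATCH --mem=' directive found")
    return new_text


if __name__ == "__main__":
    # Hypothetical script content, for illustration only.
    example = "#!/bin/bash\n#SBATCH --gres=gpu:1\n#SBATCH --mem=16000M\n"
    print(hardcode_memory_request(example, 64000))
```

The requested value cannot usefully exceed the physical RAM of the node, so a fixed request only helps while the job's peak host memory fits on one machine.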


Thank you for your help. We have increased the limit to 64000 MB and so far we are no longer getting the error.

Actually, we still have a problem. While increasing the limit to 64GB worked for some jobs, there is now a case that produces this error even with the memory limit set to 120GB. The machines in our cluster have only 128GB, so we cannot increase the limit any further. The cryoSPARC version is 2.2.0.

Similar jobs without the SSD do run, but significantly slower.