During 2D classification in cryoSPARC Live, I often get a GPU out-of-memory error. I have run in Low memory mode and reduced the Output F-crop factor to 0.5, but still see the error. System details: v4.2.0, master-worker configuration, Ubuntu 18.04.4 LTS, CUDA 10.1, 8x GeForce RTX 2080 Ti GPUs.
A similar error has also appeared in non-Live cryoSPARC jobs.
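For a rough sense of what an F-crop of 0.5 should buy back, here is the back-of-envelope I used. The box size, batch size, and single-precision-complex assumption are placeholders rather than my actual values, and the estimate ignores the cuFFT plan workspace, which is what the cuMemAlloc in the traceback below actually fails to allocate:

```python
# Back-of-envelope: how the Output F-crop factor changes the per-batch GPU
# footprint of the 2D FFTs. The numbers below are hypothetical placeholders,
# not values taken from my session, and real usage also includes the cuFFT
# plan workspace shown failing to allocate in the traceback.

def fft_batch_bytes(box, batch, bytes_per_value=8):
    """Approximate bytes for `batch` in-place 2D FFTs of box x box complex64 images."""
    return box * box * bytes_per_value * batch

box_full = 440       # hypothetical extraction box size (pixels)
box_cropped = 220    # same images after Output F-crop factor = 0.5
batch = 500          # hypothetical particles per GPU batch

for label, box in [("full box", box_full), ("F-crop 0.5", box_cropped)]:
    gib = fft_batch_bytes(box, batch) / 1024**3
    print(f"{label:>10}: ~{gib:.2f} GiB for the image batch alone")
```

Since halving the box quarters the pixel count, F-crop 0.5 should cut the image-data footprint to roughly a quarter, so the remaining pressure presumably comes from batch size or the plan workspace itself.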
Error message:
[CPU: 15.11 GB]
Traceback (most recent call last):
File "/opt/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook run_old(*args, **kw)
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1028, in cryosparc_compute.engine.engine.process.work
File "cryosparc_master/cryosparc_compute/engine/engine.py", line 107, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc_master/cryosparc_compute/engine/gfourier.py", line 32, in cryosparc_compute.engine.gfourier.fft2_on_gpu_inplace
File "/opt/cryosparc2_worker/cryosparc_compute/skcuda_internal/fft.py", line 115, in __init__ self.handle = gpufft.gpufft_get_plan(
RuntimeError: cuda failure (driver API): cuMemAlloc(&plan_cache.plans[idx].workspace, plan_cache.plans[idx].worksz) -> CUDA_ERROR_OUT_OF_MEMORY out of memory
Is this memory used by a process related to the aforementioned CryoSPARC Live session? Could a collision between the CryoSPARC job and another compute load have caused the cuMemAlloc failure?
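One way to test the collision hypothesis is to snapshot per-process GPU memory while the Live session is running. A minimal sketch, assuming the nvidia-ml-py (pynvml) bindings are installed; this is plain NVML querying, nothing cryoSPARC-specific:

```python
# Snapshot which processes hold GPU memory, to check whether anything besides
# the cryoSPARC worker is occupying a card when the OOM appears.
# Assumes the nvidia-ml-py package (pynvml) is installed.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i}: {mem.used / 1024**2:.0f} / {mem.total / 1024**2:.0f} MiB used")
        for proc in pynvml.nvmlDeviceGetComputeRunningProcesses(handle):
            used = (proc.usedGpuMemory or 0) / 1024**2
            print(f"  pid {proc.pid}: {used:.0f} MiB")
finally:
    pynvml.nvmlShutdown()
```

If the only PIDs reported belong to cryoSPARC workers, a collision with another compute load can probably be ruled out.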
My group appears to be encountering a similar issue where a CryoSPARC process is trying to allocate more memory on a GPU than is available. So far we’ve been able to determine the following:
The cause isn’t due to F-crop settings; F-crop is not being used.
Low memory mode does not seem to help.
No other compute loads are contending for GPU RAM.
I tried looking for logs to aid in troubleshooting, but it seems these are deleted after a job ends? If I’m mistaken and logs from CryoSPARC Live session jobs are retained, where would they typically end up?
This output shows that two jobs were running on the node and that both failed with the same out-of-memory issue. The nvidia-smi output indicates that GPU memory usage was near capacity (as might be expected with a memory allocation error).
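To see whether the two jobs were actually sharing one of the GPUs as memory filled up, next time we plan to log per-device usage over time. A rough polling sketch, again assuming pynvml, with an arbitrary 5-second interval (stop with Ctrl-C):

```python
# Poll per-GPU used memory and resident PIDs so we can tell whether the two
# cryoSPARC jobs ended up on the same device before the OOM.
# Assumes pynvml; the interval and plain print logging are arbitrary choices.
import time
import pynvml

pynvml.nvmlInit()
try:
    while True:
        stamp = time.strftime("%H:%M:%S")
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
            pids = [p.pid for p in pynvml.nvmlDeviceGetComputeRunningProcesses(handle)]
            print(f"{stamp} GPU {i}: {mem.used / 1024**2:.0f} MiB used, pids={pids}")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```

If both cryoSPARC PIDs show up on the same GPU index right before the failure, the OOM would be explained by scheduling rather than by any single job’s memory use.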
The error seems to occur during motion correction. Here’s a photo of the stack trace that was provided by our user:
I do not know what the extraction box size was, and the user has deleted the session. I can ask them to capture this window if/when they next attempt to run CryoSPARC Live.