3dva: cuda_error_out_of_memory

Hi, I have been getting an out of memory error when running 3DVA:
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

The job runs through the two initial reconstruction steps without issue; the error appears at the start of iteration 0.

I updated all drivers and upgraded to the latest CryoSPARC hoping it would solve the issue, but it did not help. I have two RTX 2070 GPUs. I don’t get this error on a newer machine that has two RTX 3090 GPUs installed.

Is there anything I can do about this, or does the RTX 2070 simply not have enough memory to run 3DVA? All other jobs run without issue (patch motion correction, patch CTF, NU refinement, etc.).

thanks!
Ben

8 GB of VRAM really struggles with complex jobs like 3DVA and 3D Flex. If the box size is small enough it might be OK, but I’m very close to decommissioning a long-serving quad-RTX 2080 box because 8 GB just isn’t enough for the more demanding tasks now. :frowning: The 2070 will have the same problem.

Yes, that’s what I thought :expressionless: I guess we will have to think about that too.
Thanks for your input! :slight_smile:

Hi again,
I spoke too soon. Actually, this dataset also crashed when I switched to our newer machine with the RTX 3090s.

It at least made it to iteration 3 before crashing. The only error message is: “Job is unresponsive - no heartbeat received in 180 seconds.” And it actually brings down CryoSPARC entirely; I need to reboot the machine before I can start CryoSPARC again.

I thought the newer machine would be fine, since I ran 3DVA on a different dataset of the same protein complex without any issue. The only differences between the two datasets are that the current dataset has a substrate bound (and that I am now using CryoSPARC v4.4 instead of v4.3). Otherwise the two datasets are very similar (similar data collection parameters, extraction box size, pixel size, etc.).

Does anyone know what the problem could be, or how I could troubleshoot this?
thanks!
Ben

What is your box size? Even on 3090s, 3D-VA will often run out of memory with box sizes >300 px (400 px is sometimes OK, but it’s a stretch). Have you tried downsampling your particles?

The box is 500… I thought about downsampling, but a 500 px box ran fine on a previous dataset with ~175,000 particles, and this dataset has only ~80,000, so I didn’t think the 500 px box would be a problem…

3D-VA with a 500 px box will regularly crash on a 3090 - I would suggest downsampling to 300 px.
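For reference, downsampling by Fourier cropping also changes the effective pixel size (and therefore the Nyquist limit) by the ratio of the box sizes. A minimal sketch of that arithmetic - the 1.0 Å/px starting pixel size below is a hypothetical value, not taken from this thread:

```python
def downsampled_apix(apix_old, n_old, n_new):
    """Effective pixel size after Fourier-cropping a box from n_old to n_new px."""
    return apix_old * n_old / n_new

# Hypothetical example: original pixel size 1.0 A/px, cropping 500 -> 300 px.
apix_new = downsampled_apix(1.0, 500, 300)
print(f"new pixel size: {apix_new:.3f} A/px, Nyquist: {2 * apix_new:.3f} A")
```

So a 500 → 300 crop at 1.0 Å/px gives roughly 1.67 Å/px, limiting the usable resolution to about 3.3 Å - usually plenty for exploring variability.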

ok thanks. I will try that :slight_smile:
But why didn’t it crash with the previous dataset at a 500 px box size, then? I ran many 3DVA jobs fiddling with the parameters, and they never crashed at 500 px before…

Also, should I rerun the consensus refinement with the downsampled particles before running 3DVA, or is it safe to feed the downsampled particles directly into the 3DVA job?

You can just downsample and run 3DVA directly; there is no need to rerun the consensus refinement.

Not sure why it didn’t crash before - maybe the number of components? - but in our hands it is not stable above a 300 px box size.
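A rough back-of-envelope of why box size dominates here: the volumes 3DVA works with grow as the cube of the box size, so 500 px vs. 300 px is a (500/300)³ ≈ 4.6× increase in per-volume memory before any per-particle workspace or FFT buffers. The sketch below is only that cubic-scaling arithmetic - the complex64 voxel size and “consensus + K components” count are assumptions, not CryoSPARC’s actual allocation pattern, so treat the numbers as a lower bound:

```python
def volume_gib(box, bytes_per_voxel=8):
    """Memory for one box^3 volume (complex64 assumed), in GiB."""
    return box**3 * bytes_per_voxel / 2**30

def rough_3dva_gib(box, n_components):
    # Consensus map + K variability components; ignores FFT scratch space,
    # per-particle workspace, and framework overhead, so this is a floor,
    # not a real allocation estimate.
    return (1 + n_components) * volume_gib(box)

for box in (300, 400, 500):
    print(f"{box} px: >= {rough_3dva_gib(box, n_components=3):.2f} GiB")
```

The absolute numbers undercount real usage by a lot; the point is the scaling - the same job that fits comfortably at 300 px needs several times the memory at 500 px.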

OK, thanks a lot :slight_smile: I will do that then.
No, same number of components… strange. I guess it was my lucky day back then.