Hello,
During patch motion correction of movies I get a cuMemAlloc error. Interestingly, larger movies (same detector, but twice as many frames) correct on this system with no problem. I have tried low-memory mode and turning F-crop all the way down to 1/4, to no avail. While running watch -n 1 nvidia-smi, I see memory usage climb to 6.5 GB, stay there for a few seconds, and then drop once the error occurs. For the larger movies, which correct successfully, usage reaches 9.8 GB and the job completes. Please see below for system specs and the full error message.
This node runs CentOS 7 and CUDA 11.2, with four 2080 Ti cards on driver 460.39. Another node, with four 2080 cards, fails in the same way.
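Since watch -n 1 only samples once per second, a short allocation spike just before the failure could be missed. A minimal sketch of a finer-grained poller follows, assuming nvidia-smi is on the PATH; the function names (parse_memory_used, update_peaks, monitor) and the 0.2 s interval are illustrative choices, not part of cryoSPARC.

```python
import subprocess
import time


def parse_memory_used(csv_text):
    """Parse one MiB value per GPU from nvidia-smi CSV output (no header, no units)."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_memory_used():
    """Return the memory currently used on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used(result.stdout)


def update_peaks(peaks, sample):
    """Keep the per-GPU maximum seen so far; the first sample initialises the list."""
    if not peaks:
        return list(sample)
    return [max(p, s) for p, s in zip(peaks, sample)]


def monitor(interval_s=0.2, duration_s=60.0):
    """Poll nvidia-smi for duration_s seconds and return per-GPU peak usage in MiB."""
    peaks = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        peaks = update_peaks(peaks, query_memory_used())
        time.sleep(interval_s)
    return peaks
```

Calling monitor() in a separate terminal while the job runs returns one peak value per GPU. Even at this interval a very brief spike may go unrecorded, so the numbers are a lower bound on the true peak.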
Movies:
$ header movie_00000.tif
RO image file on unit 1 : movie_00000.tif Size= 986236 K
This is a TIFF file (in strips of 11520 x 2).
Number of columns, rows, sections ..... 11520 8184 50
Map mode .............................. 0 (byte)
Start cols, rows, sects, grid x,y,z ... 0 0 0 11520 8184 50
Pixel spacing (Angstroms).............. 0.9175 0.9175 0.9175
Cell angles ........................... 90.000 90.000 90.000
Fast, medium, slow axes ............... X Y Z
Origin on x,y,z ....................... 0.000 0.000 0.000
Minimum density ....................... 0.0000
Maximum density ....................... 64.000
Mean density .......................... 32.000
tilt angles (original,current) ........ 0.0 0.0 0.0 0.0 0.0 0.0
Space group,# extra bytes,idtype,lens . 0 0 0 0
2 Titles :
SerialEMCCD: Dose frac. image, scaled by 1.00 r/f 0
SuperRef_movie_00000.dm4
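For scale, the uncompressed size of this frame stack can be estimated from the header dimensions above. This is a rough sketch only: the 4-bytes-per-pixel case assumes a float32 working precision, which is a guess and not something confirmed about how cryoSPARC allocates GPU memory, and frame_stack_bytes is a hypothetical helper.

```python
GIB = 1024 ** 3

# Dimensions taken from the header output: columns, rows, sections.
COLUMNS, ROWS, SECTIONS = 11520, 8184, 50


def frame_stack_bytes(columns, rows, sections, bytes_per_pixel):
    """Size in bytes of an uncompressed frame stack."""
    return columns * rows * sections * bytes_per_pixel


cases = (
    ("byte (mode 0, as stored)", 1),
    ("float32 (assumed working precision)", 4),
)
for label, bytes_per_pixel in cases:
    size = frame_stack_bytes(COLUMNS, ROWS, SECTIONS, bytes_per_pixel)
    print(f"{label}: {size / GIB:.2f} GiB")
```

The 986236 K file size reported by header is much smaller than the byte-mode figure because the TIFF is stored compressed.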
CryoSPARC:
$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/cryosparc/cryosparc2/cryosparc2_master
Current cryoSPARC version: v3.2.0
----------------------------------------------------------------------------
CryoSPARC process status:
app RUNNING pid 18044, uptime 0:01:20
app_dev STOPPED Not started
command_core RUNNING pid 17888, uptime 0:01:30
command_rtp RUNNING pid 17965, uptime 0:01:26
command_vis RUNNING pid 17940, uptime 0:01:27
database RUNNING pid 17802, uptime 0:01:32
liveapp RUNNING pid 18075, uptime 0:01:18
liveapp_dev STOPPED Not started
webapp RUNNING pid 18011, uptime 0:01:22
webapp_dev STOPPED Not started
----------------------------------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="***"
export CRYOSPARC_MASTER_HOSTNAME="***"
export CRYOSPARC_DB_PATH="/home/cryosparc/cryosparc2/cryosparc2_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
Error:
[CPU: 207.5 MB] Error occurred while processing J1/imported/movie_00001.tif
Traceback (most recent call last):
File "/home/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 59, in exec
return self.process(item)
File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 190, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 193, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 195, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 255, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 496, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 353, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
File "/home/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
Marking J1/imported/movie_00001.tif as incomplete and continuing...