pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory in v3.0

Hi @crescalante,

Can you run the following command and paste the output here:
lscpu && free -g && uname -a
If you have sudo access, please also run sudo dmidecode --type memory.

I’m having the same issue, along with a host of others. I was at cryoSPARC 2.6.1, and it was upgraded after a cryosparcm stop/start and a reboot of the system, but I still get this error:

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 72, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/jobregister.py", line 337, in get_run_function
    runmod = importlib.import_module("…"+modname, __name__)
  File "/opt/packages/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "cryosparc2_worker/cryosparc2_compute/jobs/rtp_workers/run.py", line 20, in init cryosparc2_compute.jobs.rtp_workers.run
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/motioncorrection.py", line 8, in init cryosparc2_compute.jobs.motioncorrection.motioncorrection
  File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
    from engine import *
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 17, in init cryosparc2_compute.engine.engine
  File "cryosparc2_compute/fourier.py", line 22, in <module>
    from numba import autojit
ImportError: cannot import name autojit
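
(For reference, newer numba releases removed autojit, so this particular import error typically means the v2 worker code is being run against a newer numba than it expects; the lazy, signature-free replacement in current numba is plain jit. A minimal illustration, not cryoSPARC code:)

    # Illustration only: "from numba import autojit" fails on current numba;
    # the equivalent lazy compilation is now done with the bare jit decorator.
    from numba import jit

    @jit                    # compiles lazily on first call, as autojit did
    def scale(x):
        return x * 2.0

    print(scale(3.0))       # -> 6.0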

So I re-installed cryoSPARC v3, along with the NVIDIA 440.82 driver and the CUDA 10.2 toolkit. The main error I now get when running a Live session is the following:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 356, in cryosparc_compute.jobs.rtp_workers.run.rtp_worker
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 417, in cryosparc_compute.jobs.rtp_workers.run.process_movie
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 561, in cryosparc_compute.jobs.rtp_workers.run.do_patch_motion
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 566, in cryosparc_compute.jobs.rtp_workers.run.do_patch_motion
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 251, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 371, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 339, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
  File "/opt/packages/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
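
For completeness, a quick way to check whether the GPU itself really is out of memory when cuMemAlloc fails is to query it directly with pycuda. A minimal diagnostic sketch (run outside cryoSPARC, assuming the worker environment's pycuda is importable):

    # Diagnostic sketch only (not part of cryoSPARC): report free/total device
    # memory per GPU, to see whether a cuMemAlloc failure reflects a genuinely
    # full card or something else (another process, wrong device selected).
    import pycuda.driver as cuda

    cuda.init()
    for i in range(cuda.Device.count()):
        dev = cuda.Device(i)
        ctx = dev.make_context()               # mem_get_info needs a context
        try:
            free, total = cuda.mem_get_info()  # both values are in bytes
            print("GPU %d (%s): %.1f of %.1f GiB free"
                  % (i, dev.name(), free / 2.0**30, total / 2.0**30))
        finally:
            ctx.pop()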

I also ran the commands mentioned earlier in this thread.

lscpu && free -g && uname -a
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel® Xeon® Silver 4116 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 800.008
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 16896K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
              total        used        free      shared  buff/cache   available
Mem:            282          36         134           0         111         244
Swap:             1           0           1
Linux 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

dmidecode --type memory

dmidecode 3.1

Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x1000, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 3 TB
Error Information Handle: Not Provided
Number Of Devices: 24

Handle 0x1100, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: 1
Locator: A1
Bank Locator: Not Specified
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 2666 MT/s
Manufacturer: 00AD00B300AD
Serial Number: 520C8E4F
Asset Tag: 01173851
Part Number: HMA82GR7AFR8N-VK
Rank: 2
Configured Clock Speed: 2400 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V

@Paul, can you please let us know which GPUs you are using? You may need to turn on “low memory mode” in cryoSPARC Live.

Here is the nvidia-smi output of a typical worker node.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:3B:00.0 Off |                  Off |
| 22%   37C    P0    42W / 180W |      0MiB / 16278MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P5000        Off  | 00000000:D8:00.0 Off |                  Off |
| 22%   35C    P0    42W / 180W |      0MiB / 16278MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here is the nvidia-smi output from our larger GPU server while running a Live session that generates the error:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   31C    P0    56W / 250W |    908MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000     Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 6000     Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Quadro RTX 6000     Off  | 00000000:3E:00.0 Off |                    0 |
| N/A   25C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Quadro RTX 6000     Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   24C    P8    12W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Quadro RTX 6000     Off  | 00000000:8C:00.0 Off |                    0 |
| N/A   26C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Quadro RTX 6000     Off  | 00000000:B5:00.0 Off |                    0 |
| N/A   25C    P8    14W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Quadro RTX 6000     Off  | 00000000:B6:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      9952      C   python                            225MiB |
|    0   N/A  N/A     10064      C   python                            225MiB |
|    0   N/A  N/A     10150      C   python                            225MiB |
|    0   N/A  N/A     10238      C   python                            225MiB |
|    1   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    4   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    5   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    6   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    7   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

I was not able to figure this issue out on our systems, so we abandoned the database and started over with a new one. Everything works as expected now.

Thanks for the update @Paul, and glad you were able to sort this out. We’ll update the post if we are able to uncover any other ideas on the root cause.

We see a similar issue appear spontaneously after several hours of running fine with NU-Refinement (New); it is reproducible across multiple jobs with different box sizes and particle counts on v3.1.0:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 466, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/jobs/refine/newrun.py", line 467, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/jobs/ctf_refinement/run.py", line 164, in cryosparc_compute.jobs.ctf_refinement.run.full_ctf_refine
  File "cryosparc_worker/cryosparc_compute/jobs/ctf_refinement/run.py", line 434, in cryosparc_compute.jobs.ctf_refinement.run.compute_phase_errors
  File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 448, in cryosparc_compute.engine.newengine.EngineThread.preprocess_image_data
  File "cryosparc_worker/cryosparc_compute/engine/newengine.py", line 442, in cryosparc_compute.engine.newengine.EngineThread.preprocess_image_data
  File "cryosparc_worker/cryosparc_compute/engine/newgfourier.py", line 22, in cryosparc_compute.engine.newgfourier.get_plan_R2C_2D
  File "/home/hiter/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/fft.py", line 127, in __init__
    onembed, ostride, odist, self.fft_type, self.batch)
  File "/home/hiter/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/cufft.py", line 742, in cufftMakePlanMany
    cufftCheckStatus(status)
  File "/home/hiter/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/skcuda/cufft.py", line 117, in cufftCheckStatus
    raise e
skcuda.cufft.cufftAllocFailed
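
For context, the failing call here is cuFFT plan creation: cufftMakePlanMany reserves a GPU work area for every plan, so plan creation itself can exhaust device memory even when the particle arrays fit. A rough illustration of the same failure mode (assuming skcuda and pycuda are importable, e.g. from inside cryosparc_worker_env; this is not cryoSPARC's own code):

    # Illustration only: building a large batched R2C 2D plan allocates a
    # cuFFT work area on the GPU; if that allocation fails, skcuda raises
    # cufftAllocFailed, the same exception shown in the traceback above.
    import numpy as np
    import pycuda.autoinit             # creates a context on the default GPU
    from skcuda import fft, cufft

    try:
        plan = fft.Plan((4096, 4096), np.float32, np.complex64, batch=64)
    except cufft.cufftAllocFailed:
        print("cuFFT could not allocate its work area (device memory exhausted)")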

Hi Navid,

I get the same error as you when running a 2D classification job. I’m wondering how you eventually figured it out?

We have not been able to solve this issue yet.

Hi @Navid, @CleoShen,

What OS are you running cryoSPARC on?

Hey @CleoShen,

Is there any way you can re-install cryoSPARC on an OS like Ubuntu? So far, that’s the most reliable way to get rid of this error. If that’s not possible, then we have a (potential) fix coming out in the next release, which should be very soon.

Thank you for your advice. A naive question: how do I manually update cryoSPARC once you release the new version?

Hi @CleoShen,

You can update to any version of cryoSPARC by running the command cryosparcm update --version=<cryoSPARC version>
More information here: https://guide.cryosparc.com/setup-configuration-and-management/software-updates

Hi Stephan and others in this group,
I tested v3.3.1 with a small T20S data set on a workstation (CentOS 7.5) containing four RTX 3090 GPUs. The test went well. But when I imported a larger set of particles from a Relion job (~370,000 with a box of 432 x 432), 2D classification failed after 4 runs due to “pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory”. Each GPU has 24 GB of memory, with driver version 460.27.04 and CUDA version 11.2. GPUs 1-3 were used, and their memory usage was relatively low most of the time (less than 10 GB).
I wonder if you have ever encountered the same error. GPU 0 was not used because it runs the X server.
Thanks for your attention.
Qiu-Xing

I’m also on v3.3.1+220315 on CentOS 7 and am now continuously getting “pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory” on small jobs (2D classification, ab initio, and homogeneous refinement with binned particles). These jobs may complete normally when cloned. nvidia-smi indicates less than 1 GB of memory in use at the time of failure, with the fan at 47%, temperature at 52 C, and power at 123/350 W on a 3080 Ti.

I’ve been running stably on 3.3.1+220315 for months, and the failure rate has increased a lot recently despite no updates to the software or drivers.

@user123 @jiangq9992003
cuMemHostAlloc relates to host (rather than GPU) memory.
Do your cryosparc_worker/config.sh files define
export CRYOSPARC_NO_PAGELOCK=true?
Please see CUDA memory error during 2D classification - #9 by spunjani for a related discussion.
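
To illustrate the difference, here is a minimal pycuda sketch (an illustration only, not cryoSPARC's actual allocation path): page-locked host buffers are the ones backed by cuMemHostAlloc, while plain numpy arrays use pageable memory and never make that driver call, at the cost of losing overlapped host-device copies.

    # Sketch only (not cryoSPARC's code): the same host-side staging buffer
    # allocated two ways. pagelocked_empty is backed by cuMemHostAlloc and is
    # what can fail with "cuMemHostAlloc failed: out of memory"; a plain
    # numpy array is pageable and avoids that call entirely.
    import numpy as np
    import pycuda.autoinit
    import pycuda.driver as cuda
    import pycuda.gpuarray as gpuarray

    shape, dtype = (4096, 4096), np.float32

    pinned = cuda.pagelocked_empty(shape, dtype)   # page-locked host memory
    pageable = np.empty(shape, dtype)              # ordinary pageable memory

    d_arr = gpuarray.empty(shape, dtype)           # device buffer (cuMemAlloc)
    d_arr.set(pinned)      # pinned buffers also permit async/overlapped copies
    d_arr.set(pageable)    # a pageable copy works too, it just cannot overlap

As far as I understand, setting CRYOSPARC_NO_PAGELOCK=true makes the workers take the pageable path, which is why it can sidestep cuMemHostAlloc failures even when free -g shows plenty of host RAM.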

Hi,
I would like to know if someone has found a solution for this ‘out of memory’ issue.
I have encountered the same error during ab initio reconstruction with v4.2.1 (CUDA 11.3, CentOS 7).
I have ~3 million particles with a box size of 300.
Thanks!

Please post the text of the error message(s) and traceback(s) from the Event Log and the job log (Metadata|Log).
In case you observed precisely cuMemHostAlloc, you may want to try the export CRYOSPARC_NO_PAGELOCK=true setting discussed earlier in this thread.

@wtempel
Thank you so much for your answer.
Adding export CRYOSPARC_NO_PAGELOCK=true to cryosparc_worker/config.sh worked for me, and the job is now running without errors.