pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory in v3.0

I updated to version 3.0 and ran 2D classification on ~3M particles. The following error appeared:
[CPU: 7.43 GB] Traceback (most recent call last):
File "/home/xxx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1711, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 129, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 130, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1066, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 499, in cryosparc_compute.engine.engine.EngineThread.cull_candidates
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 312, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

The box size was 300, and the job was running on a system with 4 GTX 1080 Ti GPUs. I ran it again with ~1M particles and a box size of 256 and got the same error. I had previously run this data on cryoSPARC v2.5 using ~1M particles, and it worked fine. Any ideas what causes the error? Thanks.

I ran into the same problem: a previous dataset caused a CUDA error in 2D classification. I just upgraded to 3.0.1 today, and it seems to be working so far. I wonder whether it is due to some CUDA compatibility issue.

I will update to 3.0.1. I have CUDA 10.2, and it was OK in v2.5. Thanks.

Updating to 3.0.1 solved the problem. Thanks.

Actually, there are still some issues. I had a set of around 7M particles and split it into four subsets with the same number of particles each. Three of them worked fine, but one still failed with “cuMemHostAlloc failed: out of memory”.

I am seeing a similar issue in heterogeneous refinement in 3.0.1:

Traceback (most recent call last):
File "/home/cryosparc_user/software/cryosparc2_worker/cryosparc_compute/jobs/runcommon.py", line 1722, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 129, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 130, in cryosparc_compute.engine.cuda_core.GPUThread.run
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1066, in cryosparc_compute.engine.engine.process.work
File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 499, in cryosparc_compute.engine.engine.EngineThread.cull_candidates
File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 312, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory

This also occurs when running prior data that completed without error in v2.15.

This appears to be related to cryoSPARC mis-allocating memory. In our case, stopping cryoSPARC, rebooting the system, and then restarting cryoSPARC clears the problem. I am not sure how to troubleshoot this or why it happens.
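
Since cuMemHostAlloc requests page-locked host RAM rather than GPU memory, one thing that might be worth recording before and after a failure is how much host memory the kernel reports as available. The following is only a minimal sketch, assuming a Linux worker with /proc/meminfo; the function names are made up for illustration and are not part of cryoSPARC:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; counters such as HugePages_Total
    are plain counts with no unit.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <integer> [unit]" lines.
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def available_gib(info):
    """Convert the MemAvailable field (kB) to GiB; 0.0 if the field is absent."""
    return info.get("MemAvailable", 0) / 1024**2


def read_host_memory(path="/proc/meminfo"):
    """Read and parse the live meminfo file on a Linux host."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Comparing `available_gib(read_host_memory())` right after a failed job with the value after a reboot could show whether host RAM was actually short at the time. That is a guess at a useful diagnostic, not a confirmed explanation of the error.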

@MHB, can you please let us know which OS you are running, as well as which GPUs and your NVIDIA driver version?

The same happened to me running a heterogeneous refinement job on v3.1. I stopped cryoSPARC and rebooted the computer. After that, the job ran fine. The reboot seems to be required, as simply stopping and starting cryoSPARC did not work.

Hi @crescalante,

Can you run the following command and paste the output here:
lscpu && free -g && uname -a
If you have sudo access, please also run:
sudo dmidecode --type memory
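
For anyone who needs to gather the same information from several worker nodes, the commands above can be wrapped in a small script. This is a rough sketch rather than anything official: `collect_diagnostics` is a hypothetical helper, and `sudo -n` is used so the script fails instead of waiting for a password prompt.

```python
import shutil
import subprocess

# Commands requested in the thread; dmidecode normally needs root.
DIAGNOSTIC_COMMANDS = [
    ["lscpu"],
    ["free", "-g"],
    ["uname", "-a"],
    ["sudo", "-n", "dmidecode", "--type", "memory"],
]


def collect_diagnostics(commands=DIAGNOSTIC_COMMANDS, timeout=30):
    """Run each command and return one text report; skip tools that are missing."""
    sections = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            sections.append(header + "\n(not found on this system)")
            continue
        try:
            result = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                timeout=timeout,
                stdin=subprocess.DEVNULL,
            )
            body = result.stdout + result.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            body = f"(failed: {exc})"
        sections.append(header + "\n" + body.strip())
    return "\n\n".join(sections)
```

Printing `collect_diagnostics()` on each node gives a single block of text that can be pasted into a reply.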

I’m having the same issue, along with a host of others. I was on cryoSPARC 2.6.1 and upgraded; after a cryosparcm stop/start and a reboot of the system, I get the following:

Traceback (most recent call last):
File "cryosparc2_worker/cryosparc2_compute/run.py", line 72, in cryosparc2_compute.run.main
File "cryosparc2_compute/jobs/jobregister.py", line 337, in get_run_function
runmod = importlib.import_module("…"+modname, __name__)
File "/opt/packages/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "cryosparc2_worker/cryosparc2_compute/jobs/rtp_workers/run.py", line 20, in init cryosparc2_compute.jobs.rtp_workers.run
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/motioncorrection.py", line 8, in init cryosparc2_compute.jobs.motioncorrection.motioncorrection
File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
from engine import *
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 17, in init cryosparc2_compute.engine.engine
File "cryosparc2_compute/fourier.py", line 22, in <module>
from numba import autojit
ImportError: cannot import name autojit

So I reinstalled cryoSPARC v3 along with the NVIDIA 440.82 drivers and the CUDA 10.2 toolkit. Now the main error I get when running a Live session is the following:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 356, in cryosparc_compute.jobs.rtp_workers.run.rtp_worker
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 417, in cryosparc_compute.jobs.rtp_workers.run.process_movie
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 561, in cryosparc_compute.jobs.rtp_workers.run.do_patch_motion
  File "cryosparc_worker/cryosparc_compute/jobs/rtp_workers/run.py", line 566, in cryosparc_compute.jobs.rtp_workers.run.do_patch_motion
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 251, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 371, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 339, in cryosparc_compute.engine.cuda_core.EngineBaseThread.ensure_allocated
  File "/opt/packages/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/site-packages/pycuda/gpuarray.py", line 210, in __init__
    self.gpudata = self.allocator(self.size * self.dtype.itemsize)
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory

I also ran the commands requested earlier in this thread.

lscpu && free -g && uname -a
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 800.008
BogoMIPS: 4200.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 16896K
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d
              total        used        free      shared  buff/cache   available
Mem:            282          36         134           0         111         244
Swap:             1           0           1
Linux 5.4.0-65-generic #73~18.04.1-Ubuntu SMP Tue Jan 19 09:02:24 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

dmidecode --type memory

dmidecode 3.1

Getting SMBIOS data from sysfs.
SMBIOS 3.0.0 present.

Handle 0x1000, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 3 TB
Error Information Handle: Not Provided
Number Of Devices: 24

Handle 0x1100, DMI type 17, 40 bytes
Memory Device
Array Handle: 0x1000
Error Information Handle: Not Provided
Total Width: 72 bits
Data Width: 64 bits
Size: 16384 MB
Form Factor: DIMM
Set: 1
Locator: A1
Bank Locator: Not Specified
Type: DDR4
Type Detail: Synchronous Registered (Buffered)
Speed: 2666 MT/s
Manufacturer: 00AD00B300AD
Serial Number: 520C8E4F
Asset Tag: 01173851
Part Number: HMA82GR7AFR8N-VK
Rank: 2
Configured Clock Speed: 2400 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V

@Paul, can you please let us know which GPUs you are using? You may need to turn on “low memory mode” in cryoSPARC Live.

Here is the nvidia-smi output of a typical worker node.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P5000        Off  | 00000000:3B:00.0 Off |                  Off |
| 22%   37C    P0    42W / 180W |      0MiB / 16278MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P5000        Off  | 00000000:D8:00.0 Off |                  Off |
| 22%   35C    P0    42W / 180W |      0MiB / 16278MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here is our larger GPU server while running a live session that generates the error.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   31C    P0    56W / 250W |    908MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000     Off  | 00000000:1B:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Quadro RTX 6000     Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Quadro RTX 6000     Off  | 00000000:3E:00.0 Off |                    0 |
| N/A   25C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  Quadro RTX 6000     Off  | 00000000:8B:00.0 Off |                    0 |
| N/A   24C    P8    12W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  Quadro RTX 6000     Off  | 00000000:8C:00.0 Off |                    0 |
| N/A   26C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  Quadro RTX 6000     Off  | 00000000:B5:00.0 Off |                    0 |
| N/A   25C    P8    14W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  Quadro RTX 6000     Off  | 00000000:B6:00.0 Off |                    0 |
| N/A   24C    P8    13W / 250W |      8MiB / 22698MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      9952      C   python                            225MiB |
|    0   N/A  N/A     10064      C   python                            225MiB |
|    0   N/A  N/A     10150      C   python                            225MiB |
|    0   N/A  N/A     10238      C   python                            225MiB |
|    1   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    2   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    3   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    4   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    5   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    6   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
|    7   N/A  N/A      1755      G   /usr/lib/xorg/Xorg                  4MiB |
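
The full nvidia-smi table is hard to compare across nodes, so a machine-readable query may be easier to log at the moment the error appears. The sketch below assumes nvidia-smi's CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`); the helper names and the 100 MiB threshold are illustrative choices, not anything cryoSPARC provides.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, name, used, total' rows (MiB) into a list of dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split from both ends so a comma inside the GPU name cannot break parsing.
        index, rest = line.split(",", 1)
        name, used, total = rest.rsplit(",", 2)
        gpus.append({
            "index": int(index),
            "name": name.strip(),
            "used_mib": int(used),
            "total_mib": int(total),
        })
    return gpus


def gpus_in_use(gpus, threshold_mib=100):
    """Return indices of GPUs whose used memory exceeds the threshold."""
    return [g["index"] for g in gpus if g["used_mib"] > threshold_mib]


def query_gpus():
    """Run nvidia-smi on this host and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)
```

Applied to the output above, only GPU 0 would be flagged (908 MiB used of 22698 MiB), which suggests the cards were far from full when the snapshot was taken.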

I was not able to figure this issue out on our systems, so we abandoned the database and started over with a new one. Everything works as expected now.

Thanks for the update @Paul, and glad you were able to sort this out. We’ll update the post if we are able to uncover any other ideas on the root cause.