v3.0.1: cuMemHostAlloc errors and Ab-initio reconstruction stops without error

After installing v3.0 and later v3.0.1, I have encountered several problems. First, the cuMemAlloc issue during 2D classification: I had four particle sets that I had split using the particle tools, and one of them gives me the error. Later, when I try to run Ab-initio reconstruction, it stops abruptly early in the process.

What GPUs are you using? Do you have anything that monitors the worker host and collects system stats?

I’ve been having this issue since v2.15 with V100 32 GB cards; the GPU memory never goes above 10 GB, yet jobs will randomly fail with a cuMemAlloc error. I’ve tried various NVIDIA drivers and CUDA versions, with no noticeable difference.

No, I don’t have anything monitoring that. I have 4 x GTX 1080 Ti (11 GB) with CUDA 10.2. Everything was running fine in v2.15 in my case.

Hi there,

Could you confirm whether you’re seeing an error related to cuMemAlloc or cuMemHostAlloc? If it’s the latter, the system RAM is being exhausted, not the GPU RAM. How much system RAM do you have, and does running fewer concurrent jobs eliminate the issue?

(Edit: this is a slight oversimplification on my part - it’s possible to see this error while not being completely out of RAM, but it’s nevertheless related to low system memory and/or low locked memory limit).
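For reference, a quick way to check both on the worker (standard Linux commands):

free -g       # system RAM in GB (total / used / available)
ulimit -l     # per-process locked ("pinned") memory limit, in kB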

Thanks,

Harris

I have 256 GB of RAM. The error during 2D classification was “cuMemHostAlloc”. Even if I run only one job, the error is there.
For the Ab-initio reconstruction error, it is the same: “pycuda._driver.MemoryError: cuMemHostAlloc failed: out of memory”

In the previous version, I did not encounter this issue. Thanks for your response.

Interesting. I will look into this and get back to you. Nothing changed within cryoSPARC itself in terms of the memory requirements for 2D classification, but we did change the versions of several dependency libraries we’re using.

In the meantime could you please post the output of the following command?
lscpu && free -g && uname -a && nvidia-smi

This is the output:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               1200.347
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4200.02
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              40960K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
              total        used        free      shared  buff/cache   available
Mem:            251           9           7           0         234         240
Swap:             3           0           3
Linux odin 3.10.0-1160.6.1.el7.x86_64 #1 SMP Tue Nov 17 13:59:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Thu Dec 17 15:37:44 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
| 28%   28C    P8     9W / 250W |    443MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 28%   27C    P8     8W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 28%   28C    P8     8W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 28%   30C    P8     9W / 250W |      2MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      3057      G   /usr/bin/X                        269MiB |
|    0   N/A  N/A      4435      G   /usr/bin/gnome-shell              170MiB |
+-----------------------------------------------------------------------------+

I just noticed that the CUDA version reported for the GPUs is 11.0, but I thought it was 10.2. It might be related to this. I had installed CUDA 11, but later I installed 10.2, as MotionCor2 works with that version but not with 11.

You can see which CUDA version is being used by checking nvcc (nvcc --version).
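For example (the paths in the comments are just what a typical install looks like):

nvcc --version                        # toolkit version on your PATH, e.g. from /usr/local/cuda-10.2/bin/nvcc
which nvcc                            # shows which toolkit that nvcc comes from
nvidia-smi | grep "CUDA Version"      # highest CUDA version the installed driver supports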

Hi @crescalante,

Could you recompile cryoSPARC against the correct version of CUDA by doing the following (a command sketch follows the list):

  1. Make sure the user’s .bashrc doesn’t set any unnecessary CUDA_PATH-related variables. If any are required, make sure they point to the correct installation: /usr/local/cuda-10.2
  2. Ensure the CUDA path inside cryosparc2_worker/config.sh also points to /usr/local/cuda-10.2
  3. Log out and log back in, to clear your environment
  4. Navigate to cryosparc2_worker
  5. Run the command: ./bin/cryosparcw newcuda /usr/local/cuda-10.2
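Roughly, the command-line parts look like this (a sketch only; it assumes the worker directory is ~/cryosparc2_worker and CUDA 10.2 lives in /usr/local/cuda-10.2, so adjust the paths to your setup):

grep -i cuda ~/.bashrc                          # step 1: look for stale CUDA-related variables
grep -i cuda ~/cryosparc2_worker/config.sh      # step 2: confirm the CUDA path the worker uses
cd ~/cryosparc2_worker                          # step 4 (after logging out and back in for step 3)
./bin/cryosparcw newcuda /usr/local/cuda-10.2   # step 5: recompile against CUDA 10.2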

OK, I’ll try that, but I removed the previous CUDA versions and reinstalled 10.2 again.

If I run nvcc --version, it says that I have CUDA 10.2, but if I run nvidia-smi it says the CUDA version is 11.0. Any suggestions why there is this discrepancy?

@crescalante, I think this is why: nvidia-smi reports the highest CUDA version supported by the installed driver, while nvcc --version reports the toolkit you have installed, so the two can legitimately differ.

Yes, I saw that, but if I understood correctly, it should not be a problem. Nevertheless, I will install CUDA 11 and see if that solves the issue. Thanks.

Hey @crescalante, could you please post the output of ulimit -l as well, when you have the chance?

Thanks!

ulimit -l :
output is 64

I see! This might be the issue: the amount of memory you’re allowed to page-lock is 64 kB. Are you on CentOS by chance, and do you have root access to the machine? The procedure for changing the limit varies a bit between operating systems.

Yes, I have CentOS 7 and root access.

Excellent. Try this:

As root, edit the file /etc/systemd/system.conf
Find the line that reads
#DefaultLimitMEMLOCK=
and change it to
DefaultLimitMEMLOCK=60000000

You will most likely need to reboot for it to take effect. Once you’ve done so, run ulimit -l again, and make sure that it has changed. If that works, try running the failing jobs again.
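If it helps, the edit and the check afterwards might look like this (a sketch; the sed pattern assumes the line is still commented out exactly as shown above):

sudo sed -i 's/^#DefaultLimitMEMLOCK=.*/DefaultLimitMEMLOCK=60000000/' /etc/systemd/system.conf
sudo reboot
# after the reboot, as the cryoSPARC user:
ulimit -l     # should now report something much larger than 64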

Harris


Ok, I will try that.

Harris,
In the end, after I reinstalled CUDA 10.2, everything works. But I am also going to increase DefaultLimitMEMLOCK as you suggested. Thanks for the help.
