pycuda.driver.LogicError

Hello all,
Everyone's jobs have stopped running with the error below.


Applying the new patch did not fix the issue.

Our CryoSPARC version and hardware specs are below.
CryoSPARC v3.3.1+220315
4 x RTX 2080 GPUs, 11 GB GPU memory each, CUDA 10.2

Any suggestions?

Thank you in advance,
DG

@Dominique Please can you post the error messages as text so that forum users with a similar problem may find this topic more easily.

This is the text for the Non-uniform Refinement job:

[CPU: 4.18 GB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/refine/newrun.py", line 348, in cryosparc_compute.jobs.refine.newrun.run_homo_refine
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 908, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 34, in cryosparc_compute.engine.cuda_core.initialize
pycuda._driver.LogicError: cuDevicePrimaryCtxRetain failed: invalid argument

And this is for the 2D Classification job:

[CPU: 873.5 MB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/class2D/run.py", line 323, in cryosparc_compute.jobs.class2D.run.run_class_2D
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 908, in cryosparc_compute.engine.engine.process
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 34, in cryosparc_compute.engine.cuda_core.initialize
pycuda._driver.LogicError: cuDevicePrimaryCtxRetain failed: invalid argument

Are you aware of any recent software updates (system software, NVIDIA driver, CUDA toolkit)?
Please can you post the following information:

[cryosparc_user@strbiogpu02 ~]$ env | grep PATH
LD_LIBRARY_PATH=/home/cryosparc_user/cryosparc_software/cryosparc2_worker/cryosparc_compute/blobio:/home/cryosparc_user/cryosparc_software/cryosparc2_worker/cryosparc_compute/libs:/home/cryosparc_user/cryosparc_software/cryosparc2_worker/deps/external/cudnn/lib:/usr/local/cuda-10.2/lib64:/usr/local/cuda-10.2/lib64:
PATH=/home/cryosparc_user/cryosparc_software/cryosparc2_worker/bin:/home/cryosparc_user/cryosparc_software/cryosparc2_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/cryosparc_user/cryosparc_software/cryosparc2_worker/deps/anaconda/condabin:/usr/local/cuda-10.2/bin:/home/cryosparc_user/cryosparc_software/cryosparc2_master/bin:/usr/local/cuda-10.2/bin:/usr/local/IMOD/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/home/cryosparc_user/.local/bin:/home/cryosparc_user/bin
MODULEPATH=/usr/share/Modules/modulefiles:/etc/modulefiles
CRYOSPARC_PATH=/home/cryosparc_user/cryosparc_software/cryosparc2_worker/bin
PYTHONPATH=/home/cryosparc_user/cryosparc_software/cryosparc2_worker
CRYOSPARC_CUDA_PATH=/usr/local/cuda-10.2

[cryosparc_user@strbiogpu02 ~]$ ${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

[cryosparc_user@strbiogpu02 ~]$ python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(10, 2, 0)

[cryosparc_user@strbiogpu02 ~]$ uname -a && free -g && nvidia-smi
Linux strbiogpu02 3.10.0-1062.4.3.el7.x86_64 #1 SMP Wed Nov 13 23:58:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:            187           4         135           0          47         182
Swap:             7           0           7
Thu Jan 26 14:32:26 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 26%   29C    P0    42W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:5E:00.0 Off |                  N/A |
| 29%   30C    P0    50W / 250W |      0MiB / 11019MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 30%   31C    P0    54W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:D8:00.0 Off |                  N/A |
| 14%   31C    P0    32W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Above is what I got for those commands.

Is this a “standalone” (combined master/worker) instance?
Would you consider a more general software update, which would require root access for some tasks?

It is a “standalone” (combined master/worker) instance. Would you suggest upgrading to the latest CryoSPARC and the compatible CUDA drivers? If so, which CUDA driver? The current installation was working fine a week ago. Yes, we have root access and can make software changes.

For more reliable suggestions, please can you post the output of
ls -l /home/cryosparc_user/cryosparc_software/

[cryosparc_user@strbiogpu02 ~]$ ls -l /home/cryosparc_user/cryosparc_software/
total 16
drwxrwxrwx. 4 cryosparc_user cryosparc_group 8192 Jan 26 18:08 cryosparc2_database
drwxrwxrwx. 16 cryosparc_user cryosparc_group 4096 Jan 24 15:13 cryosparc2_master
drwxrwxrwx. 8 cryosparc_user cryosparc_group 265 Jan 24 15:13 cryosparc2_worker
drwxrwxrwx. 2 cryosparc_user cryosparc_group 6 Jan 16 2020 cryosparc_database

Above is the output of ls -l /home/cryosparc_user/cryosparc_software/

Caution:

  • I do not know the ultimate cause of the cuDevicePrimaryCtxRetain failure in your case
  • Following the suggestion below may disrupt your system more severely than a more surgical repair attempt (on outdated software) would. You must decide whether the actions are appropriate under your circumstances, and assume associated risks

Motivation:

  • The system is not in a functional state
  • Significant updates for the system (security) and CryoSPARC (function) software are available
  • It may ultimately be more beneficial to repair an updated configuration than an outdated one

Suggestion:

  • root tasks (all other steps must be performed under the Linux account that “owns” the CryoSPARC instance):
    • patch the operating system with available updates
    • upgrade to the v525 nvidia driver (if available for and compatible with your system)
  • update CryoSPARC
  • after the update, ensure $PATH and $LD_LIBRARY_PATH do not include directories that hold CUDA toolkit executables or libraries (so CryoSPARC does not inadvertently link to the “wrong” version of the toolkit). The following step may not succeed if $PATH or $LD_LIBRARY_PATH point to your existing 10.2 or any other installation of the CUDA toolkit
  • ensure your configuration meets the requirements, then run
    /home/cryosparc_user/cryosparc_software/cryosparc2_worker/bin/cryosparcw install-3dflex
    
    This step is expected to install the CUDA toolkit
  • test CryoSPARC function and update this forum topic with the outcome
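
For the $PATH and $LD_LIBRARY_PATH check in the suggestion above, here is a minimal Python sketch that lists entries which look like CUDA toolkit directories. The function name find_cuda_entries and the matching rule (any path component starting with "cuda", such as /usr/local/cuda-10.2/bin) are my own assumptions for illustration, not something CryoSPARC provides or requires. It only reports candidates; you still need to decide which ones to remove from your shell configuration.

```python
import os


def find_cuda_entries(value):
    """Return entries of a colon-separated path list that look like CUDA toolkit dirs.

    Heuristic (an assumption, not a CryoSPARC rule): an entry is flagged when
    any of its path components starts with "cuda", e.g. /usr/local/cuda-10.2/bin.
    Duplicate entries are reported once, in first-seen order.
    """
    flagged = []
    for entry in value.split(":"):
        components = [c.lower() for c in entry.split("/") if c]
        if any(c.startswith("cuda") for c in components) and entry not in flagged:
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    # Run this as the Linux account that owns the CryoSPARC instance.
    for var in ("PATH", "LD_LIBRARY_PATH"):
        hits = find_cuda_entries(os.environ.get(var, ""))
        if hits:
            print(var + " contains possible CUDA toolkit directories:")
            for hit in hits:
                print("  " + hit)
        else:
            print(var + ": no CUDA toolkit directories found")
```

With the environment posted earlier in this topic, this should flag /usr/local/cuda-10.2/bin in PATH and /usr/local/cuda-10.2/lib64 in LD_LIBRARY_PATH, while leaving the bundled cudnn directory alone because its component does not start with "cuda".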