Error calling CUDA init

I am in IT helping a lab get CryoSPARC running. I have just installed v4.4.1 and understand that I do not need to load any CUDA modules because the latest version already has CUDA built in.

I am getting an error after calling CUDA init. Any help would be greatly appreciated.

Thanks, Joe Koral

[CPU: 177.9 MB Avail: 250.03 GB]

Master running v4.4.1, worker running v4.4.1
[CPU: 178.1 MB Avail: 250.03 GB]

Working in directory: /data/efink1/cryoEM/4OHT/CS-wt-4oht-1/J8
[CPU: 178.1 MB Avail: 250.03 GB]

Running on lane default
[CPU: 178.1 MB Avail: 250.03 GB]

Resources allocated:
[CPU: 178.1 MB Avail: 250.03 GB]

Worker: node11
[CPU: 178.1 MB Avail: 250.03 GB]

CPU : [0, 1, 2, 3, 4, 5]
[CPU: 178.1 MB Avail: 250.03 GB]

GPU : [0]
[CPU: 178.1 MB Avail: 250.03 GB]

RAM : [0, 1]
[CPU: 178.1 MB Avail: 250.03 GB]

SSD : False
[CPU: 178.1 MB Avail: 250.03 GB]


[CPU: 178.1 MB Avail: 250.03 GB]

Importing job module for job type patch_motion_correction_multi…
[CPU: 205.1 MB Avail: 250.01 GB]

Job ready to run
[CPU: 205.2 MB Avail: 250.01 GB]


[CPU: 208.9 MB Avail: 250.00 GB]

Job will process this many movies: 4854
[CPU: 208.9 MB Avail: 250.00 GB]

Random seed: 473663310
[CPU: 208.9 MB Avail: 250.00 GB]

parent process is 62845
[CPU: 182.1 MB Avail: 249.97 GB]

Calling CUDA init from 62892
[CPU: 228.2 MB Avail: 249.98 GB]

Child process with PID 62892 terminated unexpectedly with exit code 1.
[CPU: 224.2 MB Avail: 249.98 GB]


[CPU: 224.2 MB Avail: 249.98 GB]

Compiling job outputs…
[CPU: 224.2 MB Avail: 249.98 GB]

Passing through outputs for output group micrographs from input group movies
[CPU: 224.2 MB Avail: 249.98 GB]

This job outputted results ['micrograph_blob_non_dw', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'micrograph_blob', 'background_blob', 'rigid_motion', 'spline_motion']
[CPU: 224.2 MB Avail: 249.98 GB]

Loaded output dset with 0 items
[CPU: 224.2 MB Avail: 249.98 GB]

Passthrough results ['movie_blob', 'gain_ref_blob', 'mscope_params']
[CPU: 224.2 MB Avail: 249.98 GB]

Loaded passthrough dset with 4854 items
[CPU: 224.2 MB Avail: 249.98 GB]

Intersection of output and passthrough has 0 items
[CPU: 224.3 MB Avail: 249.98 GB]

Passing through outputs for output group micrographs_incomplete from input group movies
[CPU: 224.3 MB Avail: 249.98 GB]

This job outputted results ['micrograph_blob']
[CPU: 224.3 MB Avail: 249.98 GB]

Loaded output dset with 4854 items
[CPU: 224.3 MB Avail: 249.98 GB]

Passthrough results ['movie_blob', 'gain_ref_blob', 'mscope_params']
[CPU: 224.3 MB Avail: 249.98 GB]

Loaded passthrough dset with 4854 items
[CPU: 224.3 MB Avail: 249.98 GB]

Intersection of output and passthrough has 4854 items
[CPU: 224.1 MB Avail: 249.98 GB]

Checking outputs for output group micrographs
[CPU: 224.1 MB Avail: 249.98 GB]

Checking outputs for output group micrographs_incomplete
[CPU: 224.1 MB Avail: 249.98 GB]

Updating job size…
[CPU: 224.1 MB Avail: 249.98 GB]

Exporting job and creating csg files…
[CPU: 224.1 MB Avail: 249.98 GB]


[CPU: 224.1 MB Avail: 249.98 GB]

Job complete. Total time 30.97s

I noticed this message too. Does anyone know where the CUDA driver shared library is located?

CUDA driver library cannot be found.
If you are sure that a CUDA driver is installed,
try setting environment variable NUMBA_CUDA_DRIVER
with the file path of the CUDA driver shared library.
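
For anyone trying to answer the question above, here is a minimal Python sketch for locating the driver library (`libcuda.so`) on a Linux worker. The candidate directories and the helper name `find_cuda_driver` are assumptions for illustration and are not part of CryoSPARC or Numba; only the `NUMBA_CUDA_DRIVER` variable comes from the message above.

```python
import ctypes.util
import glob
import os

# Directories where libcuda.so is commonly installed on Linux.
# This list is an assumption and is not exhaustive.
CANDIDATE_DIRS = [
    "/usr/lib64",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/lib",
]


def find_cuda_driver(extra_dirs=()):
    """Return a path (or soname) for the CUDA driver library, or None."""
    # Honour an explicit override first, if it points at a real file.
    override = os.environ.get("NUMBA_CUDA_DRIVER")
    if override and os.path.isfile(override):
        return override

    # Look for libcuda.so* in caller-supplied and common directories.
    for directory in list(extra_dirs) + CANDIDATE_DIRS:
        matches = sorted(glob.glob(os.path.join(directory, "libcuda.so*")))
        if matches:
            return matches[0]

    # Fall back to the system linker lookup; on Linux this usually
    # returns a soname such as "libcuda.so.1" rather than a full path.
    return ctypes.util.find_library("cuda")


if __name__ == "__main__":
    print(find_cuda_driver() or "libcuda not found")
```

If this prints a path, the driver library is present on the node, and the problem may lie in how the job environment looks it up rather than in a missing driver.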

Welcome to the forum @jkoral .
What is the output of the command

nvidia-smi

on the computer where the patch motion correction job was running?
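
If the full table is awkward to paste, a short Python sketch like the one below can collect the same key facts (GPU name, driver version, memory use) through `nvidia-smi`'s CSV query mode. The function names `parse_gpu_csv` and `query_gpus` are hypothetical helpers, and the sketch assumes `nvidia-smi` is on the `PATH` of the worker node.

```python
import subprocess

# nvidia-smi's query mode prints one comma-separated line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_version,memory.used,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Parse the CSV lines produced by QUERY into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            continue  # skip anything that is not a GPU record
        index, name, driver, used, total = fields
        gpus.append({
            "index": int(index),
            "name": name,
            "driver": driver,
            "memory_used": used,
            "memory_total": total,
        })
    return gpus


def query_gpus():
    """Run nvidia-smi and return parsed GPU records, or [] on failure."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

An empty result suggests that `nvidia-smi` is missing or cannot talk to the driver, which is itself useful to report.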

I think I may have resolved that issue by loading the CUDA module in .bashrc, but I was still under the impression that I did not need to do that with version 4.4.1.

I am getting other errors after fixing that one. Do you think the error below is related to CUDA?

[CPU: 228.1 MB Avail: 249.41 GB]

Error occurred while processing J5/imported/009591340203159439291_FoilHole_14752745_Data_14750840_14750842_20240315_134617_fractions.tiff
Traceback (most recent call last):
  File "/home/csuser/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 61, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 192, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 195, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 224, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 201, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 292, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 301, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 132, in cryosparc_master.cryosparc_compute.jobs.motioncorrection.patchmotion.prepare_movie_for_processing
AssertionError: In Variable Dose mode, cannot skip start or end frames

Marking J5/imported/009591340203159439291_FoilHole_14752745_Data_14750840_14750842_20240315_134617_fractions.tiff as incomplete and continuing…

Here is the nvidia-smi output just in case you still need it:
[csuser@node11 ~]$ nvidia-smi
Tue Apr 9 17:48:29 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A4500                On | 00000000:B1:00.0 Off |                  Off |
| 30%   28C    P8               17W / 200W|    682MiB / 20470MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A4500                On | 00000000:CA:00.0 Off |                  Off |
| 30%   28C    P8               12W / 200W|    500MiB / 20470MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     42341      C   /home/user                                  498MiB |
|    0   N/A  N/A     70307      C   python                                      182MiB |
|    1   N/A  N/A     42341      C   /home/user                                  498MiB |
+---------------------------------------------------------------------------------------+

This would imply that Variable Dose was enabled. Variable Dose is not usually required unless the data were acquired in a special way. Please post the parameters used for the Import and Patch Motion Correction jobs, or disable Variable Dose and test again.

Thank you so much for your quick replies. Since I am not the researcher, I will have them disable Variable Dose and let me know how it goes.

Since my initial question has been resolved, I will consider this issue closed.

Thanks,
Joe

I confirm that loading a CUDA module in ~/.bashrc should not be required in v4.4.1 (and may interfere with CryoSPARC function).
Could you please post the CUDA-related error messages you see (without loading an external CUDA module) when using CryoSPARC? Please include:

  1. the CUDA-related error message itself
  2. additional lines preceding the error message for context
  3. where the error message was encountered (such as which section of the UI, path of the relevant log file)
  4. outputs of the commands
    /home/csuser/cryosparc/cryosparc_worker/bin/cryosparcw env | grep PATH
    /home/csuser/cryosparc/cryosparc_worker/bin/cryosparcw gpulist
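
The two commands in item 4 can also be gathered in one go with a small Python sketch such as the following. The `cryosparcw` path is the one from this thread; the helper names `path_lines` and `run` are hypothetical, and the filter simply mimics `grep PATH` without assuming anything about the exact format of the `env` output.

```python
import subprocess

# Worker script location as used in this thread; adjust for other installs.
CRYOSPARCW = "/home/csuser/cryosparc/cryosparc_worker/bin/cryosparcw"


def path_lines(env_output):
    """Keep only lines containing 'PATH', like `grep PATH`."""
    return [line for line in env_output.splitlines() if "PATH" in line]


def run(cmd):
    """Run a command and return its combined stdout and stderr as text."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True)
    except OSError as exc:
        # Covers a missing or non-executable binary.
        return f"could not run {cmd[0]}: {exc}"
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    print("== cryosparcw env | grep PATH ==")
    print("\n".join(path_lines(run([CRYOSPARCW, "env"]))))
    print("== cryosparcw gpulist ==")
    print(run([CRYOSPARCW, "gpulist"]))
```

Run it as the CryoSPARC user (`csuser` here), with no external CUDA module loaded, so that the output reflects the environment the jobs actually see.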