Patch motion correction problem

Hi everyone, I have just installed CryoSPARC on a new machine and tried to process the tutorial dataset to check that everything works.
I got this traceback:

Launching job on lane default target k306-MS-7522 ...

Running job on master node hostname k306-MS-7522

[CPU: 80.0 MB] Project P1 Job J15 Started

[CPU: 80.0 MB] Master running v3.3.2, worker running v3.3.2

[CPU: 80.4 MB] Working in directory: /media/k306/01D7BF46BBD6EE40/cryosparc/test/P1/J2

[CPU: 80.4 MB] Running on lane default

[CPU: 80.4 MB] Resources allocated:

[CPU: 80.4 MB] Worker: k306-MS-7522

[CPU: 80.4 MB] CPU : [0, 1, 2, 3, 4, 5]

[CPU: 80.4 MB] GPU : [0]

[CPU: 80.4 MB] RAM : [0, 1]

[CPU: 80.4 MB] SSD : False

[CPU: 80.4 MB] --------------------------------------------------------------

[CPU: 80.4 MB] Importing job module for job type patch_motion_correction_multi...

[CPU: 244.6 MB] Job ready to run

[CPU: 244.6 MB] ***************************************************************

[CPU: 245.2 MB] Job will process this many movies: 20

[CPU: 245.5 MB] parent process is 3842

[CPU: 171.2 MB] Calling CUDA init from 3883

[CPU: 246.0 MB] Outputting partial results now...

[CPU: 246.0 MB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/", line 85, in
  File "cryosparc_worker/cryosparc_compute/jobs/motioncorrection/", line 402, in
AssertionError: Child process with PID 3883 has terminated unexpectedly!

The GPU is a GTX 1080 Ti and the CUDA version was 11.7. I found similar topics in the forum reporting problems with recent CUDA versions, so I installed 11.2 alongside 11.7 and then re-installed CryoSPARC, but the problem persists. Thanks for any ideas.

Please can you share the contents of
and provide information on your worker environment:

The log file is:

================= CRYOSPARCW =======  2022-05-31 13:01:11.685063  =========
Project P1 Job J2
Master k306-MS-7522 Port 39002
========= monitor process now starting main process
========= monitor process now waiting for main process
Running job on hostname %s k306-MS-7522
Allocated Resources :  {'fixed': {'SSD': False}, 'hostname': 'k306-MS-7522', 'lane': 'default', 'lane_type': 'default', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1], 'GPU': [0], 'RAM': [0]}, 'target': {'cache_path': '/media/k306/01D7BF46BBD6EE40/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11720261632, 'name': 'NVIDIA GeForce GTX 1080 Ti'}], 'hostname': 'k306-MS-7522', 'lane': 'default', 'monitor_port': None, 'name': 'k306-MS-7522', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7], 'GPU': [0], 'RAM': [0, 1, 2]}, 'ssh_str': 'k306@k306-MS-7522', 'title': 'Worker node k306-MS-7522', 'type': 'node', 'worker_bin_path': '/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw'}}
**** handle exception rc
set status to failed
========= main process now complete.
========= monitor process now complete.

The worker is installed on the same machine. Strangely, I get:

~$eval $(/Home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw env) 
bash: /Home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw: No such file or directory
~${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
bash: /bin/nvcc: No such file or directory
~$python -c "import pycuda.driver; print(pycuda.driver.get_version())"

Command 'python' not found, did you mean:

  command 'python3' from deb python3
  command 'python' from deb python-is-python3

Does this mean I have something wrong with Python?

Output of uname -a && free -g && nvidia-smi:

Linux k306-MS-7522 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:             23           2           0           0          20          20
Swap:             1           0           1
Wed Jun  1 09:53:36 2022       
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA GeForce 1080Ti  On   | 00000000:02:00.0  On |                  N/A |
| 45%   56C    P8    16W / 250W |    156MiB / 11264MiB |      1%      Default |
|                               |                      |                  N/A |
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|    0   N/A  N/A      1140      G   /usr/lib/xorg/Xorg                 75MiB |
|    0   N/A  N/A      1448      G   /usr/bin/gnome-shell               79MiB |

The capitalization of “Home” and a missing k306 component would explain

Please can you try
eval $(/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw env)
followed by the other commands instead?
Please can you also confirm that the cryoSPARC installation and processes are “owned” by Linux user “k306”:
ls -l /home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw
ls -l /home/k306/cryosparcuser/cryosparc/cryosparc_master/bin/cryosparcm
ps -ef | grep supervisord
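
A path typo like the one above can be narrowed down by checking which component of the path is the first that does not exist. The following is only a minimal sketch: the helper name first_missing_component is hypothetical and not part of cryoSPARC, and it assumes a POSIX-style filesystem.

```python
import os


def first_missing_component(path):
    """Return the shortest prefix of `path` that does not exist, or None if all exist."""
    path = os.path.abspath(path)
    current = os.sep
    for part in path.split(os.sep):
        if not part:
            continue  # skip the empty string before the leading separator
        current = os.path.join(current, part)
        if not os.path.exists(current):
            return current
    return None


if __name__ == "__main__":
    # The path from the failed command earlier in this thread.
    candidate = "/Home/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw"
    print(first_missing_component(candidate))
```

On a typical Linux system this would be expected to point at the capitalized /Home component, since the check stops at the first prefix that is missing.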

Thanks, indeed the path was incorrect; I have now taken it from the pwd command. This time, however, I get no output at all from
eval $(/home/k306/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw env)

I double-checked that the cryosparcw file is in place.

However now I have

${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

As for the process ownership, I have:

ls -l /home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw
-rwxrwxr-x 1 k306 k306 13677 apr  8 23:44 /home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw
k306@k306-MS-7522:~/cryosparcuser/cryosparc/cryosparc_master$ ls -l /home/k306/cryosparcuser/cryosparc/cryosparc_master/bin/cryosparcm
-rwxrwxr-x 1 k306 k306 58488 apr  8 23:43 /home/k306/cryosparcuser/cryosparc/cryosparc_master/bin/cryosparcm
k306@k306-MS-7522:~/cryosparcuser/cryosparc/cryosparc_master$ ps -ef | grep supervisord
k306       59893    1015  0 17:09 ?        00:00:00 python /home/k306/cryosparcuser/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /home/k306/cryosparcuser/cryosparc/cryosparc_master/supervisord.conf
k306       60309   59763  0 17:15 pts/0    00:00:00 grep --color=auto supervisord

Running cryosparcw env returns this:

export "CRYOSPARC_USE_GPU=true"
export "CRYOSPARC_PATH=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin"
export "CRYOSPARC_ROOT_DIR=/home/k306/cryosparcuser/cryosparc/cryosparc_worker"
export "CRYOSPARC_CUDA_PATH=/usr/local/cuda"
export "CRYOSPARC_DEVELOP=false"
export "CRYOSPARC_CONDA_ENV=cryosparc_worker_env"
export "PATH=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/condabin:/usr/local/cuda/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/condabin:/usr/local/cuda/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/condabin:/usr/local/cuda/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_master/bin:/home/k306/cryosparcuser/cryosparc/cryosparc_master/bin:/home/k306/soft/PyMOL-2.5.2_293-Linux-x86_64-py37/pymol:/usr/local/gromacs/bin/GMXRC:/home/k306/soft/NAMD_3.0alpha11_Linux-x86_64-multicore-CUDA:/home/k306/soft/vmd-1.9.3/bin/vmd:/home/k306/soft/ccpem-1.6.0-linux-x86_64/ccpem-1.6.0/"
export "LD_LIBRARY_PATH=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/libs:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/libs:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/blobio:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/cryosparc_compute/libs:/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda/lib64"
export "LD_PRELOAD=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/"
export "PYTHONPATH=/home/k306/cryosparcuser/cryosparc/cryosparc_worker"
export "CONDA_EXE=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/bin/conda"
export "CONDA_PREFIX=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env"
export "CONDA_PROMPT_MODIFIER=(cryosparc_worker_env)"
export "CONDA_SHLVL=1"
export "CONDA_PYTHON_EXE=/home/k306/cryosparcuser/cryosparc/cryosparc_worker/deps/anaconda/bin/python"
export "CONDA_DEFAULT_ENV=cryosparc_worker_env"
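
The export lines above are easier to inspect once parsed, for example to read CRYOSPARC_CUDA_PATH or to collapse the repeated PATH entries. This is a rough sketch only: it assumes every line has the form export "NAME=value" as in the listing, the helper names are hypothetical, and the /opt/w/bin entry in the sample is a made-up placeholder rather than a path from this thread.

```python
import shlex


def parse_exports(text):
    """Parse lines of the form: export "NAME=value" into a dict."""
    env = {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # handles the surrounding double quotes
        if len(tokens) == 2 and tokens[0] == "export" and "=" in tokens[1]:
            name, value = tokens[1].split("=", 1)
            env[name] = value
    return env


def unique_entries(path_value):
    """Drop repeated entries from a colon-separated list, keeping first-seen order."""
    seen = []
    for entry in path_value.split(":"):
        if entry and entry not in seen:
            seen.append(entry)
    return seen


# Shortened, partly hypothetical sample in the same format as the output above.
sample = '''export "CRYOSPARC_CUDA_PATH=/usr/local/cuda"
export "PATH=/opt/w/bin:/usr/local/cuda/bin:/opt/w/bin:/usr/local/cuda/bin"
'''
env = parse_exports(sample)
print(env["CRYOSPARC_CUDA_PATH"])
print(unique_entries(env["PATH"]))
```

Repeated PATH and LD_LIBRARY_PATH entries like those in the listing usually just mean the environment was loaded more than once in the same shell; they are noisy but not necessarily a problem in themselves.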

No output is expected when that command is successful, which should load the cryoSPARC worker environment into the current shell.
I am surprised about the finding of

Without any changes to the CUDA configuration, please can you run

eval $(/home/k306/cryosparc_user/cryosparc/cryosparc_worker/bin/cryosparcw env)
python -c "import pycuda.driver; print(pycuda.driver.get_version())"

and report the output of the second command.

The output is:
(11, 7, 0)

This indicates a version mismatch with

I noticed that you

I do not know the path of that CUDA-11.2 installation, but let’s assume it is /usr/local/cuda-11.2. In that case, I would suggest (substitute actual path to cuda-11.2 on your computer for steps 1 and 2):

  1. edit the line in cryosparc_worker/ that begins with
    export CRYOSPARC_CUDA_PATH= so that it becomes
    export CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.2
  2. “register” cuda-11.2 with the cryoSPARC worker (one-line command):
    /home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw newcuda /usr/local/cuda-11.2
  3. attempt to run a clone of the Patch Motion Correction job
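
The comparison behind this suggestion (the toolkit release reported by nvcc versus the version tuple printed by pycuda.driver.get_version()) can be sketched as below. It works on the text already pasted in this thread, so it does not need pycuda or a GPU; the function names are hypothetical, and matching only on major and minor version is an assumption about what counts as "the same" toolkit.

```python
import re


def nvcc_release(nvcc_output):
    """Extract (major, minor) from the 'release X.Y' field of `nvcc --version` text."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def versions_agree(nvcc_output, pycuda_version):
    """Compare the nvcc release with the first two fields of the pycuda version tuple."""
    return nvcc_release(nvcc_output) == tuple(pycuda_version[:2])


# Lines as reported in this thread before and after switching to cuda-11.2.
before = "Cuda compilation tools, release 10.1, V10.1.243"
after = "Cuda compilation tools, release 11.2, V11.2.67"
print(versions_agree(before, (11, 7, 0)))
print(versions_agree(after, (11, 2, 0)))
```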

I ran all these commands and the pycuda version is now (11, 2, 0), but motion correction still fails with the same errors.

At this point, I suggest the following:

  1. Please install the latest patch (patching documentation):
    cryosparcm patch
    The patch is expected not to resolve the issue, but to result in more meaningful error messages.
  2. Again capture cuda configuration (two lines of commands):
eval $(/home/k306/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw env)
(date && $CRYOSPARC_CUDA_PATH/bin/nvcc --version && python -c "import pycuda.driver; print(pycuda.driver.get_version())") | tee /tmp/cuda_info.txt
  3. Run a clone of the failed Patch_Motion_Correction job.
  4. Paste error messages from the Overview tab of that (presumably failing) job into a response under this forum topic.
  5. I will send you a direct message about where you may send /tmp/cuda_info.txt as well as (from the new job’s directory) job.json and job.log.

Thanks, the patch seemingly resolved the issue! The job completed. However, I still got an error message, so here is the run log anyway:
…[CPU: 170.6 MB] Calling CUDA init from 21362

[CPU: 246.2 MB] Child process with PID 21362 terminated unexpectedly with exit code 1.

[CPU: 246.2 MB] --------------------------------------------------------------

[CPU: 246.2 MB] Compiling job outputs…

[CPU: 246.2 MB] Passing through outputs for output group micrographs from input group movies

[CPU: 246.2 MB] This job outputted results [‘micrograph_blob_non_dw’, ‘micrograph_thumbnail_blob_1x’, ‘micrograph_thumbnail_blob_2x’, ‘micrograph_blob’, ‘background_blob’, ‘rigid_motion’, ‘spline_motion’]

[CPU: 246.2 MB] Loaded output dset with 0 items

[CPU: 246.2 MB] Passthrough results [‘movie_blob’, ‘gain_ref_blob’, ‘mscope_params’]

[CPU: 246.2 MB] Loaded passthrough dset with 20 items

[CPU: 246.2 MB] Intersection of output and passthrough has 0 items

[CPU: 246.2 MB] Passing through outputs for output group micrographs_incomplete from input group movies

[CPU: 246.2 MB] This job outputted results [‘micrograph_blob’]

[CPU: 246.2 MB] Loaded output dset with 20 items

[CPU: 246.2 MB] Passthrough results [‘movie_blob’, ‘gain_ref_blob’, ‘mscope_params’]

[CPU: 246.2 MB] Loaded passthrough dset with 20 items

[CPU: 246.8 MB] Intersection of output and passthrough has 20 items

[CPU: 246.8 MB] Checking outputs for output group micrographs

[CPU: 246.8 MB] Checking outputs for output group micrographs_incomplete

[CPU: 246.8 MB] Updating job size…

[CPU: 246.8 MB] Exporting job and creating csg files…

[CPU: 246.9 MB] ***************************************************************
[CPU: 246.9 MB] Job complete. Total time 30.43s

Current CUDA configuration is:
Tue 07 Jun 2022 09:40:42 CET
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

Actually, no images were processed even though the job completed. Something is still wrong with CUDA.
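
A "completed" job with no processed images can be spotted from the run log itself by pairing each output group with the item count reported right after it. The sketch below assumes the log wording shown in this thread ("Passing through outputs for output group ..." followed by "Loaded output dset with N items"); the function name is hypothetical and other cryoSPARC versions may word these lines differently.

```python
import re

GROUP_RE = re.compile(r"Passing through outputs for output group (\S+) from input group")
COUNT_RE = re.compile(r"Loaded output dset with (\d+) items")


def output_counts(log_text):
    """Map each output group name to the item count logged after it."""
    counts = {}
    group = None
    for line in log_text.splitlines():
        found = GROUP_RE.search(line)
        if found:
            group = found.group(1)
            continue
        found = COUNT_RE.search(line)
        if found and group is not None:
            counts[group] = int(found.group(1))
            group = None  # wait for the next output group header
    return counts


# Excerpt of the run log quoted in this thread.
sample_log = """
[CPU: 246.2 MB] Passing through outputs for output group micrographs from input group movies
[CPU: 246.2 MB] Loaded output dset with 0 items
[CPU: 246.2 MB] Loaded passthrough dset with 20 items
[CPU: 246.2 MB] Passing through outputs for output group micrographs_incomplete from input group movies
[CPU: 246.2 MB] Loaded output dset with 20 items
"""
print(output_counts(sample_log))
```

Here all 20 movies land in micrographs_incomplete and none in micrographs, which matches the observation that nothing was actually processed.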

The job.log file you sent us includes a 'CUDA driver library not found' error. Please check if the scenario (and its resolution) described by another user under this related topic applies in your case.


Thanks, indeed the problem was with this library! In my case I found the library in …/cuda-11.2/lib64/stubs/ and just copied it into the parent directory …/cuda-11.2/lib64/

Copying the library from stubs/ is discouraged according to this source. Have you confirmed that patch motion correction and follow-up processing steps now run correctly?

I have only tried patch motion correction, and it seems to run correctly. I also have another in …/cuda-11.2/targets/x86_64-linux/lib/stubs/, like the one described in the related topic. Possibly the right way is to create a soft link to this file?

I am not sure. Some users have reported success with this strategy, while another source appears to discourage it.
My recommendation (as of December 2022, my views are still evolving):

  1. Do not copy or otherwise link to files inside the stubs directory.
  2. Perform installation of the cuda toolkit and other cuda-related installation steps on a worker with the nvidia drivers installed. This should ensure the availability of a non-stub On ubuntu-22.04 (updated), for example, /usr/lib/x86_64-linux-gnu/ is included in the libnvidia-compute package, on which the nvidia-driver package depends. In case the software installation is to be shared between workers that have different versions of the nvidia driver installed, I would try to perform installation steps on the worker with the oldest version of the drivers, but I have not thoroughly tested this scenario.
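
To tell a stub copy of the driver library apart from a real one, it can help to filter candidate paths by whether they sit under a stubs directory. This is a minimal sketch under loud assumptions: the thread does not name the library file, so libcuda.so and libcuda.so.1 below are assumed names, the candidate paths are illustrative rather than taken from the poster's machine, and the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def is_stub_library(path):
    """True if any parent directory of `path` is named 'stubs'."""
    return "stubs" in PurePosixPath(path).parent.parts


def non_stub(paths):
    """Keep only the candidates that are not inside a stubs directory."""
    return [p for p in paths if not is_stub_library(p)]


# Assumed file names and illustrative locations (not confirmed by the thread).
candidates = [
    "/usr/local/cuda-11.2/lib64/stubs/libcuda.so",
    "/usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs/libcuda.so",
    "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
]
print(non_stub(candidates))
```

Note that a file copied out of stubs/ into lib64/ would pass this path-based filter even though it is still a stub, so the check only helps before any such copy has been made.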