Problem with Patch Motion Corr when running T20S tutorial

Hello,

I am brand new to cryoSPARC and still fairly new to Linux, and I am having a problem running the T20S tutorial. Specifically, the Patch Motion Correction job fails.

Environment:

Current cryoSPARC version: v2.15.0

CUDA path: export CRYOSPARC_CUDA_PATH="/usr/local/cuda"

CentOS 7

Standalone install

Issue:

Whenever I try to run Patch Motion Correction on the imported movies from the tutorial, I get an error. I think I may have tracked down the problem: I followed the section of the cryoSPARC troubleshooting guide that matches my error, titled “Job runs but ends unexpectedly with status ‘Failed.’”

Originally, I believed my error was due to our workstations using CUDA version 11, so I installed CUDA 10.2, which should be compatible. However, when I run the command bin/cryosparcw from the cryosparc2_worker folder, I get the following output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "connect.py", line 22, in print_gpu_list
    import pycuda.driver as cudrv

This error led me to another discussion here on your website (the limit on links for new users has prevented me from linking it here).

There I found the following set of commands to try to reinstall pycuda:

cd cryosparc2_worker
cat config.sh
#ensure the `CRYOSPARC_CUDA_PATH` var is set to the parent directory of `/bin/nvcc`
#i.e. `/usr/local/cuda-10.1` 
eval $(./bin/cryosparcw env)
which pip 
#should be inside the cryosparc2_worker dir, 
#which means it's part of cryoSPARC's python environment.
pip uninstall pycuda
pip install "./deps_bundle/python/python_packages/pip_packages/pycuda-2018.1.1.tar.gz" --no-cache-dir

Everything seemed to work fine until I ran the pip install command, which gave me the following error:

*** WARNING: nvcc not in path.
*** May need to set CUDA_INC_DIR for installation to succeed.
***************************************************************
*************************************************************
*** I have detected that you have not run configure.py.
*************************************************************
*** Additionally, no global config files were found.
*** I will go ahead with the default configuration.
*** In all likelihood, this will not work out.
***
*** See README_SETUP.txt for more information.
***
*** If the build does fail, just re-run configure.py with the
*** correct arguments, and then retry. Good luck!
*************************************************************
*** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT
*************************************************************

src/cpp/cuda.hpp:14:18: fatal error: cuda.h: No such file or directory
 #include <cuda.h>
                  ^
compilation terminated.
error: command 'gcc' failed with exit status 1

----------------------------------------

Below is the full log of the actual error I am receiving, from the cryosparcm joblog Pxx Jyy command:

================= CRYOSPARCW =======  2020-10-08 10:32:01.510737  =========
Project P1 Job J12
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 49491
========= monitor process now waiting for main process
MAIN PID 49491
motioncorrection.run_patch cryosparc2_compute.jobs.jobregister
***************************************************************
Process Process-1:1:
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 155, in process_work_simple
    process_setup(proc_idx) # do any setup you want on a per-process basis
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 80, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.process_setup
  File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
    from engine import *
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 4, in init cryosparc2_compute.engine.engine
ImportError: No module named pycuda.driver
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 359, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 49526 has terminated unexpectedly!
========= main process now complete.
========= monitor process now complete.

I have a feeling there is something very basic I am missing here; I am just not sure what it might be. I hope this information is helpful. Please let me know if any more would be useful. Thank you all ahead of time for your help!

Joe

Hi Joe, thanks for posting such detailed information!

It looks like pip couldn’t re-install pycuda because some required files from your local CUDA installation are missing or not accessible.

Inside the cryosparc2_worker folder, open the config.sh file and look for a line that looks like this:

export CRYOSPARC_CUDA_PATH="/usr/local/cuda-10.2"

Check that the location at the given path (e.g., /usr/local/cuda-10.2) exists on your system. Go inside that location and check that it has subfolders called bin, lib64 and include. If any of these are missing, re-install CUDA. If the path in config.sh is wrong, edit the export CRYOSPARC_CUDA_PATH="..." line to specify the correct value.
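
The directory check described above can also be scripted. This is only a minimal sketch, not a cryoSPARC command: the function name check_cuda_layout is made up for illustration. Run it as the same user that runs cryoSPARC, so that access problems show up as well as missing folders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of cryoSPARC): verify that a CUDA toolkit
# directory exists and that its bin, lib64 and include subfolders can be
# read and entered by the current user.
check_cuda_layout() {
    local cuda_path="$1" sub status=0
    if [ ! -d "$cuda_path" ]; then
        echo "MISSING: $cuda_path"
        return 1
    fi
    if [ ! -r "$cuda_path" ] || [ ! -x "$cuda_path" ]; then
        echo "NO ACCESS: $cuda_path"
        return 1
    fi
    for sub in bin lib64 include; do
        if [ ! -d "$cuda_path/$sub" ]; then
            echo "MISSING: $cuda_path/$sub"
            status=1
        elif [ ! -r "$cuda_path/$sub" ] || [ ! -x "$cuda_path/$sub" ]; then
            echo "NO ACCESS: $cuda_path/$sub"
            status=1
        else
            echo "OK: $cuda_path/$sub"
        fi
    done
    return "$status"
}

# Example call; substitute the value of CRYOSPARC_CUDA_PATH from config.sh.
# check_cuda_layout /usr/local/cuda-10.2
```

A "NO ACCESS" line points to a permissions problem rather than a broken CUDA install, so re-installing CUDA would not necessarily help in that case.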

Next, in the command line, enter the following (again substituting the correct path for the CUDA_PATH line).

export CUDA_PATH="/usr/local/cuda-10.2"
export C_INCLUDE_PATH="$CUDA_PATH/include"
export CPLUS_INCLUDE_PATH="$CUDA_PATH/include"

Re-run the re-installation steps you used for pycuda and try running a job.
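
Before re-running pip, a quick pre-flight check can confirm that the two things the pycuda build complained about (nvcc and cuda.h) are reachable. This is a sketch under assumptions: pycuda_build_preflight is a made-up name, and adding the CUDA bin folder to PATH is my guess at what silences the "nvcc not in path" warning, since the exports above only set the include paths.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check before rebuilding pycuda (not a cryoSPARC tool).
# It only tests that cuda.h is readable and nvcc is executable under the
# given CUDA directory; it does not build or install anything.
pycuda_build_preflight() {
    local cuda_path="$1" status=0
    if [ -r "$cuda_path/include/cuda.h" ]; then
        echo "found cuda.h in $cuda_path/include"
    else
        echo "cannot read $cuda_path/include/cuda.h"
        status=1
    fi
    if [ -x "$cuda_path/bin/nvcc" ]; then
        echo "found nvcc in $cuda_path/bin"
    else
        echo "cannot execute $cuda_path/bin/nvcc"
        status=1
    fi
    return "$status"
}

# Example use (assumption: CUDA_PATH was exported as shown earlier):
# export PATH="$CUDA_PATH/bin:$PATH"
# pycuda_build_preflight "$CUDA_PATH"
```

If either check fails, the pip install is likely to stop at the same gcc step as before, so it is worth fixing that first.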

If you see any errors in the commands you run, send them over to me along with the output of the command nvidia-smi.

Let me know how that goes!

Nick

Thank you so much for all of your help Nick, and for the very helpful steps you gave me to fix my problem. All the subfolders you mentioned were present, so I went ahead. Unfortunately, I did get an error, which I will post below:

[cryosparc@XXX-XXXXXXX cryosparc2_worker]$ pip install "./deps_bundle/python/python_packages/pip_packages/pycuda-2019.1.tar.gz" --no-cache-dir
Processing ./deps_bundle/python/python_packages/pip_packages/pycuda-2019.1.tar.gz
Requirement already satisfied: pytools>=2011.2 in ./deps/anaconda/lib/python2.7/site-packages (from pycuda==2019.1)
Requirement already satisfied: pytest>=2 in ./deps/anaconda/lib/python2.7/site-packages (from pycuda==2019.1)
Requirement already satisfied: decorator>=3.2.0 in ./deps/anaconda/lib/python2.7/site-packages (from pycuda==2019.1)
Requirement already satisfied: appdirs>=1.4.0 in ./deps/anaconda/lib/python2.7/site-packages (from pycuda==2019.1)
Requirement already satisfied: mako in ./deps/anaconda/lib/python2.7/site-packages/Mako-1.0.7-py2.7.egg (from pycuda==2019.1)
Requirement already satisfied: six>=1.8.0 in ./deps/anaconda/lib/python2.7/site-packages (from pytools>=2011.2->pycuda==2019.1)
Requirement already satisfied: numpy>=1.6.0 in ./deps/anaconda/lib/python2.7/site-packages (from pytools>=2011.2->pycuda==2019.1)
Requirement already satisfied: py>=1.5.0 in ./deps/anaconda/lib/python2.7/site-packages (from pytest>=2->pycuda==2019.1)
Requirement already satisfied: setuptools in ./deps/anaconda/lib/python2.7/site-packages (from pytest>=2->pycuda==2019.1)
Requirement already satisfied: attrs>=17.2.0 in ./deps/anaconda/lib/python2.7/site-packages (from pytest>=2->pycuda==2019.1)
Requirement already satisfied: pluggy<0.7,>=0.5 in ./deps/anaconda/lib/python2.7/site-packages (from pytest>=2->pycuda==2019.1)
Requirement already satisfied: funcsigs in ./deps/anaconda/lib/python2.7/site-packages (from pytest>=2->pycuda==2019.1)
Requirement already satisfied: MarkupSafe>=0.9.2 in ./deps/anaconda/lib/python2.7/site-packages (from mako->pycuda==2019.1)
Installing collected packages: pycuda
  Running setup.py install for pycuda ... error
    Complete output from command /home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-vjUdmB-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-rfchYy-record/install-record.txt --single-version-externally-managed --compile:
    ***************************************************************
    *** WARNING: nvcc not in path.
    *** May need to set CUDA_INC_DIR for installation to succeed.
    ***************************************************************
    *************************************************************
    *** I have detected that you have not run configure.py.
    *************************************************************
    *** Additionally, no global config files were found.
    *** I will go ahead with the default configuration.
    *** In all likelihood, this will not work out.
    ***
    *** See README_SETUP.txt for more information.
    ***
    *** If the build does fail, just re-run configure.py with the
    *** correct arguments, and then retry. Good luck!
    *************************************************************
    *** HIT Ctrl-C NOW IF THIS IS NOT WHAT YOU WANT
    *************************************************************
    Continuing in 1 seconds...    
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/__init__.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/_cluda.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/_mymako.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/autoinit.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/characterize.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/compiler.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/cumath.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/curandom.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/debug.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/driver.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/elementwise.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/gpuarray.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/reduction.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/scan.py -> build/lib.linux-x86_64-2.7/pycuda
    copying pycuda/tools.py -> build/lib.linux-x86_64-2.7/pycuda
    creating build/lib.linux-x86_64-2.7/pycuda/gl
    copying pycuda/gl/__init__.py -> build/lib.linux-x86_64-2.7/pycuda/gl
    copying pycuda/gl/autoinit.py -> build/lib.linux-x86_64-2.7/pycuda/gl
    creating build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/__init__.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/cg.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/coordinate.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/inner.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/operator.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/packeted.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    copying pycuda/sparse/pkt_build.py -> build/lib.linux-x86_64-2.7/pycuda/sparse
    creating build/lib.linux-x86_64-2.7/pycuda/compyte
    copying pycuda/compyte/__init__.py -> build/lib.linux-x86_64-2.7/pycuda/compyte
    copying pycuda/compyte/array.py -> build/lib.linux-x86_64-2.7/pycuda/compyte
    copying pycuda/compyte/dtypes.py -> build/lib.linux-x86_64-2.7/pycuda/compyte
    running egg_info
    writing requirements to pycuda.egg-info/requires.txt
    writing pycuda.egg-info/PKG-INFO
    writing top-level names to pycuda.egg-info/top_level.txt
    writing dependency_links to pycuda.egg-info/dependency_links.txt
    reading manifest file 'pycuda.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    warning: no files found matching 'doc/source/_static/*.css'
    warning: no files found matching 'doc/source/_templates/*.html'
    warning: no files found matching '*.cpp' under directory 'bpl-subset/bpl_subset/boost'
    warning: no files found matching '*.html' under directory 'bpl-subset/bpl_subset/boost'
    warning: no files found matching '*.inl' under directory 'bpl-subset/bpl_subset/boost'
    warning: no files found matching '*.txt' under directory 'bpl-subset/bpl_subset/boost'
    warning: no files found matching '*.h' under directory 'bpl-subset/bpl_subset/libs'
    warning: no files found matching '*.ipp' under directory 'bpl-subset/bpl_subset/libs'
    warning: no files found matching '*.pl' under directory 'bpl-subset/bpl_subset/libs'
    writing manifest file 'pycuda.egg-info/SOURCES.txt'
    creating build/lib.linux-x86_64-2.7/pycuda/cuda
    copying pycuda/cuda/pycuda-complex-impl.hpp -> build/lib.linux-x86_64-2.7/pycuda/cuda
    copying pycuda/cuda/pycuda-complex.hpp -> build/lib.linux-x86_64-2.7/pycuda/cuda
    copying pycuda/cuda/pycuda-helpers.hpp -> build/lib.linux-x86_64-2.7/pycuda/cuda
    copying pycuda/sparse/pkt_build_cython.pyx -> build/lib.linux-x86_64-2.7/pycuda/sparse
    running build_ext
    building '_driver' extension
    creating build/temp.linux-x86_64-2.7
    creating build/temp.linux-x86_64-2.7/src
    creating build/temp.linux-x86_64-2.7/src/cpp
    creating build/temp.linux-x86_64-2.7/src/wrapper
    creating build/temp.linux-x86_64-2.7/bpl-subset
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/python
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/converter
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/python/src/object
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/smart_ptr/src
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/system
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/system/src
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/thread
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src
    creating build/temp.linux-x86_64-2.7/bpl-subset/bpl_subset/libs/thread/src/pthread
    gcc -pthread -B /home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/compiler_compat -Wl,--sysroot=/ -fno-strict-aliasing -fwrapv -Wall -O3 -DNDEBUG -fPIC -DBOOST_PYTHON_SOURCE=1 -DHAVE_CURAND=1 -DPYGPU_PACKAGE=pycuda -DBOOST_THREAD_DONT_USE_CHRONO=1 -DPYGPU_PYCUDA=1 -DBOOST_MULTI_INDEX_DISABLE_SERIALIZATION=1 -DBOOST_THREAD_BUILD_DLL=1 -Dboost=pycudaboost -DBOOST_ALL_NO_LIB=1 -Isrc/cpp -Ibpl-subset/bpl_subset -I/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/numpy/core/include -I/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/include/python2.7 -c src/cpp/cuda.cpp -o build/temp.linux-x86_64-2.7/src/cpp/cuda.o
    cc1plus: error: /usr/local/cuda-10.2/include: Permission denied
    error: command 'gcc' failed with exit status 1
    
    ----------------------------------------
Command "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-vjUdmB-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-rfchYy-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-vjUdmB-build/
You are using pip version 9.0.1, however version 20.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

I have a feeling just from reading the error that this is some sort of a permissions issue with the Cryosparc user. I don’t know if something as simple as changing permissions within the cuda folder would work. Or perhaps I should try and do the cuda install as the root user? Please let me know what you think when you can.

Below is the output of nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:19:00.0 Off |                  N/A |
| 31%   32C    P8     1W / 250W |      1MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:1A:00.0 Off |                  N/A |
| 32%   34C    P8     4W / 250W |      1MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:67:00.0 Off |                  N/A |
| 31%   34C    P8     3W / 250W |      1MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  Off  | 00000000:68:00.0 Off |                  N/A |
| 31%   34C    P8     1W / 250W |     62MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    3     53538      G   /usr/bin/X                                    39MiB |
|    3     53617      G   /usr/bin/gnome-shell                          20MiB |
+-----------------------------------------------------------------------------+

Thank you again for the advice Nick! It was very helpful and I think I am starting to understand the problem. Let me know what you think I could do next. Thank you so much!

Joe

Hmm yeah that permission error does seem suspicious. Can you send the output of these commands?

ls -la /usr/local
ls -la /usr/local/cuda-10.2/
ls -la /usr/local/cuda-10.2/include

Sure thing, I will copy it below:

[cryosparc@XXX-XXXXXXX ~]$ ls -la /usr/local
total 0
drwxr-xr-x. 15 root root 194 Oct 1 12:48 .
drwxr-xr-x. 14 root root 166 Feb 25 2020 ..
drwxr-xr-x. 3 root root 214 Oct 8 09:22 bin
lrwxrwxrwx 1 root root 21 Oct 1 12:48 cuda -> /usr/local/cuda-10.2/
drwxr-x--- 18 root root 327 Oct 1 12:49 cuda-10.2
drwxr-xr-x 15 root root 305 Jul 23 13:48 cuda-11.0
drwxr-xr-x 14 root root 309 Sep 20 16:08 cuda-11.1
drwxr-xr-x. 2 root root 6 Apr 11 2018 etc
drwxr-xr-x. 2 root root 6 Apr 11 2018 games
drwxr-xr-x. 2 root root 6 Apr 11 2018 include
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib
drwxr-xr-x. 2 root root 6 Apr 11 2018 lib64
drwxr-xr-x. 2 root root 6 Apr 11 2018 libexec
drwxr-xr-x. 2 root root 6 Apr 11 2018 sbin
drwxr-xr-x. 5 root root 49 Feb 24 2020 share
drwxr-xr-x. 2 root root 6 Apr 11 2018 src
[cryosparc@XXX-XXXXXXX ~]$ ls -la /usr/local/cuda-10.2/
ls: cannot open directory /usr/local/cuda-10.2/: Permission denied
[cryosparc@XXX-XXXXXXX ~]$ ls -la /usr/local/cuda-10.2/include
ls: cannot access /usr/local/cuda-10.2/include: Permission denied
[cryosparc@XXX-XXXXXXX ~]$

It looks like I get output for the /usr/local listing, but not for the cuda-10.2 and include directories. Thanks for the help Nick, let me know your thoughts when you can!

Joe

It looks like the non-root users don’t have read or execute permissions on the cuda-10.2 directory. Assuming you have administrator access, you can fix this with this command:

sudo chmod -R o+rx /usr/local/cuda-10.2

(enter your administrator password when prompted)

Then try listing the directories again. If the cuda-10.2 and cuda-10.2/include directories have something in them, try the pycuda reinstall steps again. Let me know if you get any more errors.
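
If you would rather check the permission bits directly than rely on ls succeeding, something like the following works. It is a minimal sketch that assumes GNU stat (as shipped with CentOS 7); the function name others_can_read_dir is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if "other" users (neither the owner nor
# a member of the group) can both list and enter the given directory.
others_can_read_dir() {
    local mode
    # GNU stat prints the mode as ten characters, e.g. drwxr-x--- or drwxr-xr-x.
    mode=$(stat -c '%A' "$1") || return 2
    # The last three characters are the "other" r, w and x bits.
    [ "${mode:7:1}" = "r" ] || return 1
    case "${mode:9:1}" in
        x|t) return 0 ;;   # t means execute plus the sticky bit
        *)   return 1 ;;
    esac
}

# Example call:
# others_can_read_dir /usr/local/cuda-10.2 && echo "readable by other users"
```

A mode of drwxr-x--- fails this check, while drwxr-xr-x (the mode of the cuda-11.0 and cuda-11.1 folders in your listing) passes.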

Edit: Added note to check directory listings.

Hello again Nick, thank you for all of your help. Changing the permissions as you suggested cleared up the original permissions error, and the pycuda install went fine. However, I am now getting a new error for the same tutorial Patch Motion Correction job. I will paste the full error from the cryosparcm joblog command below:

================= CRYOSPARCW =======  2020-10-15 14:01:12.431911  =========
Project P1 Job J23
Master XXX-XXXXX.XXXXXXX.XXX Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 7998
========= monitor process now waiting for main process
MAIN PID 7998
motioncorrection.run_patch cryosparc2_compute.jobs.jobregister
***************************************************************
Running job on hostname %s XXX-XXXXXXX.XXXXXXXX.XXX
Allocated Resources :  {u'lane': u'default', u'target': {u'monitor_port': None, u'lane': u'default', u'name': u'XXX', u'title': u'Worker node fwl-c139783.ncifcrf.gov', u'resource_slots': {u'GPU': [0, 1, 2, 3], u'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], u'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]}, u'hostname': u'XXX-XXXXXX-XXXXXXX-XXX', u'worker_bin_path': u'/home/cryosparc/software/cryosparc/cryosparc2_worker/bin/cryosparcw', u'cache_path': None, u'cache_quota_mb': None, u'resource_fixed': {u'SSD': False}, u'gpus': [{u'mem': 11554717696, u'id': 0, u'name': u'GeForce RTX 2080 Ti'}, {u'mem': 11554717696, u'id': 1, u'name': u'GeForce RTX 2080 Ti'}, {u'mem': 11554717696, u'id': 2, u'name': u'GeForce RTX 2080 Ti'}, {u'mem': 11554324480, u'id': 3, u'name': u'GeForce RTX 2080 Ti'}], u'cache_reserve_mb': 10000, u'type': u'node', u'ssh_str': u'darlingje@XXX', u'desc': None}, u'license': True, u'hostname': u'XXX', u'slots': {u'GPU': [0], u'RAM': [0, 1], u'CPU': [0, 1, 2, 3, 4, 5]}, u'fixed': {u'SSD': False}, u'lane_type': u'default', u'licenses_acquired': 1}
/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
Process Process-1:1:
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 155, in process_work_simple
    process_setup(proc_idx) # do any setup you want on a per-process basis
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 80, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.process_setup
  File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
    from engine import *
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 12, in init cryosparc2_compute.engine.engine
  File "cryosparc2_worker/cryosparc2_compute/engine/gfourier.py", line 6, in init cryosparc2_compute.engine.gfourier
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 20, in <module>
    from . import misc
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/misc.py", line 25, in <module>
    from . import cublas
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py", line 292, in <module>
    _cublas_version = int(_get_cublas_version())
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py", line 285, in _get_cublas_version
    h = cublasCreate()
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py", line 203, in cublasCreate
    cublasCheckStatus(status)
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py", line 179, in cublasCheckStatus
    raise e
cublasNotInitialized
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/home/cryosparc/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 359, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 8032 has terminated unexpectedly!
========= main process now complete.
========= monitor process now complete.

As I said, the pycuda install seemed to go well and the cryosparcw gpulist command seems to work just fine now. Here is the output from that command now

bin/cryosparcw gpulist
  Detected 4 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0      0000:19:00.0  GeForce RTX 2080 Ti
       1      0000:1A:00.0  GeForce RTX 2080 Ti
       2      0000:67:00.0  GeForce RTX 2080 Ti
       3      0000:68:00.0  GeForce RTX 2080 Ti
   ---------------------------------------------------------------

Nothing in the error log I am getting now for the job jumped out to me as the key to the problem or helped me find the solution on the site like before, so unfortunately I am stuck again. Thank you so much for your help so far Nick, looking forward to hearing back. Thanks!

Joe

Hi Joe, glad that you were able to get pycuda installed!

We typically see this new error when there’s some kind of GPU misconfiguration, such as a non-Default compute mode (e.g., LogicError: cuCtxCreate failed: invalid device ordinal), though that specific example doesn’t seem to be the case here given the nvidia-smi output you posted previously.
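
For reference, the compute mode can be read per GPU without scanning the full nvidia-smi table. This is a minimal sketch: non_default_gpus is a made-up helper name, and it assumes the comma-separated "index, compute_mode" rows that nvidia-smi prints in CSV format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CSV rows of "index, compute_mode" on stdin and
# print the index of every GPU whose compute mode is not "Default".
non_default_gpus() {
    awk -F', *' '$2 != "Default" { print $1 }'
}

# Example call on a machine with the Nvidia driver installed:
# nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader | non_default_gpus
```

No output means every GPU reported the Default mode.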

Let me check with my colleagues and get back to you about this.

Okay, there are two cases in which we’ve seen this:

  1. Following a system update where the Nvidia drivers or libraries were updated but the machine was not restarted. Restarting your machine should fix this, so try it first.
  2. Starting in CUDA 10.1, Nvidia changed how some components of CUDA were stored on disk. Sometimes, this prevented cryoSPARC from finding those necessary components.

To fix #2, try my colleague’s instructions here: Patch CTF Job Fails: Child process with PID 31697 has terminated unexpectedly
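
Before following those instructions, it can help to see where the cuBLAS library actually lives on your machine. This is only a sketch for locating the file, not the fix itself: find_cublas is a made-up name, and the directories in the example call are guesses at where a CUDA 10.x install might place libcublas on CentOS 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list libcublas shared libraries found directly inside
# the given directories. Directories that do not exist are skipped.
find_cublas() {
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -name 'libcublas.so*' 2>/dev/null
    done
}

# Example call; the directory list is an assumption, so adjust it to your system.
# find_cublas /usr/local/cuda-10.2/lib64 /usr/lib64
```

If the library only shows up outside the folder that CRYOSPARC_CUDA_PATH points to, that would be consistent with case #2.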

Let me know how that goes.