Troubleshooting: T20S extensive workflow patch motion correction failure

Hi,

I have been trying to test my cryosparc install with the extensive workflow for T20S. The import worked fine; however, the Patch Motion job is giving me the following error:

[CPU: 197.5 MB]  Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_master/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 363, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 9312 has terminated unexpectedly!

I have followed previous threads with similar errors, which seem to have been fixed by updating to a more recent version of cryosparc or by applying patches. My install is fully up to date (current version: v2.15.0+200728), but I am still getting this error.

For some additional information, I installed following the quick installation instructions for a single workstation. The workstation has an AMD CPU with 3x 2080 Ti GPUs. I recently installed CUDA 10.0 alongside CUDA 8.0, but I was careful to install cryosparc with the CUDA path set to /usr/local/cuda-10.0, so this shouldn’t be an issue. At one point I accidentally installed CUDA 11, as some instructions on the NVIDIA website were a little unclear, but as far as I’m aware I managed to purge it and autoremove its dependencies, so I don’t think this is the cause.

Another error which may or may not be related occurred after I tried to run a 2D classification of some imported particles:

ImportError: libcurand.so.8.0: cannot open shared object file: No such file or directory

This seems to be CUDA toolkit related, since the libcurand I have is libcurand.so.10.0 from cuda-10.0, but I’m not sure how to fix this either. I’m stumped and can’t get my install to work, help!
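
One way to confirm which library a compiled module is failing to load is to run ldd on it and filter for unresolved entries. The helper below is only a sketch: missing_libs is a hypothetical name, and the example path assumes pycuda's compiled module is a file named _driver.so inside the worker's site-packages/pycuda directory, which may differ on your install.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as unresolved.
# Reads ldd output on stdin; unresolved lines look like
#   libcurand.so.8.0 => not found
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example usage (hypothetical path, adjust to your install):
#   ldd /path/to/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/_driver.so | missing_libs
```

If libcurand.so.8.0 shows up in the output, the module was built against CUDA 8.0 regardless of where CUDA_PATH currently points.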

Hi @Lucy, assuming you have a complete CUDA 10 installation, you may just have to re-install the CUDA-specific dependencies in the cryosparc2_worker folder. Here’s how you do that:

  1. Navigate to where you installed the cryosparc2_worker via command line
    cd /path/to/cryosparc2_worker
    
  2. Set the following variables, changing CUDA_PATH to the correct path if it differs:
    export CUDA_PATH="/usr/local/cuda-10.0"
    export CUDA_INC_DIR="$CUDA_PATH/include"
    export C_INCLUDE_PATH="$CUDA_INC_DIR"
    export CPLUS_INCLUDE_PATH="$CUDA_INC_DIR"
    
  3. Re-run install.sh with your license ID:
    bash ./install.sh --license "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" \
        --cudapath $CUDA_PATH
    

If you see any errors, please send over the output. If you see no errors, but the Patch Motion job still doesn’t work, I suggest you reinstall CUDA and retry the instructions above.
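
Before re-running install.sh, it may help to sanity-check that the directory you pass as CUDA_PATH really contains a toolkit. The sketch below uses a hypothetical helper, check_cuda_path, and assumes the usual Linux layout with nvcc under bin and the libraries under lib64; it is not part of cryosparc.

```shell
#!/bin/sh
# Check that a directory looks like a CUDA toolkit install:
# an executable bin/nvcc and at least one lib64/libcurand.so* file.
check_cuda_path() {
    cuda_path="$1"
    if [ ! -x "$cuda_path/bin/nvcc" ]; then
        echo "nvcc not found under $cuda_path/bin" >&2
        return 1
    fi
    # If the glob matches nothing it stays literal, so test each candidate.
    for lib in "$cuda_path"/lib64/libcurand.so*; do
        if [ -e "$lib" ]; then
            echo "found $lib"
            return 0
        fi
    done
    echo "no libcurand.so* under $cuda_path/lib64" >&2
    return 1
}

# Example usage:
#   check_cuda_path "$CUDA_PATH" && "$CUDA_PATH/bin/nvcc" --version
```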

Let me know how that goes,

Nick

Hi Nick,

Thank you for your response, and sorry for the delay. Shortly after following your advice, Ubuntu completely crashed! I suspect it was something to do with CUDA, so I deleted CUDA 10 and managed to restore Ubuntu. I have now reinstalled CUDA 10 and followed your advice to export the variables and re-run install.sh.

The output from the installation looked fine apart from this when connecting the worker to the master:


ERROR: This hostname is already registered! Remove it first.

**************** CRYOSPARC INSTALLATION COMPLETE *****************

Please re-start your terminal shell to make the cryosparcm
command available.

When I try the patch motion I get exactly the same as before (see below). You suggested re-installing CUDA but I have just re-installed with no errors or problems. Maybe I should try CUDA 10.1?

[CPU: 197.9 MB]  Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_master/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 363, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 19439 has terminated unexpectedly!

Thanks,

Lucy

OK, so since I still think the problem lies with CUDA, I have updated my bashrc and symbolic links so that they all point to CUDA 10.0 rather than CUDA 8.0.

After doing this I re-ran the installer and I got this at the end:


Autodetecting available GPUs...
Traceback (most recent call last):
  File "bin/connect.py", line 231, in <module>
    gpu_devidxs = check_gpus()
  File "bin/connect.py", line 105, in check_gpus
    num_devs = print_gpu_list()
  File "bin/connect.py", line 22, in print_gpu_list
    import pycuda.driver as cudrv
  File "/home/lucytroman/Software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/driver.py", line 5, in <module>
    from pycuda._driver import * # noqa
ImportError: libcurand.so.8.0: cannot open shared object file: No such file or directory

So clearly something is still looking for CUDA 8 libraries, as libcurand.so.8.0 is from the CUDA 8.0 toolkit.
I noticed someone else had a similar problem here: https://github.com/cryoem-uoft/cryosparc-issues/issues/187

This was solved by clearing the pip cache: rm -rf ~/.cache/pip
(I didn’t want to delete something important, so instead I moved the current pip directory to oldpip and made a new pip directory in its place.)
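
For anyone who would also rather not delete the cache outright, the move-aside approach can be sketched as below. stash_pip_cache is a hypothetical helper name, and the default location ~/.cache/pip is the one given in the linked issue; the backup directory name with a timestamp suffix is my own choice.

```shell
#!/bin/sh
# Move the pip cache aside instead of deleting it, then recreate an empty one.
# Prints the backup directory so it can be restored or removed later.
stash_pip_cache() {
    cache_dir="${1:-$HOME/.cache/pip}"
    backup_dir="${cache_dir}.bak.$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir" >&2
        return 1
    fi
    mv "$cache_dir" "$backup_dir" && mkdir -p "$cache_dir" && echo "$backup_dir"
}

# Example usage:
#   stash_pip_cache              # uses ~/.cache/pip
#   stash_pip_cache /some/other/cache/dir
```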

However, when I repeat install.sh as before, I get the same error as above. I had previously installed cryosparc with CUDA 8, and clearly it has stored some paths based on that. How do I undo this? Should I try a completely fresh install of cryosparc? If so, what command would I use to make sure I have fully removed it?

Thanks,

Lucy

FIXED!:

I followed this more recent thread here:

pycuda must have stored a path to CUDA 8 from a previous install, despite all paths and configuration pointing it toward CUDA 10.0. To summarise from the previous thread:

  1. Navigate to cryosparc2_worker
  2. Execute eval $(bin/cryosparcw env)
  3. Execute pip uninstall pycuda # This will tell you which version it is uninstalling (e.g. 2019.1)
  4. Execute pip install pycuda==2019.1 --no-cache-dir # Replace 2019.1 with whichever version it just uninstalled for your cryosparc version. This requires an internet connection.
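
The four steps above can be collected into one small script. This is only a sketch of what I did by hand: the function names are hypothetical, reading the installed version with pip show and passing -y to skip the uninstall prompt are my additions, and the final import is just a quick check that the libcurand error is gone.

```shell
#!/bin/sh
# Extract the version number from `pip show <package>` output on stdin.
pip_show_version() {
    awk '/^Version:/ { print $2 }'
}

# Rebuild pycuda inside the cryosparc worker environment without using
# pip's cache. Runs in a subshell so the caller's directory and
# environment are left untouched. Requires an internet connection.
reinstall_pycuda() (
    worker_dir="$1"
    cd "$worker_dir" || exit 1
    eval "$(bin/cryosparcw env)"
    version="$(pip show pycuda | pip_show_version)"
    if [ -z "$version" ]; then
        echo "pycuda does not appear to be installed" >&2
        exit 1
    fi
    pip uninstall -y pycuda || exit 1
    pip install "pycuda==$version" --no-cache-dir || exit 1
    # Quick check: this import is what failed with libcurand.so.8.0 before.
    python -c "import pycuda.driver" && echo "pycuda $version imports cleanly"
)

# Example usage:
#   reinstall_pycuda /path/to/cryosparc2_worker
```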

Thanks for the update @Lucy!