Pycuda driver.py not loading


#1

We have updated to v2.11 and finally gotten the pycuda driver to build, but we are unable to run jobs.
The job output shows the following.

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 69, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/jobregister.py", line 308, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/local-home/repository/cryosparcv2/cryosparc2_worker/deps/anaconda/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "cryosparc2_worker/cryosparc2_compute/jobs/abinit/run.py", line 15, in init cryosparc2_compute.jobs.abinit.run
  File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
    from engine import *
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 4, in init cryosparc2_compute.engine.engine
ImportError: No module named pycuda.driver

This matches what we see when we try to run the cryosparcv2_worker/bin/connect command, which also fails with the “no module named pycuda.driver” error.

cryosparcm status shows command_core, command_proxy, command_vis and webapp all running, and the following as stopped: app, app_dev, command_rtp, watchdog_dev and webapp_dev.

I don’t know if any of the stopped jobs are needed.
I believe the issue is loading pycuda, and when I change the PYTHONPATH variable in the bin/cryosparcw script it seems to load. But then other modules fail, or they fail earlier so that we never reach the pycuda import. I did get the PATH corrected, and we are now finding nvcc.
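
To see which directories on the search path actually hold a pycuda package after a PYTHONPATH change, a minimal sketch like the following could be run with the worker's python. The helper name `find_package_dirs` is made up for illustration, it only looks for a directory called `pycuda` and does not prove the compiled driver module is present, and it is written to run on both Python 2.7 and Python 3.

```python
import os
import sys


def find_package_dirs(package, search_path):
    """Return the entries of search_path that contain a directory named `package`."""
    hits = []
    for entry in search_path:
        # An empty entry in sys.path means the current directory.
        candidate = os.path.join(entry or ".", package)
        if os.path.isdir(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # sys.path already includes anything added through PYTHONPATH.
    for hit in find_package_dirs("pycuda", sys.path):
        print(hit)
```

More than one line of output would suggest that several environments each carry their own copy of pycuda.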

When I tried the /bin/cryosparcw newcuda command, I prevented the other modules from loading and caused nvcc to not be found.

I don’t know enough about Python or cryosparc (I’m OS support, not a member of the department, and neither a cryosparc user nor a prior installer), so I do need some guidance.

thanks in advance,
Brian


#2

The OS is openSUSE Leap 15.1. The new OS is what precipitated the cryoSPARC issues; things had been OK with v2.9 on Suse 13.2.


#3

Hi @BrianCuttler,

ImportError: No module named pycuda.driver

This usually means the pycuda compilation via pip didn’t actually complete successfully.

I don’t know if any of the stopped jobs are needed.

None of those are needed. The output you reported is normal.

when I change the PYTHONPATH variable in the bin/cryosparcw script it seems to load

This could mean that pycuda was installed and compiled into a different python environment on your system.
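
One way to check which environment is in play is to ask the running interpreter where it lives and whether that location falls under the worker directory. This is only a sketch: `is_inside` is a hypothetical helper, and the `cryosparc2_worker` location is assumed to be relative to the current directory.

```python
import os
import sys


def is_inside(path, root):
    """True if `path` resolves to `root` or to a location under it (symlinks followed)."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    prefix = root.rstrip(os.sep) + os.sep
    return path == root or path.startswith(prefix)


if __name__ == "__main__":
    # Assumption: run from the directory that contains cryosparc2_worker.
    worker_dir = os.path.abspath("cryosparc2_worker")
    print(sys.executable)
    print(is_inside(sys.executable, worker_dir))
```

If the second line printed is False, the python being used is not the one bundled with the worker, and a pip run from that environment would install pycuda somewhere the worker does not look.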

You can reinstall pycuda by doing the following:
Open a shell, and run:

cd cryosparc2_worker
cat config.sh
# Ensure the `CRYOSPARC_CUDA_PATH` var is set to the parent directory of `/bin/nvcc`,
# e.g. `/usr/local/cuda-10.1`
eval $(./bin/cryosparcw env)
which pip
# The pip found should be inside the cryosparc2_worker dir,
# which means it's part of cryoSPARC's python environment.
pip uninstall pycuda
pip install "./deps_bundle/python/python_packages/pip_packages/pycuda-2018.1.1.tar.gz" --no-cache-dir
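
After the reinstall, a quick check along these lines can confirm whether the import from the traceback now succeeds. It is a sketch rather than an official cryoSPARC tool: `probe_import` is a made-up name, and it should be run with the python from the same shell in which `eval $(./bin/cryosparcw env)` was executed.

```python
import importlib


def probe_import(name):
    """Try to import `name`; return (True, module file) or (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    print(probe_import("pycuda.driver"))
```

A successful result also shows the file the module was loaded from, which helps tell the worker's copy apart from one installed elsewhere on the system.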

#4

Stephan,

I did a clean install: new user, new login directory, etc.

I created a new database directory, renamed it out of the way, copied the old database directory into the new location, and checked owner and group on the parent directory, files and subdirectories. The user reports that with the restored directory they can log in but could not create a job directory (they did not give me the specific error). I have asked for more information, and I wonder whether the problem is something under the ssdpath rather than in the database.

I still seem to have the pycuda error; I found it when I issued the cryosparcw connect command as a test. I do not know if that is a reasonable test, but I’d have hoped that the clean install would have taken care of this by avoiding whatever error existed before.

The CUDA path is /usr/local/cuda, which is a link to the recently installed /usr/local/cuda-10.1.
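
Since the path is a symlink, it can help to confirm what it resolves to and that nvcc really sits under its bin directory. The snippet below is a small sketch of that check; `check_cuda_path` is a hypothetical helper, and the path passed in is the one quoted above.

```python
import os


def check_cuda_path(cuda_path):
    """Resolve symlinks in cuda_path and report whether bin/nvcc exists under it."""
    resolved = os.path.realpath(cuda_path)
    nvcc = os.path.join(resolved, "bin", "nvcc")
    return resolved, os.path.isfile(nvcc)


if __name__ == "__main__":
    print(check_cuda_path("/usr/local/cuda"))
```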

I will run the pip pycuda commands now. I had expected it to run cleanly from the kit, but I will perform this step.

ah, invalid cross-device link… that could explain things.
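
For context, “invalid cross-device link” is the message for errno EXDEV, which the operating system returns when a rename is attempted between two different filesystems. Whether pip hit it here while moving files into a temporary directory on another filesystem is only a guess. The sketch below shows the general failure and the usual fallback; `move_across_devices` is a made-up name.

```python
import errno
import os
import shutil


def move_across_devices(src, dst):
    """Rename src to dst; on EXDEV fall back to copy-then-delete."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # rename() cannot cross filesystems, so copy and remove instead.
        shutil.move(src, dst)
```

If the temporary directory is indeed the cause, pointing TMPDIR at a directory on the same filesystem as the worker install before running pip might avoid the error, though that is an assumption to verify rather than a confirmed fix.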


#5

I finally fixed the pip uninstall pycuda issue, but reinstalling is slow: I needed to keep adding to the INCLUDE path for .h files, and the build finally failed with an error in bpl-subset/bpl_subset/boost/date_time/time_duration.hpp: invalid field pycudaboost::date_time:time_duration.

I obviously don’t have the necessary environment for the build. I am logged in as the cryosparc user, and the login did run the user dot files, but they are meant for running cryosparc, not for building it.

Is there a better way for me to be doing this?

thanks,
Brian