Pycuda.driver not found


#1

Leap 15.1, Cryosparc v2.11

This is a log file from a user data directory.

It shows that the pycuda driver is not being found.
We had this same error when trying to connect the worker with the master (same machine, --standalone install), but were able to locate the driver once we removed the soft link /usr/local/cuda
from the system.

I think we have a driver (under the anaconda directory?) but the path is being preempted.
But I’m not sure and don’t have a fix for it, as # cryosparcm env isn’t showing an issue with the path.
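For anyone checking the same thing, here is a minimal sketch of how the "path is being preempted" idea could be inspected. It only reports whether /usr/local/cuda is a soft link (and where it points) and which PATH entries mention CUDA, in search order. The function name describe_cuda_paths is made up for illustration and is not part of cryoSPARC.

```python
import os


def describe_cuda_paths(path_value, cuda_link="/usr/local/cuda"):
    """Return (symlink_target_or_None, PATH entries mentioning 'cuda').

    Hypothetical helper for illustration only; not a cryoSPARC function.
    Entries are returned in PATH order, so the first one wins at lookup time.
    """
    # Resolve the soft link only if it really is one; otherwise report None.
    target = os.path.realpath(cuda_link) if os.path.islink(cuda_link) else None
    entries = [p for p in path_value.split(os.pathsep) if "cuda" in p.lower()]
    return target, entries


if __name__ == "__main__":
    target, entries = describe_cuda_paths(os.environ.get("PATH", ""))
    print("/usr/local/cuda symlink target: %s" % target)
    for position, entry in enumerate(entries):
        print("PATH match %d: %s" % (position, entry))
```

If the soft link points at a different CUDA version than the first matching PATH entry, that mismatch would be worth ruling out first.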

Please help me find and fix this.

thanks in advance,
Brian

/usr16/data/rzk01/cryos2/P2/P15/J223/job.log

gyan 331% m J223/job.log

================= CRYOSPARCW ======= 2019-10-04 14:54:51.249530 =========

Project P15 Job J223

Master tulasi Port 39002

===========================================================================

========= monitor process now starting main process

MAINPROCESS PID 5409

========= monitor process now waiting for main process

MAIN PID 5409

refine.run cryosparc2_compute.jobs.jobregister

**** handle exception rc

set status to failed

Traceback (most recent call last):

File "cryosparc2_worker/cryosparc2_compute/run.py", line 69, in cryosparc2_compute.run.main

File "cryosparc2_compute/jobs/jobregister.py", line 308, in get_run_function

runmod = importlib.import_module(".."+modname, __name__)

File "/local-home/repository/cryosparcv211/cryosparc2_worker/deps/anaconda/lib/python2.7/importlib/__init__.py", line 37, in import_module

__import__(name)

File "cryosparc2_worker/cryosparc2_compute/jobs/refine/run.py", line 15, in init cryosparc2_compute.jobs.refine.run

File "cryosparc2_compute/engine/__init__.py", line 8, in <module>

from engine import *

File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 4, in init cryosparc2_compute.engine.engine

ImportError: No module named pycuda.driver

========= main process now complete.

========= monitor process now complete.

gyan 332%
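The ImportError above can be reproduced outside of a job with a small probe. This is only a sketch: it assumes it is run with the same interpreter the worker uses (the one under cryosparc2_worker/deps/anaconda in the traceback), and probe_module is an illustrative name, not a cryoSPARC function.

```python
import importlib
import sys


def probe_module(name):
    """Try to import `name`; return (found, detail).

    detail is the module's file location on success, or the ImportError
    message on failure. Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "(built-in)")


if __name__ == "__main__":
    # Which interpreter is actually running matters more than anything else.
    print("interpreter: %s" % sys.executable)
    for name in ("pycuda", "pycuda.driver"):
        found, detail = probe_module(name)
        print("%-14s %s  %s" % (name, "OK" if found else "MISSING", detail))
```

If pycuda imports under the system Python but not under the worker's interpreter, the package was installed into the wrong environment.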

#2

A second look has (finally) made me realize that the ./anaconda2 directory is entirely missing from the new installation. I’d thought it was built automatically when you ran the scripts per the installation web page; it certainly took a long time doing something. I’m not at all sure where the files should be, but they are clearly missing.


#3

Defined PATH to include /usr/local/cuda-10.1, then as superuser ran # pip install pycuda and finally got a clean install.
Stopped and restarted the daemons (# cryosparcm stop, then # cryosparcm start), then ran the cryosparcw connect command.

I now realize that the resource_slots entry for GPU is empty. That clearly explains why jobs waiting on a GPU aren’t finding one, but I don’t know how to resolve the root cause.

Any help?
thanks,
Brian


#4

The worker now sees the GPUs, but the scheduler doesn’t.
I assume this is cached info from when the problem existed?
How do I clear it? Unregister and re-register the worker?
How do I unregister? I’m not seeing it in the docs.


#5

Cryosparcw is able to list the GPUs, but resource_slots still shows no GPUs…
Unless the worker tells the master that GPUs are available, jobs requiring a GPU will remain pending.
I’ve no idea how to get the worker to report the GPUs that it sees.

Does the cryosparcw connect command take a parameter to the --gpus option? If so, what does the option look like? I am running four Titan X Pascal GPUs.
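To separate "the worker's Python can see the GPUs" from "the master has them registered", a sketch like the following lists devices through pycuda directly. It assumes pycuda is importable in the interpreter it is run with. The list_gpus name is made up for illustration, and it says nothing about what the master has recorded in resource_slots.

```python
def list_gpus():
    """Return [(index, name), ...] as seen by pycuda.

    Returns None if pycuda itself cannot be imported, and an empty list if
    the CUDA driver cannot be initialised. Hypothetical helper for
    illustration only; not a cryoSPARC function.
    """
    try:
        import pycuda.driver as cuda
    except ImportError:
        return None
    try:
        cuda.init()  # must be called before any other driver function
        count = cuda.Device.count()
        return [(i, cuda.Device(i).name()) for i in range(count)]
    except Exception:
        # Driver present but unusable (for example, no device or a
        # driver/toolkit mismatch).
        return []


if __name__ == "__main__":
    gpus = list_gpus()
    if gpus is None:
        print("pycuda is not importable from this interpreter")
    elif not gpus:
        print("pycuda imported, but no usable CUDA device was found")
    else:
        for index, name in gpus:
            print("GPU %d: %s" % (index, name))
```

On this machine the expectation would be four entries; if they show up here but not in resource_slots, the problem is on the registration side rather than with pycuda.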