Hey guys,
I am currently struggling with a GPU assignment issue in CryoSPARC v2.11.0 on CentOS 7.
My standalone workstation has two GPUs:

```
$ cryosparcw gpulist
Detected 2 CUDA devices.

   id           pci-bus  name
    0      0000:3B:00.0  GeForce RTX 2080 Ti
    1      0000:AF:00.0  GeForce RTX 2080 Ti
```
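For completeness, the driver-side view of the same two cards can be cross-checked like this (just a sanity check; the index/PCI-bus ordering should line up with the gpulist output above):

```
# Query the driver's index / PCI bus / name for each GPU and compare
# against what cryosparcw gpulist reports.
nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv
```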
The worker config sets

```
export CRYOSPARC_CUDA_PATH="/usr/local/cuda-10.1"
```

and both GPUs are enabled in `cryosparcw connect`, roughly as shown below.
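For reference, the connect step was along these lines (hostname and port below are placeholders, not my exact values):

```
# Sketch of (re)connecting the worker with both GPUs enabled;
# <hostname> and the port are placeholders for this machine's values.
./bin/cryosparcw connect --worker <hostname> --master <hostname> \
    --port 39000 --update --gpus 0,1
```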
However, parallel jobs are always assigned to GPU ID 0 only: two refinement jobs that should run in parallel on ID 0 and ID 1 both end up on ID 0, which leads to the following error:
```
Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1481, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 87, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/gpuarray.py", line 549, in fill
    func = elementwise.get_fill_kernel(self.dtype)
  File "<decorator-gen-13>", line 2, in get_fill_kernel
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 496, in get_fill_kernel
    "fill")
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 161, in get_elwise_kernel
    arguments, operation, name, keep, options, **kwargs)
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 147, in get_elwise_kernel_and_types
    keep, options, **kwargs)
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/elementwise.py", line 75, in get_elwise_module
    options=options, keep=keep)
  File "/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/pycuda/compiler.py", line 294, in __init__
    self.module = module_from_buffer(cubin)
LogicError: cuModuleLoadDataEx failed: an illegal memory access was encountered
```
Even if I connect only GPU ID 1, the job still gets assigned to GPU ID 0 and crashes with the same error. I have tried rebooting the machine and restarting cryosparcm. If I connect only GPU ID 0, all jobs run fine. GPU ID 1 works fine with other programs (e.g. crYOLO and RELION), so I assume something is wrong in my CryoSPARC configuration. Everything runs fine on an identical workstation next to this one.
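In case it helps to narrow this down, I can also exercise GPU ID 1 through the worker's own pycuda, outside the scheduler, with something like the sketch below (assuming the bundled interpreter lives under the deps/anaconda prefix shown in the traceback):

```
# Make only the second physical card visible, then try a trivial
# allocation through the worker's bundled pycuda. If this also fails,
# the problem sits below CryoSPARC's scheduler; if it works, the
# scheduler/config side is suspect.
export CUDA_VISIBLE_DEVICES=1
/Local/app/cryosparc/cryosparc2_worker/deps/anaconda/bin/python -c "
import pycuda.autoinit               # initializes the first visible device
import pycuda.gpuarray as gpuarray
import numpy as np
a = gpuarray.to_gpu(np.ones(1024, dtype=np.float32))
print(a.get().sum())                 # expect 1024.0
"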
Any help is greatly appreciated!
Cheers,
Dan