Greetings.
I have a workstation with two GPUs (GPU0 and GPU1). Jobs run fine on GPU0, but jobs queued to GPU1 get stuck.
Here is the log for one of the stuck jobs (J64):
================= CRYOSPARCW ======= 2022-08-09 14:25:47.139386 =========
Project P7 Job J64
Master drake.structbio.pitt.edu Port 39002
========= monitor process now starting main process
MAINPROCESS PID 13867
MAIN PID 13867
extract.run cryosparc_compute.jobs.jobregister
Traceback (most recent call last):
  File "", line 1, in
  File "cryosparc_worker/cryosparc_compute/run.py", line 173, in cryosparc_compute.run.run
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1969, in get_gpu_info
    } for devid in devs ]
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1969, in
    } for devid in devs ]
pycuda._driver.LogicError: cuDeviceGet failed: invalid device ordinal
Process Process-1:1:
Traceback (most recent call last):
  File "/data/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/data/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 69, in process_pipeline_work
    process_params = process_setup(proc_idx) # do any setup you want on a per-process basis
  File "/data/cryosparc/cryosparc_worker/cryosparc_compute/jobs/extract/run.py", line 384, in process_setup
    cuda_core.initialize([cuda_dev])
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 34, in cryosparc_compute.engine.cuda_core.initialize
pycuda._driver.LogicError: cuDeviceGet failed: invalid device ordinal
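For context, "invalid device ordinal" from cuDeviceGet means the process asked for a device index that is not smaller than the number of CUDA devices that process can see. One possible cause (not confirmed by the log above) is that CUDA_VISIBLE_DEVICES in the worker's environment hides or renumbers GPU1, so index 1 no longer exists from the worker's point of view. The sketch below is a simplified, hypothetical check of that idea. The helper names `visible_device_count`, `ordinal_is_valid` and `probe_with_pycuda` are illustrative only, not part of cryoSPARC. The parser also ignores UUID entries and duplicate indices in CUDA_VISIBLE_DEVICES.

```python
import os
from typing import Optional


def visible_device_count(physical_count: int, env_value: Optional[str]) -> int:
    """Simplified model of how many GPUs CUDA exposes under CUDA_VISIBLE_DEVICES.

    Assumption: only integer indices are handled (no UUIDs, no duplicate
    handling). CUDA stops at the first invalid entry, so only the entries
    before it count.
    """
    if env_value is None:
        # Variable unset: every physical device is visible.
        return physical_count
    count = 0
    for entry in env_value.split(","):
        entry = entry.strip()
        if not entry.isdigit() or int(entry) >= physical_count:
            break  # invalid entry: later entries are ignored
        count += 1
    return count


def ordinal_is_valid(ordinal: int, physical_count: int, env_value: Optional[str]) -> bool:
    """True if a device index like the one passed to cuDeviceGet would be accepted."""
    return 0 <= ordinal < visible_device_count(physical_count, env_value)


def probe_with_pycuda() -> None:
    """List the devices pycuda can actually see from the current environment."""
    try:
        import pycuda.driver as cuda
    except ImportError:
        print("pycuda is not installed in this Python environment")
        return
    cuda.init()
    for i in range(cuda.Device.count()):
        print(i, cuda.Device(i).name())


if __name__ == "__main__":
    env = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("CUDA_VISIBLE_DEVICES =", env)
    # Two physical GPUs, as on this workstation.
    print("ordinal 1 valid:", ordinal_is_valid(1, 2, env))
```

If you try this, it only means something when run from the same environment the cryoSPARC worker uses, for example the worker's bundled Python. There, `probe_with_pycuda()` reports what pycuda actually sees, which is more reliable than the simplified model above.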