GPU info takes too long on worker / ‘Run on specific GPUs’ does not work

Hi,
We have a problem connecting 4 workers (nodes) to cryoSPARC properly. The initial setup with ‘cryosparcw connect’ ran fine without errors and all 4 workers are connected. However, cryoSPARC has trouble getting the resources from the individual nodes:

Main symptom: ‘Run on specific GPUs’ does not show checkboxes for selecting individual GPUs
Minor symptom: Scheduling jobs takes a long time

Error message (same for all 4 nodes):
[GPU_INFO]: Calling the python function to get GPU info took longer than 30 seconds on <node1-4>.

Additional info:
cryoSPARC runs fine with the default lane consisting of all 4 nodes/workers.
ssh to the workers works without issues.
running ‘time cryosparcw gpulist’ on the workers takes >2 min
(the same holds when running python cryosparcw/cryosparc_compute/get_gpu_info.py directly via ssh)
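For reference, here is a minimal sketch of how we checked the timing across all workers from the master (the hostnames and the cryosparcw path are placeholders for our setup; adjust as needed):

import subprocess
import time

# Placeholders: replace with your actual worker hostnames and cryosparcw install path.
workers = ["node1", "node2", "node3", "node4"]
cryosparcw = "/path/to/cryosparc_worker/bin/cryosparcw"

for host in workers:
    start = time.time()
    # Run gpulist on the worker over ssh -- a close analogue of the
    # GPU query that cryoSPARC itself is timing out on.
    subprocess.run(["ssh", host, cryosparcw + " gpulist"])
    print(f"{host}: {time.time() - start:.1f} s")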

Do you have any suggestions on how to fix this? As we have a multi-user setup, we would like to have the option to run on specific GPUs.

Thanks,
christian

Hi @cbiertue,

There isn’t an option to change this timeout without manually modifying the command_core module’s code. To do so, edit the file cryosparc_master/cryosparc_command/command_core/__init__.py

In the function get_gpu_info(), you’ll see the line:
cmd = 'bash -c "eval $(' + worker_bin_path + ' env); timeout 30 ' + python_command + '"'

Change 30 to 300, then save the file.
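After the edit, the line should read (only the timeout value changes):
cmd = 'bash -c "eval $(' + worker_bin_path + ' env); timeout 300 ' + python_command + '"'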

Once that’s done, restart cryoSPARC: cryosparcm restart. The function will run automatically upon startup and you should be good to go.
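If you’d like to verify that the change took effect, you can follow the command_core log after the restart and check that the [GPU_INFO] timeout message no longer appears:
cryosparcm log command_core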

Hi @stephan and others,

I have just found out that I can solve my problem of selecting individual GPUs in ‘Run on specific GPUs’ by specifying the parameter --gpus when connecting a worker to the master (cryosparcw connect).

Previously, I had omitted this parameter because I wanted to connect all GPUs of each worker to the master. When I deliberately specify (all) GPUs with --gpus 0,1,2,3,…, I do get entries for the individual GPUs while queuing a job (see the example below).
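For example, a connect command with the GPUs listed explicitly looks something like this (the hostnames, port, and GPU indices below are placeholders; substitute the values for your own setup):
bin/cryosparcw connect --worker node1 --master master-hostname --port 39000 --gpus 0,1,2,3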

Maybe this is useful to others,
christian