Hi,
We have a problem connecting 4 workers (nodes) to cryoSPARC properly. The initial setup with ‘cryosparcw connect’ ran without errors, and all 4 workers are connected. However, cryoSPARC has trouble getting the resources from the individual nodes:
Main symptom: ‘Run on specific GPU’ does not offer check boxes for selecting individual GPUs
Minor symptom: scheduling jobs takes a long time
Error message (same for all 4 nodes):
[GPU_INFO]: Calling the python function to get GPU info took longer that 30 seconds on <node1-4>.
Additional info:
cryoSPARC runs fine with the default lane consisting of the 4 nodes/workers.
SSH to the workers works.
Running ‘time cryosparcw gpulist’ on the workers takes >2 min (same for python cryosparcw/cryosparc_compute/get_gpu_info.py via SSH).
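For anyone who wants to repeat this timing check in a scriptable way, here is a minimal Python sketch (Python 3.7+ assumed). `time_command` is a hypothetical helper, not part of cryoSPARC; its 30-second default only mirrors the limit quoted in the GPU_INFO error, and the commented-out call assumes `cryosparcw` is on the worker's PATH:

```python
import subprocess
import sys
import time


def time_command(cmd, timeout_s=30.0):
    """Run cmd (a list of arguments) and return (elapsed_seconds, timed_out).

    Hypothetical helper: timeout_s only mirrors the 30-second limit in the
    GPU_INFO error message; it is not a cryoSPARC setting.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child process if the timeout expires
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
                       timeout=timeout_s, check=False)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return time.monotonic() - start, timed_out


if __name__ == "__main__":
    # On a worker (assumption: cryosparcw is on the PATH):
    # elapsed, timed_out = time_command(["cryosparcw", "gpulist"], timeout_s=30)
    elapsed, timed_out = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.1f} s, timed out: {timed_out}")
```

If the GPU query regularly reports `timed out: True` at 30 s, that matches the GPU_INFO error above.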
Do you have any suggestions on how to fix this? As we have a multi-user setup, we would like to have the option to run on specific GPUs.
Thanks,
christian
Hi @cbiertue,
There isn’t an option to modify this timeout without manually modifying the command_core module’s code. To do so, edit the following file: cryosparc_master/cryosparc_command/command_core/__init__.py
In the function get_gpu_info(), you’ll see the line:
cmd = 'bash -c "eval $(' + worker_bin_path + ' env); timeout 30 ' + python_command + '"'
Change timeout 30 to timeout 300, then save the file.
Once that’s done, restart cryoSPARC with cryosparcm restart. The function will run automatically on startup, and you should be good to go.
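To make the edit concrete, the sketch below rebuilds the same command string with the timeout as a parameter instead of the hard-coded 30. `build_gpu_info_cmd` and the example paths are hypothetical; in the real file you only change the number in place. The coreutils `timeout N cmd` wrapper stops `cmd` after N seconds, which is presumably why a slow GPU query is cut off at 30 s:

```python
def build_gpu_info_cmd(worker_bin_path, python_command, timeout_s=300):
    """Rebuild the get_gpu_info() command string with a configurable timeout.

    Hypothetical helper, not cryoSPARC code: same shape as the quoted line,
    with the hard-coded 30 replaced by timeout_s.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    return ('bash -c "eval $(' + worker_bin_path + ' env); timeout '
            + str(int(timeout_s)) + ' ' + python_command + '"')


if __name__ == "__main__":
    # Placeholder paths for illustration only
    print(build_gpu_info_cmd("/opt/cryosparc_worker/bin/cryosparcw",
                             "python get_gpu_info.py"))
```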
Hi @stephan and others,
I have just found out that I can solve my problem of selecting individual GPUs in ‘Run on specific GPU’ by specifying the parameter ‘--gpus’ when connecting a worker to the master (cryosparcw connect).
Before, I had omitted this parameter because I wanted to connect all GPUs of a worker to the master. I get entries for individual GPUs when queuing a job if I explicitly specify (all) GPUs with --gpus 0,1,2,3,…
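A small Python sketch of the same idea (Python 3.8+ assumed): derive an explicit --gpus list from the output of `nvidia-smi --list-gpus` and build the connect command from it. The helper names, the hostnames node1/master-host and the port 39000 are placeholders; check `cryosparcw connect --help` on your version for which other flags you need (e.g. when re-registering an already-connected worker):

```python
import re
import shlex


def gpu_indices(list_gpus_output):
    """Extract device indices from `nvidia-smi --list-gpus` text.

    Expects lines such as "GPU 0: <name> (UUID: GPU-...)".
    """
    return [int(m.group(1))
            for m in re.finditer(r"^GPU (\d+):", list_gpus_output, re.MULTILINE)]


def connect_cmd(worker, master, port, indices):
    """Build a cryosparcw connect argument list with an explicit --gpus."""
    return ["cryosparcw", "connect",
            "--worker", worker,
            "--master", master,
            "--port", str(port),
            "--gpus", ",".join(str(i) for i in indices)]


if __name__ == "__main__":
    # On a real worker the text would come from:
    #   subprocess.run(["nvidia-smi", "--list-gpus"],
    #                  capture_output=True, text=True).stdout
    sample = "GPU 0: Example GPU (UUID: GPU-aaa)\nGPU 1: Example GPU (UUID: GPU-bbb)\n"
    print(shlex.join(connect_cmd("node1", "master-host", 39000, gpu_indices(sample))))
```

Listing every index explicitly, rather than omitting --gpus, is what made the individual GPUs show up in the queue dialog in this case.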
Maybe this is useful for others,
christian