Cannot Use Queue Modal to Queue to a Specific GPU

I get the same error when using multiple GPUs; jobs finish successfully when I use only one GPU. I also can't select a specific GPU.

Hi @FengjiangLiu,

It seems like you can't select a specific GPU because the GPU information is not populated in the database. The function that populates this information runs automatically when you start cryoSPARC; if it fails for some reason, it fails silently. You can re-run the function yourself and inspect the error logs (by monitoring cryosparcm log command_core).

In a shell, run: cryosparcm cli "get_gpu_info()" && cryosparcm log command_core

You might see a traceback. If you know what the problem is, go ahead and fix it; otherwise, post it here and I can suggest some next steps.
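If the log output is long, it can help to filter for the GPU-related entries; cryoSPARC tags them with a [GPU_INFO] prefix. A minimal sketch, assuming grep is available on your system:

    # Show only the GPU-detection lines from the command_core log
    cryosparcm log command_core | grep "GPU_INFO"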

I ran the command you gave, but I don't know how to fix the error. Could you give me some advice?

Hi @FengjiangLiu,

Looks like some sort of SSH error. From that machine, if you run ssh spuser@spgpu, do you get a host key verification prompt, or any other type of error?
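A quick way to test whether SSH will work non-interactively (which is what cryoSPARC needs) is to force OpenSSH's batch mode, which makes ssh fail with an error instead of prompting. A minimal sketch, reusing the spuser@spgpu example from above:

    # BatchMode disables all interactive prompts (passwords, host key questions),
    # so this fails loudly if password-less SSH is not fully set up
    ssh -o BatchMode=yes spuser@spgpu 'echo connection ok'

If this prints "connection ok", password-less SSH is working; otherwise the prompt-related problem will surface as an immediate error.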

Hi @stephan,


It's the same as before.

I noticed that when I upgraded cryoSPARC, many older software packages were upgraded too. Is it possible that something changed with the GPUs or GPU detection?

Hi @stephan

I got the same problem on my 'standalone' machine. Running ssh user@hostname leads to the host key verification prompt.

Hi @FengjiangLiu,

You need password-less SSH access to this machine for the function to work properly.

Ensure that SSH keys are set up for the cryosparc_user account to SSH between the master node and the worker node without a password. From https://cryosparc.com/docs/reference/install/#remote-access:

Set up SSH keys for password-less access (only if you currently need to enter your password each time you ssh into the compute node).

  1. If you do not already have SSH keys generated on your local machine, use ssh-keygen to do so. Open a terminal prompt, and enter:

    ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
    

    Note: this will create an RSA key-pair with no passphrase in the default location.

  2. Copy the RSA public key to the remote compute node for password-less login:

    ssh-copy-id remote_username@remote_hostname
    

    Note: remote_username and remote_hostname are your username and the hostname that you use to SSH into your compute node. This step will ask for your password.
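After completing these two steps, it's worth confirming that password-less access actually works before restarting cryoSPARC. A minimal check, using the same remote_username and remote_hostname placeholders as above:

    # Should print the worker's hostname immediately, without a password prompt
    ssh remote_username@remote_hostname 'hostname'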

Hi @hxn,

Thanks for reporting this. We’ll add a fix to ensure SSH is not used on standalone instances.

Hi @sarulthasan,

I can select a specific GPU now. Thanks a lot.

But there is still a problem when using multiple GPUs in Patch Motion Correction.

And when I ran cryosparcm cli "get_gpu_info()" && cryosparcm log command_core, the output looked like this.

Hi @FengjiangLiu,

Glad you were able to get the GPU queuing to work. Regarding the other error: it was a bug in cryoSPARC v2.12.0 and v2.12.2. A patch (v2.12.4) has been released to fix this issue, as well as a few others. To update, run cryosparcm update.
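If you want to confirm the version before and after updating, cryosparcm status prints the installed version along with the state of the cryoSPARC processes; a minimal sketch, assuming grep is available:

    # Print just the version line from the status report
    cryosparcm status | grep -i version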

Thank you very much! I’m going to try!

Hey,

unfortunately this solution doesn't work for us either. We have a single master/worker workstation, and SSH also fails to connect on the same machine. I tried setting up SSH keys, but with no success. It always shows:

[GPU_INFO]: Failed calling the python function to get GPU info on witcher: Command '['ssh', u'dawid@witcher', u'bash -c "eval $(/mnt/data1/cryosparc2/cryosparc2_worker/bin/cryosparcw env); timeout 10 python /mnt/data1/cryosparc2/cryosparc2_worker/cryosparc2_compute/get_gpu_info.py"']' returned non-zero exit status 255

for all GPUs.

Best,
Dawid

Hi @dzyla,

If you've confirmed that running ssh dawid@witcher directly doesn't raise a host key verification question or error, can you go ahead and try running the following commands from the master node:

eval $(cryosparcm env)
ssh dawid@witcher bash -c "eval $(/mnt/data1/cryosparc2/cryosparc2_worker/bin/cryosparcw env); timeout 10 python /mnt/data1/cryosparc2/cryosparc2_worker/cryosparc2_compute/get_gpu_info.py"

You will probably see the actual error more clearly.

Thanks a lot, that helped me find the error. The problem is with the ssh command: the command you suggested fails because the $(...) substitution is expanded by the local shell instead of running on the remote machine. I modified the command and now it works:

ssh dawid@witcher 'bash -c eval $(/mnt/data1/cryosparc2/cryosparc2_worker/bin/cryosparcw env); timeout 10 python /mnt/data1/cryosparc2/cryosparc2_worker/cryosparc2_compute/get_gpu_info.py'

[{"mem": 12788498432, "id": 0, "name": "TITAN Xp"}, {"mem": 12788498432, "id": 1, "name": "TITAN Xp"}, {"mem": 12788498432, "id": 2, "name": "TITAN Xp"}, {"mem": 12788498432, "id": 3, "name": "TITAN Xp"}]
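For anyone hitting the same issue, the underlying shell behavior is this: inside double quotes, $(...) command substitutions are expanded by the local shell before ssh ever runs, while single quotes defer the expansion to the remote shell. A minimal illustration, reusing dawid@witcher from above (any reachable host works):

    # Double quotes: $(hostname) expands locally, so the LOCAL hostname is echoed
    ssh dawid@witcher echo "$(hostname)"

    # Single quotes: the substitution is passed through and runs on the REMOTE host
    ssh dawid@witcher 'echo $(hostname)'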

Is it possible to fix the previous command:

cryosparcm cli "get_gpu_info()" && cryosparcm log command_core

to update the number of GPUs on each node? This would be great!

Hi @dzyla,

Thanks for pointing this out; we've fixed it in our current release branch. You will receive the fix in an upcoming cryoSPARC update!