GPUs not showing up after upgrade

Our group recently upgraded the GPUs on a worker node; however, the new GPUs are not showing up in the lane and are not available for use by CryoSPARC.

Details: We had four RTX 3070 Ti cards in a worker node. We upgraded this node, removing the old GPUs and adding two RTX 4080 cards. After updating the CryoSPARC software (on both the master and the worker), we removed the worker and added it again with the new GPUs. There were no errors or complaints. However, in the web interface, jobs that previously started successfully do not see the new GPUs. nvidia-smi shows the new GPUs, and they have been tested with other applications.

Any thoughts on what could be wrong and how these two 4080 GPUs can be made available to CryoSPARC?

I believe you will need to tell cryoSPARC about the new devices.

cd /path/to/cryosparc_worker/
bin/cryosparcw connect --update
bin/cryosparcw gpulist

You might need to adjust the connect command for your setup; I am not sure of the exact flags.

Are you referring to
cryosparcw connect ...?

I have not tested the CUDA toolkit requirements for this GPU generation, but this card may require CUDA 11.8 or higher (though I have not successfully tested CryoSPARC with CUDA > 11.8).
What are the outputs of:

  • cryosparcw call which nvcc
  • cryosparcw call nvcc --version
  • nvidia-smi --query-gpu=driver_version --format=csv
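For context on the driver check: NVIDIA's CUDA 11.8 release notes list 520.61.05 as the minimum Linux driver for that toolkit. A minimal sketch of the comparison you would make on the `driver_version` string that nvidia-smi reports (the version numbers in the example calls are hypothetical):

```python
# Sketch: compare the driver version reported by
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
# against the minimum Linux driver NVIDIA lists for CUDA 11.8
# (520.61.05). The version strings passed below are hypothetical.

def parse_version(version):
    """Turn a dotted version string like '525.60.13' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))

def driver_supports_cuda_11_8(driver_version, minimum="520.61.05"):
    # Tuple comparison gives the usual lexicographic version ordering.
    return parse_version(driver_version) >= parse_version(minimum)

print(driver_supports_cuda_11_8("525.60.13"))   # True
print(driver_supports_cuda_11_8("470.161.03"))  # False
```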

Correct. I used:

./bin/cryosparcw connect --worker (name removed) --update --master (name removed)

I would tend to agree that this might be a CUDA version issue with the 4080 cards. Other software has also given us problems on these cards with older CUDA versions.

./bin/cryosparcw call which nvcc
(/usr/local/cuda is a link to /usr/local/cuda-11.8)

./bin/cryosparcw call nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

nvidia-smi --query-gpu=driver_version --format=csv

No luck. Did not work.

Just to confirm:

  • are the two 4080 cards the only GPUs on this system?
  • what is the output of
    cryosparcw gpulist
  • what is the output of
    cryosparcm cli "get_scheduler_targets()"
  • is this computer in any way related to the computer and/or CryoSPARC instance in the Python issue during installation thread?

Correct. Just two 4080s on the system, no other GPUs.

cryosparcw gpulist
Detected 2 CUDA devices.

   id   pci-bus       name
   0    0000:05:00.0  NVIDIA GeForce RTX 4080
   1    0000:09:00.0  NVIDIA GeForce RTX 4080

cryosparcm cli "get_scheduler_targets()"
[{'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 8510832640, 'name': 'NVIDIA GeForce GTX 1080'}], 'hostname': '(master, exact hostname removed for security reasons)', 'lane': 'default', 'monitor_port': None, 'name': 'master', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'GPU': [0], 'RAM': [0, 1, 2, 3]}, 'ssh_str': 'cryosparc@master', 'title': 'Worker node master', 'type': 'node', 'worker_bin_path': '/home/cryosparc/CryoSPARC/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 16860184576, 'name': 'NVIDIA GeForce RTX 4080'}, {'id': 1, 'mem': 16857169920, 'name': 'NVIDIA GeForce RTX 4080'}], 'hostname': 'worker_1', 'lane': 'default', 'monitor_port': None, 'name': 'worker_1', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'ssh_str': 'cryosparc@worker_1', 'title': 'Worker node worker_1', 'type': 'node', 'worker_bin_path': '/home/cryosparc/CryoSPARC/cryosparc_worker/bin/cryosparcw'}]

(PS: The GTX 1080 just happens to be in the master for display purposes; it is not used by CryoSPARC.)
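As a side note, since `cryosparcm cli "get_scheduler_targets()"` prints a Python-style list of dicts, it can be parsed with `ast.literal_eval` to summarize which GPUs each target exposes. A quick sketch, using a trimmed, hypothetical version of the output above:

```python
# Sketch: parse the Python-literal output of
#   cryosparcm cli "get_scheduler_targets()"
# and summarize the GPUs per target. `sample` is a trimmed,
# hypothetical version of the real output above.
import ast

sample = (
    "[{'hostname': 'worker_1', 'gpus': ["
    "{'id': 0, 'mem': 16860184576, 'name': 'NVIDIA GeForce RTX 4080'}, "
    "{'id': 1, 'mem': 16857169920, 'name': 'NVIDIA GeForce RTX 4080'}]}]"
)

targets = ast.literal_eval(sample)
for target in targets:
    gpu_names = [gpu["name"] for gpu in target["gpus"]]
    print(target["hostname"], gpu_names)
```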

No, this computer and installation are not related to the computer in the other thread. They are completely independent systems and installations.

Is this true even now (after you ran get_scheduler_targets())? How does "not see the new GPUs" manifest itself in the user interface?

See the image. Clearly I am missing something here.

Strange. What is the output of
cryosparcm cli "get_scheduler_lanes()"