GPU type not recognized correctly

Hi, after updating to 4.5.1, the four RTX 3090 GPUs on my worker node are labelled as RTX 2080 cards in the GUI.
Yesterday I upgraded the SSD on the same node and found that the GPU type is also incorrect in the output of cryosparcw connect:
gpus : [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]

The gpulist command identifies them correctly:
./bin/cryosparcw gpulist
Detected 4 CUDA devices.

id pci-bus name

 0        1  NVIDIA GeForce RTX 3090
 1       37  NVIDIA GeForce RTX 3090
 2      193  NVIDIA GeForce RTX 3090
 3      225  NVIDIA GeForce RTX 3090

Any tips on how to fix this?
Thank you!

Please can you post:

  • the output of the command (on CryoSPARC master)
    cryosparcm cli "get_scheduler_targets()"
  • for each worker node, output of the command
    hostname && nvidia-smi --query-gpu=index,name --format=csv
  • a history of worker nodes’
    • hardware changes
    • hostname reassignments

Hi,
thanks for the quick reply.
I have pasted the output of the commands below.
The hardware has not changed and also the hostnames have always been hulk and echo.

The only thing that has changed is that I recently copied the cryosparcuser home folder to a bigger disk. That caused some problems with the database, but I was able to fix them with your help.

Cheers,
Gregor

(base) cryosparcuser@hulk:~$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/scratch/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'hulk', 'lane': 'default', 'monitor_port': None, 'name': 'hulk', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, 'ssh_str': 'cryosparcuser@hulk', 'title': 'Worker node hulk', 'type': 'node', 'worker_bin_path': '/home/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/scratch2/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': '11.0.0.2', 'lane': 'echo', 'monitor_port': None, 'name': '11.0.0.2', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'cryosparcuser@11.0.0.2', 'title': 'Worker node 11.0.0.2', 'type': 'node', 'worker_bin_path': '/home/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw'}]
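(As an aside, the cached GPU names buried in that blob can be pulled out programmatically; a minimal sketch, assuming the pasted output is a Python-literal list like the one above — the excerpt below is abbreviated, not the full output:)

```python
import ast

# Abbreviated excerpt of the get_scheduler_targets() output pasted above;
# only the keys relevant to this check are kept.
targets_text = """[
 {'name': 'hulk', 'hostname': 'hulk',
  'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]},
 {'name': '11.0.0.2', 'hostname': '11.0.0.2',
  'gpus': [{'id': 0, 'mem': 11546394624, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]}
]"""

# literal_eval handles the Python-style None/True values that json would reject
targets = ast.literal_eval(targets_text)
for t in targets:
    names = sorted({g['name'] for g in t['gpus']})
    print(f"{t['name']}: {', '.join(names)}")
```

This makes the problem easy to spot at a glance: both cached targets report 2080 Ti names even though echo actually has 3090s.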

(base) cryosparcuser@hulk:~$ hostname && nvidia-smi --query-gpu=index,name --format=csv
hulk
index, name
0, NVIDIA GeForce RTX 2080 Ti
1, NVIDIA GeForce RTX 2080 Ti
2, NVIDIA GeForce RTX 2080 Ti
3, NVIDIA GeForce RTX 2080 Ti

(base) cryosparcuser@echo:~$ hostname && nvidia-smi --query-gpu=index,name --format=csv
echo
index, name
0, NVIDIA GeForce RTX 3090
1, NVIDIA GeForce RTX 3090
2, NVIDIA GeForce RTX 3090
3, NVIDIA GeForce RTX 3090

Please can you also run these commands on cryosparcuser@hulk and post their outputs:

cryosparcm status | grep -e BASE_PORT -e MASTER_HOSTNAME
host echo
ssh 11.0.0.2 'hostname && nvidia-smi --query-gpu=index,name --format=csv && host hulk'

Sure! Here are the requested commands and their results.

(base) cryosparcuser@hulk:~$ cryosparcm status | grep -e BASE_PORT -e MASTER_HOSTNAME
export CRYOSPARC_MASTER_HOSTNAME="hulk"
export CRYOSPARC_BASE_PORT=39000

(base) cryosparcuser@hulk:~$ host echo
echo.xxx.xxx.xxx has address 10.8.14.68 (domain redacted as .xxx.xxx.xxx)

(base) cryosparcuser@hulk:~$ ssh 11.0.0.2 'hostname && nvidia-smi --query-gpu=index,name --format=csv && host hulk'
echo
index, name
0, NVIDIA GeForce RTX 3090
1, NVIDIA GeForce RTX 3090
2, NVIDIA GeForce RTX 3090
3, NVIDIA GeForce RTX 3090
hulk.xxx.xxx.xxx has address 10.8.13.38 (domain redacted as .xxx.xxx.xxx)

Thank you for your help!

In this case, you may want to try the following:

  1. remove the current target entry for 11.0.0.2 (guide) by running this command on hulk:
    cryosparcm cli "remove_scheduler_target_node('11.0.0.2')"
    
  2. reconnect echo by running this command on echo:
    /home/cryosparcuser/cryosparc/cryosparc_worker/bin/cryosparcw connect --worker $(hostname) --master hulk --port 39000 --ssdpath /scratch2/cryosparc_cache --lane echo
    
    and post its output.
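(After reconnecting, one way to confirm the fix is to compare the cached names against what the driver reports. A hypothetical sketch, using the nvidia-smi CSV format posted earlier in this thread; the `cached` list stands in for the echo target's gpus entry after the reconnect:)

```python
import csv
import io

# Sample output of: nvidia-smi --query-gpu=index,name --format=csv (on echo)
smi_csv = """index, name
0, NVIDIA GeForce RTX 3090
1, NVIDIA GeForce RTX 3090
2, NVIDIA GeForce RTX 3090
3, NVIDIA GeForce RTX 3090"""

# skipinitialspace handles the space after each comma in nvidia-smi's CSV
rows = list(csv.reader(io.StringIO(smi_csv), skipinitialspace=True))
actual = {int(idx): name for idx, name in rows[1:]}  # skip the header row

# Hypothetical cached entries as they should appear in
# get_scheduler_targets() for the echo target after reconnecting
cached = [{'id': i, 'name': 'NVIDIA GeForce RTX 3090'} for i in range(4)]

mismatches = [g for g in cached if actual.get(g['id']) != g['name']]
print("cache matches driver" if not mismatches else f"mismatched entries: {mismatches}")
```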

Thanks! That did the trick.