CryoSPARC sees wrong GPU

Hi

Today I discovered that some hosts in one lane show “new” GPUs.

All of these workers have the same RTX 2080 Ti GPUs in them, not the RTX A5000 that CryoSPARC thinks they have. cryosparcw confirms this:

 /opt/cryosparc/cryosparc_worker/bin/cryosparcw gpulist
  Detected 4 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0                24  NVIDIA GeForce RTX 2080 Ti
       1                59  NVIDIA GeForce RTX 2080 Ti
       2               134  NVIDIA GeForce RTX 2080 Ti
       3               175  NVIDIA GeForce RTX 2080 Ti
   ---------------------------------------------------------------

So I thought I would update the worker config:

/opt/cryosparc/cryosparc_worker/bin/cryosparcw connect --worker hostname --master master --port 39000 --ssdpath /disks/cryosparc-cache/cache --ssdreserve 1024 --lane RTX2080 --update

But then I get:

 ---------------------------------------------------------------
  Worker will be registered with 40 CPUs.
 ---------------------------------------------------------------
  Updating target worker
  Current configuration:
               cache_path :  /disks/cryosparc-cache/cache
           cache_quota_mb :  None
         cache_reserve_mb :  1024
                     desc :  None
                     gpus :  [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}]
                 hostname :  worker
                     lane :  RTX2080
             monitor_port :  None
                     name :  worker
           resource_fixed :  {'SSD': True}
           resource_slots :  {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
                  ssh_str :  user
                    title :  Worker node worker
                     type :  node
          worker_bin_path :  /opt/cryosparc/cryosparc_worker/bin/cryosparcw
 ---------------------------------------------------------------
  SSD will be enabled.
  Worker will be registered with SSD cache location /disks/cryosparc-cache/cache
  SSD path will be updated to /disks/cryosparc-cache/cache
  SSD reserve will be updated to 1024 MB
  Worker will be reassigned to lane RTX2080
 ---------------------------------------------------------------
  Updating..
  Done.
 ---------------------------------------------------------------
  Final configuration for worker
               cache_path :  /disks/cryosparc-cache/cache
           cache_quota_mb :  None
         cache_reserve_mb :  1024
                     desc :  None
                     gpus :  [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}]
                 hostname :  worker
                     lane :  RTX2080
             monitor_port :  None
                     name :  worker
           resource_fixed :  {'SSD': True}
           resource_slots :  {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}
                  ssh_str :  user
                    title :  worker
                     type :  node
          worker_bin_path :  /opt/cryosparc/cryosparc_worker/bin/cryosparcw
 ---------------------------------------------------------------

So I tried with --nogpu and then added the GPUs again. But now it is even more inconsistent:

image

vs

How can i fix that?

Thanks

@biocit May I ask

  1. Have there been any server hardware changes, NVIDIA driver updates, or changes to the network configuration?
  2. Does a CryoSPARC restart restore the display of the expected GPU configuration?

Yes. There was an NVIDIA driver update this patch day. But that happens frequently, and this is the first time it has caused this.

I haven’t had the chance to restart CryoSPARC because jobs are running nonstop. 19.06.2025 is the next chance.

Are there RTX A5000 GPUs installed anywhere on your network?

Yes, in one lane we have one server with 4x A5000 in it.

With the names of targets and lanes absent (or concealed) in the earlier posts, I cannot pinpoint any inconsistencies between the expected and actual configuration. It would also help to see on which host or hosts the commands were run.
Please keep in mind that the cryosparcw connect command in CryoSPARC needs to be run on the specific worker that is to be connected (instead of another worker of the CryoSPARC instance). The --worker parameter (in typical configurations) may as well be specified as --worker $(hostname -f).
Please can you post:

  1. output of the command
    cryosparcm cli "get_scheduler_targets()"
    
  2. for each worker, output of
    hostname -f && nvidia-smi -L
    
  3. from the UI, any representation of worker/GPU configuration, including hostname, where any inconsistencies from the expected configuration are highlighted

If this information needs to remain private, you may instead send the information to me in a forum personal message, but troubleshooting may be delayed in this case.

Here is the output of the commands:

[{'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 2, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 3, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'b-gpu01', 'lane': 'RTX4090', 'monitor_port': None, 'name': 'b-gpu01', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87,
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198,
199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@b-gpu01', 'title': 'Worker node b-gpu01', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 2, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 3, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'b-gpu02', 'lane': 'RTX4090', 'monitor_port': None, 'name': 'b-gpu02', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107,
108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@b-gpu02', 'title': 'Worker node b-gpu02', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 2, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 3, 'mem': 25262096384, 'name':
'NVIDIA GeForce RTX 4090'}], 'hostname': 'b-gpu03', 'lane': 'RTX4090', 'monitor_port': None, 'name': 'b-gpu03', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@b-gpu03', 'title': 'Worker node b-gpu03', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 2, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 3, 'mem': 25262096384, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'b-gpu04', 'lane': 'RTX4090',
'monitor_port': None, 'name': 'b-gpu04', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106,
107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@b-gpu04', 'title': 'Worker node b-gpu04', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'm-gpu01', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'm-gpu01', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15]}, 'ssh_str': 'user@m-gpu01', 'title': 'Worker node m-gpu01', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'm-gpu02', 'lane': 'Maintenance', 'monitor_port': None, 'name': 'm-gpu02', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@m-gpu02', 'title': 'Worker node m-gpu02', 'type': 'node', 'worker_bin_path':
'/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'm-gpu03', 'lane': 'RTXA5000', 'monitor_port': None, 'name': 'm-gpu03', 'resource_fixed':
{'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'user@m-gpu03', 'title': 'Worker node m-gpu03', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'm-gpu04', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'm-gpu04', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@m-gpu04', 'title': 'Worker node m-gpu04', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'o-gpu01', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'o-gpu01', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@o-gpu01', 'title': 'Worker node o-gpu01', 'type': 'node',
'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None,
'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'r-gpu01', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'r-gpu01', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32,
33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@r-gpu01', 'title': 'Worker node r-gpu01', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 1024, 'desc': None, 'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX
A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'r-gpu02', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'r-gpu02', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@r-gpu02', 'title': 'Worker node r-gpu02', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'r-gpu03', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'r-gpu03', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@r-gpu03', 'title': 'Worker node r-gpu03', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': '/disks/cryosparc-cache/cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'r-gpu04', 'lane': 'RTX2080', 'monitor_port': None, 'name': 'r-gpu04', 'resource_fixed': 
{'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'user@r-gpu04', 'title': 'Worker node r-gpu04', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/cryosparc_worker/bin/cryosparcw'}]


b-gpu01
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-7c0d2d67-7b91-1c7b-4975-3dd11911e466)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-47e3481e-ae0d-89fc-4632-aa69a52c9301)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-487a8daf-b195-6d25-87f1-c9e9ce738cf7)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-db1abc7c-cfdc-bf1f-476f-e71598aa8534)

b-gpu02
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-3a6029f3-1c7b-287e-d37b-7dc1a1be0aec)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-fe72341f-c891-4845-7860-106b882bb33e)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-4af9fe8a-9ad3-b115-cfe1-227ff8c9d477)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-f74df82b-d602-ef99-dd8f-dadaa831a508)

b-gpu03
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-4cfcd56f-4ba3-12f6-228e-549aa893dbf8)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-dd2a569e-2851-b627-d7f5-fc9890278d02)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-b1a2e614-ec8b-40df-eb4b-e9b857e9b73f)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-1ade441a-5767-8795-be03-b0dc2fb288ac)

b-gpu04
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-31b4b77c-2ee6-95e7-89e1-fc5ad3a97fd5)
GPU 1: NVIDIA GeForce RTX 4090 (UUID: GPU-70d53266-9833-cd2a-6681-8f4081243863)
GPU 2: NVIDIA GeForce RTX 4090 (UUID: GPU-d39d5bff-fbe0-01fd-a53b-c1aae7c6913a)
GPU 3: NVIDIA GeForce RTX 4090 (UUID: GPU-3da2e842-5d57-c70f-56de-1adee5fb9897)

r-gpu01
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-e72ee86d-6b73-7f64-e27d-db8b06992e32)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-03d19b16-8dc3-fad5-33b5-df93d4611206)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-899d0c02-0687-e885-2ec8-ad05ba43824e)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-4a5bb07d-1571-2f36-5e51-204e9cbb07b4)

r-gpu02
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-d91e20e1-3f06-179e-c529-74132971411d)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-a707d41c-0214-5269-dc19-e9320481b40d)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-47ba76dd-9ce9-093a-c226-2b80dee23a0a)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-6ed8c7ac-da3c-db91-c9de-807b82e05845)

r-gpu03
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-33aa3c20-d342-13a7-367d-22b1ee0806c5)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-782eda68-f33f-185c-f421-eb0b0f709aba)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-4ae2b15b-9301-d233-b8a4-f3e6d060b5a6)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-a2bb5a32-188e-c35b-9305-1f4abf31a9fb)

r-gpu04
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-f9cf4eb7-3602-5a42-4d0c-435215a8145d)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-43cd9176-abac-0e93-4da4-b19f448d3e1c)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-a9f8d4f8-6a9d-26c9-16de-107ee0b81f16)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-20962de9-9b66-3156-7d7e-91f9e0d7934a)

m-gpu01
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-588b2aed-fa00-6f0c-29c8-8db1aad6c0b7)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-591e8300-69b8-39b1-4337-0fc9b58f156d)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-9c365525-81b7-2e61-f901-b8b59e4b1efe)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-f3637fdc-da6a-9434-c64b-f759c9beec1f)

m-gpu02
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-e868250c-1f92-2883-e8c6-eef2c16907ee)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-5682560-a5fd-b701-39b1-b8b59e4b1efe)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-55f4b260-a5fd-b701-0ee4-1e7fa38a6cd2)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-2c35d5be-7ac2-06c6-4823-98f857a0b1f0)

m-gpu03
GPU 0: NVIDIA RTX A5000 (UUID: GPU-5b3104d6-7cdf-8c12-9a7f-954e6f7c6708)
GPU 1: NVIDIA RTX A5000 (UUID: GPU-2710201d-a90b-3920-8861-13ba49f5e251)
GPU 2: NVIDIA RTX A5000 (UUID: GPU-548f6a90-5d84-450b-686e-92608216c6f0)
GPU 3: NVIDIA RTX A5000 (UUID: GPU-7cd253e1-a113-f9c1-051c-63779fe2df83)

m-gpu04
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-25673e3a-16ab-eba1-9d1c-9913726f5286)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-859ab3e8-1de1-8304-5b82-448eb6a7f312)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-a6e107f7-d049-1d6e-81d5-fd3c8aee628a)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-7b2a03e7-a1e5-ef4d-e353-8172550a66bc)

o-gpu01
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-51c26794-d095-5698-5869-117aac6b1048)
GPU 1: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-12cadf05-1136-ad9e-c222-ae1abebc6af2)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-648e6e3d-f641-78d2-1519-86128f7662a0)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-8efb5fbb-c9b8-0baa-49a0-92977518f8fa)
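
The comparison between what the scheduler has recorded and what each worker actually reports can be automated. Below is a minimal Python sketch, assuming the output of cryosparcm cli "get_scheduler_targets()" and the per-worker output of hostname -f && nvidia-smi -L have been saved as text. The function names (parse_smi, find_mismatches) and the abbreviated sample data are illustrative and not part of CryoSPARC. It also assumes the hostname printed on each worker matches the 'hostname' field of the target exactly, which may not hold if hostname -f returns a fully qualified name.

```python
import ast
import re

# Abbreviated sample in the shape printed by get_scheduler_targets() above.
TARGETS_TEXT = """[
 {'hostname': 'm-gpu01', 'lane': 'RTX2080',
  'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}]},
 {'hostname': 'r-gpu01', 'lane': 'RTX2080',
  'gpus': [{'id': 0, 'mem': 25294995456, 'name': 'NVIDIA RTX A5000'}]},
]"""

# Abbreviated sample of `hostname -f && nvidia-smi -L` collected per worker.
SMI_TEXT = """m-gpu01
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-588b2aed-fa00-6f0c-29c8-8db1aad6c0b7)

r-gpu01
GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: GPU-e72ee86d-6b73-7f64-e27d-db8b06992e32)
"""

GPU_LINE = re.compile(r"^GPU (\d+): (.+?) \(UUID: ")


def parse_smi(text):
    """Map hostname -> {gpu id: gpu name} from concatenated per-host output."""
    hosts, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = GPU_LINE.match(line)
        if match and current is not None:
            hosts[current][int(match.group(1))] = match.group(2)
        elif not match:
            current = line  # any non-GPU line is taken to be a hostname
            hosts[current] = {}
    return hosts


def find_mismatches(targets, smi):
    """Return (hostname, gpu id, recorded name, actual name) for each mismatch."""
    mismatches = []
    for target in targets:
        actual = smi.get(target["hostname"], {})
        for gpu in target["gpus"]:
            seen = actual.get(gpu["id"])
            if seen != gpu["name"]:
                mismatches.append((target["hostname"], gpu["id"], gpu["name"], seen))
    return mismatches


targets = ast.literal_eval(TARGETS_TEXT)
for host, gpu_id, recorded, actual in find_mismatches(targets, parse_smi(SMI_TEXT)):
    print(f"{host} GPU {gpu_id}: scheduler has {recorded!r}, nvidia-smi reports {actual!r}")
```

With the sample data, only r-gpu01 is flagged, because the scheduler entry names an RTX A5000 while the worker itself reports an RTX 2080 Ti. A worker missing from the nvidia-smi text is reported with an actual name of None.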

Everything red is wrong:

m-gpu04 is even inconsistent between “queue to lane” and “Run on specific GPU”

image

Now everything has changed to RTX 4090, which is also available in another lane. Something is really wrong… I never had such a problem prior to 4.7.0.

  1. Was this change observed shortly after a CryoSPARC startup or restart or after running a cryosparcw connect command?
  2. Whenever cryosparcw connect is run, is it run on the very computer that matches the --worker parameter?
  3. Is each physical computer associated with the same, permanent hostname on the network? This may not be the case if, for example,
    • the computer, from time to time, is assigned a different IP address that may no longer be associated with the computer’s previously assigned host name
    • name resolution is, from time to time, reconfigured on the network or on the CryoSPARC master computer

I have now upgraded to 4.7.1 and the GPUs are correct. I guess the change happens during CryoSPARC startup. Why, I have no idea. The hosts have static IPs. It has been happening since 4.7.0 and, I think, when an NVIDIA driver update is applied. But that's basically every month.

Only the host m-gpu04 still appears in the interface with 0 GPUs, even though “queue to specific GPU” shows the correct number and cryosparcm cli "get_scheduler_targets()" lists all the GPUs.

Any idea what I could do to fix this?

This could be the result of

If, and only if, m-gpu04 has exactly 4 GPUs and you would like to make those GPUs available to the CryoSPARC scheduler for automatic allocation, you could try running, on m-gpu04, cryosparcw connect with the

  1. --update
  2. and --gpus 0,1,2,3
  3. and the other, relevant parameters
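
In the get_scheduler_targets() output posted earlier, the m-gpu04 entry lists four GPUs under 'gpus' while its 'resource_slots' has 'GPU': []. Whether that empty slot list is what makes the UI show 0 GPUs is an assumption here, not something confirmed in the thread. A minimal Python sketch for spotting such entries follows. The function name unscheduled_gpus and the abbreviated sample data are illustrative, and the printed flags are only the --update and --gpus options named above.

```python
import ast

# Abbreviated sample; in practice, paste the full get_scheduler_targets() output.
TARGETS_TEXT = """[
 {'hostname': 'm-gpu01',
  'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'},
           {'id': 1, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}],
  'resource_slots': {'CPU': [0, 1], 'GPU': [0, 1], 'RAM': [0, 1]}},
 {'hostname': 'm-gpu04',
  'gpus': [{'id': 0, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'},
           {'id': 1, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'},
           {'id': 2, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'},
           {'id': 3, 'mem': 11348672512, 'name': 'NVIDIA GeForce RTX 2080 Ti'}],
  'resource_slots': {'CPU': [0, 1], 'GPU': [], 'RAM': [0, 1]}},
]"""


def unscheduled_gpus(targets):
    """Yield (hostname, detected GPU ids) for targets with GPUs but no GPU slots."""
    for target in targets:
        detected = sorted(gpu["id"] for gpu in target.get("gpus") or [])
        slots = target.get("resource_slots", {}).get("GPU", [])
        if detected and not slots:
            yield target["hostname"], detected


for host, ids in unscheduled_gpus(ast.literal_eval(TARGETS_TEXT)):
    gpus_arg = ",".join(str(i) for i in ids)
    print(f"{host}: no GPU slots; candidate flags on that host: --update --gpus {gpus_arg}")
```

The script only reports a candidate value for --gpus. The connect command itself still has to be run on the affected worker, together with the other relevant parameters.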