GPU not available

On our CryoSPARC instance (master-worker setup with 2 identical workers of 8 GPUs each), I had configured both workers in the same lane (“default”). After some weeks, we decided to put each worker in a separate lane, so now there are the lanes “worker1” and “worker2”. I deleted the default lane and reconnected the workers with:

cryosparcw  connect --worker cryosparc-worker1  --master cryosparc-master --port 39000 --ssdpath /home/sparcuser/ssd-cache --newlane --lane worker1

and

cryosparcw  connect --worker cryosparc-worker2  --master cryosparc-master --port 39000 --ssdpath /home/sparcuser/ssd-cache --newlane --lane worker2
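
To double-check that both workers ended up registered in their new lanes, the target list can be queried again from the master (the full output of this command is posted further down in this thread):

cryosparcm cli "get_scheduler_targets()"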

Now I face the weird situation that worker1 works without problems, but when I try to queue a job (e.g. a GPU test job) to lane worker2, I get

GPU not available

but when I queue it to a specific GPU on worker2, the job runs fine. (And I tested each of the available GPUs on worker2.)

Additional information:

sparcuser@cryosparc-master:~$ cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/sparcuser/cryosparc/cryosparc_master
Current cryoSPARC version: v4.6.0
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 4179, uptime 2:37:24
app_api                          RUNNING   pid 4191, uptime 2:37:23
app_api_dev                      STOPPED   Not started
command_core                     RUNNING   pid 4130, uptime 2:37:42
command_rtp                      RUNNING   pid 4156, uptime 2:37:33
command_vis                      RUNNING   pid 4152, uptime 2:37:35
database                         RUNNING   pid 4026, uptime 2:37:46

----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------

global config variables:
export CRYOSPARC_LICENSE_ID="....."
export CRYOSPARC_MASTER_HOSTNAME="cryosparc-master"
export CRYOSPARC_DB_PATH="/home/sparcuser/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true
...
sparcuser@cryosparc-master:~$ cryosparcm log command_core
2024-10-14 11:31:43,987 scheduler_run_core   INFO     | Running...
2024-10-14 11:31:43,987 scheduler_run_core   INFO     | Jobs Queued: [('P1', 'J53')]
2024-10-14 11:31:43,989 scheduler_run_core   INFO     | Licenses currently active : 9
2024-10-14 11:31:43,989 scheduler_run_core   INFO     | Now trying to schedule J53
2024-10-14 11:31:43,989 scheduler_run_core   INFO     |     Queue status waiting_resources
2024-10-14 11:31:43,989 scheduler_run_core   INFO     |     Queue message GPU not available
2024-10-14 11:31:43,990 scheduler_run_core   INFO     | Finished

sparcuser@cryosparc-worker2:~$ cryosparcw gpulist
  Detected 8 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0                 1  NVIDIA RTX 6000 Ada Generation                                                                
       1                33  NVIDIA RTX 6000 Ada Generation                                                                
       2                65  NVIDIA RTX 6000 Ada Generation                                                                
       3                97  NVIDIA RTX 6000 Ada Generation                                                                
       4               129  NVIDIA RTX 6000 Ada Generation                                                                
       5               161  NVIDIA RTX 6000 Ada Generation                                                                
       6               193  NVIDIA RTX 6000 Ada Generation                                                                
       7               225  NVIDIA RTX 6000 Ada Generation                                                                
   ---------------------------------------------------------------
sparcuser@cryosparc-worker2:~$ 

Welcome to the forum @widu, and thanks for posting information relevant to your question. Please can you additionally post the output of the command (on cryosparc-master):

cryosparcm cli "get_scheduler_targets()"

Thanks for looking into my problem @wtempel.

sparcuser@cryosparc-master:/root$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/home/sparcuser/ssd-cache',
  'cache_quota_mb': None, 
  'cache_reserve_mb': 10000, 
  'desc': None, 
  'gpus': [
    {'id': 0, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 1, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 2, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 3, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 4, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 5, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 6, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 7, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}], 
  'hostname': 'cryosparc-worker1', 
  'lane': 'worker1', 
  'monitor_port': None, 
  'name': 'cryosparc-worker1', 
  'resource_fixed': {'SSD': True}, 
  'resource_slots': {
    'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127], 
    'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 
    'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192]}, 
  'ssh_str': 'sparcuser@cryosparc-worker1', 
  'title': 'Worker node cryosparc-worker1', 
  'type': 'node', 
  'worker_bin_path': '/home/sparcuser/cryosparc/cryosparc_worker/bin/cryosparcw'},
 {'cache_path': '/home/sparcuser/ssd-cache', 
  'cache_quota_mb': None, 
  'cache_reserve_mb': 10000, 
  'desc': None, 
  'gpus': [
    {'id': 0, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 1, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 2, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 3, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 4, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 5, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 6, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}, 
    {'id': 7, 'mem': 51002867712, 'name': 'NVIDIA RTX 6000 Ada Generation'}], 
  'hostname': 'cryosparc-worker2', 
  'lane': 'worker2', 
  'monitor_port': None, 
  'name': 'cryosparc-worker2', 
  'resource_fixed': {'SSD': True}, 
  'resource_slots': {
    'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127], 
    'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 
    'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192]}, 
  'ssh_str': 'sparcuser@cryosparc-worker2', 
  'title': 'Worker node cryosparc-worker2', 
  'type': 'node', 
  'worker_bin_path': '/home/sparcuser/cryosparc/cryosparc_worker/bin/cryosparcw'}]
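
In case it is useful, the two target records can also be compared programmatically, to rule out any subtle difference beyond the host-specific fields. A rough sketch of my own (not a CryoSPARC tool): the output pasted above looks like a Python literal, so ast.literal_eval should be able to parse it.

cryosparcm cli "get_scheduler_targets()" > /tmp/targets.txt

python3 - <<'EOF'
import ast

# parse the Python-literal output of get_scheduler_targets()
targets = ast.literal_eval(open('/tmp/targets.txt').read())
worker1, worker2 = targets[0], targets[1]

# fields expected to differ between the two nodes
ignore = {'hostname', 'name', 'ssh_str', 'title', 'lane'}

for key in sorted(set(worker1) | set(worker2)):
    if key in ignore:
        continue
    if worker1.get(key) != worker2.get(key):
        print('difference in', key)
        print('  worker1:', worker1.get(key))
        print('  worker2:', worker2.get(key))
EOF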

Thanks for posting the target info.

  1. Does the problem persist after running
    cryosparcm restart? (Caution: Run the command when no CryoSPARC jobs are running, as the restart would disrupt running CryoSPARC jobs).
  2. Do you get GPU not available for other job types also?
  3. Please can you post the output of the following commands
    csprojectid=P99 # replace with actual project ID
    csjobid=J199 # replace with id of failed GPU test job
    cryosparcm eventlog $csprojectid $csjobid
    cryosparcm joblog $csprojectid $csjobid | tail -n 20
    ssh sparcuser@cryosparc-worker2 "hostname && nvidia-smi"
    

Does the problem persist after running cryosparcm restart?

yes.

Do you get GPU not available for other job types also?

yes

a) when queueing to a specific GPU:

"
sparcuser@cryosparc-master:~$ cryosparcm eventlog $csprojectid $csjobid
[Mon, 21 Oct 2024 14:08:36 GMT]  License is valid.
[Mon, 21 Oct 2024 14:08:36 GMT]  Launching job on lane worker2 target cryosparc-worker2 ...
[Mon, 21 Oct 2024 14:08:36 GMT]  Running job on remote worker node hostname cryosparc-worker2
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Job J53 Started
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Master running v4.6.0, worker running v4.6.0
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Working in directory: /home/sparcuser/homes/widu/CS-widustests/J53
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Running on lane worker2
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Resources allocated:
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB]   Worker:  cryosparc-worker2
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB]   CPU   :  [48]
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB]   GPU   :  [0]
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB]   RAM   :  [16]
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB]   SSD   :  True
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] --------------------------------------------------------------
[Mon, 21 Oct 2024 14:08:45 GMT] [CPU RAM used: 88 MB] Importing job module for job type worker_gpu_test...
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 225 MB] Job ready to run
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 225 MB] ***************************************************************
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] Obtaining GPU info via `nvidia-smi`...
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:01:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :33
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:21:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :32
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:41:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :33
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:61:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :31
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:81:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :33
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:A1:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :32
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:C1:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :33
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB] NVIDIA RTX 6000 Ada Generation @ 00000000:E1:00.0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     driver_version                :550.90.07
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     persistence_mode              :Enabled
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     power_limit                   :300.00
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     sw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     hw_power_limit                :Not Active
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     compute_mode                  :Default
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     max_pcie_link_gen             :4
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     current_pcie_link_gen         :1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     temperature                   :31
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     gpu_utilization               :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 256 MB]     memory_utilization            :0
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 347 MB] Starting GPU test on: NVIDIA RTX 6000 Ada Generation @ 1
[Mon, 21 Oct 2024 14:08:52 GMT] [CPU RAM used: 347 MB]     With CUDA Toolkit version: 11.8
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Finished GPU test in 0.784s
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Tensorflow test skipped.
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] PyTorch test skipped.
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] --------------------------------------------------------------
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Compiling job outputs...
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Updating job size...
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Exporting job and creating csg files...
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] ***************************************************************
[Mon, 21 Oct 2024 14:08:53 GMT] [CPU RAM used: 397 MB] Job complete. Total time 1.82s

sparcuser@cryosparc-master:~$ cryosparcm joblog $csprojectid $csjobid | tail -n 20
instance_testing.run cryosparc_compute.jobs.jobregister
/home/sparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/getlimits.py:499: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/sparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float64'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
/home/sparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/getlimits.py:499: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  setattr(self, word, getattr(machar, word).flat[0])
/home/sparcuser/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for <class 'numpy.float32'> type is zero.
  return self._float_to_str(self.smallest_subnormal)
MONITOR PROCESS PID 4059
========= monitor process now waiting for main process
========= sending heartbeat at 2024-10-21 14:08:48.587815
***************************************************************
***************************************************************
========= main process now complete at 2024-10-21 14:08:53.853621
Total: 2.278s
  MAIN THREAD:

========= main process now complete at 2024-10-21 14:08:58.604742.
========= monitor process now complete at 2024-10-21 14:08:58.610132.

sparcuser@cryosparc-master:~$ ssh sparcuser@cryosparc-worker2 "hostname && nvidia-smi"
cryosparc-worker2
Mon Oct 21 14:13:05 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:01:00.0 Off |                  Off |
| 30%   34C    P8             22W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:21:00.0 Off |                  Off |
| 30%   32C    P8             23W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:41:00.0 Off |                  Off |
| 30%   33C    P8             21W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:61:00.0 Off |                  Off |
| 30%   32C    P8             22W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:81:00.0 Off |                  Off |
| 30%   34C    P8             21W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:A1:00.0 Off |                  Off |
| 30%   33C    P8             34W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:C1:00.0 Off |                  Off |
| 30%   33C    P8             15W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA RTX 6000 Ada Gene...    On  |   00000000:E1:00.0 Off |                  Off |
| 30%   31C    P8             17W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

b) when queueing to lane worker2:

sparcuser@cryosparc-master:~$ cryosparcm eventlog $csprojectid $csjobid
sparcuser@cryosparc-master:~$ cryosparcm joblog $csprojectid $csjobid | tail -n 20
/home/sparcuser/homes/widu/CS-widustests/J53/job.log: No such file or directory
sparcuser@cryosparc-master:~$ ssh sparcuser@cryosparc-worker2 "hostname && nvidia-smi" -> see above

What I notice when queueing the job to lane worker2:

  • the message “GPU not available” appears instantly
  • on worker2, there’s nothing in /var/log/auth.log - no sign of any login from the master (a way to watch for this while re-queuing is sketched below)
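
If useful: a simple way to watch for an incoming connection attempt from the master while re-queuing the job, using standard tools (assuming read access to the log):

sudo tail -f /var/log/auth.log | grep -i sshd    # run on cryosparc-worker2 while re-queuing the job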

Thanks @widu for posting this information. Please can you email us the tgz file that is created when you run the command
cryosparcm snaplogs. I will let you know our email address via a direct message.

I have just sent the data. Many thanks for your help.

Please can you try the following sequence of commands and actions (on the CryoSPARC master server) and post the outputs:

  1. run the commands
    cryosparcm icli # enter the cryosparc interactive cli
    import datetime
    list(db.jobs.find({'status': {'$in': ['launched','started','running', 'waiting']}, 'deleted': False}, {'project_uid': 1, 'uid': 1, 'resources_allocated': 1}))
    datetime.datetime.now()
    # leave icli open during next step
    
  2. Queue a GPU-accelerated job to worker2
  3. inside the interactive cli from the first step, run (after replacing P99, J199 with the actual project and job IDs, respectively; a consolidated copy-paste version of these queries is sketched after this list):
    datetime.datetime.now()
    cli.get_job('P99', 'J199', 'instance_information', 'params_spec', 'job_type', 'status', 'version')
    list(db.jobs.find({'status': {'$in': ['launched','started','running', 'waiting']}, 'deleted': False}, {'project_uid': 1, 'uid': 1, 'resources_allocated': 1}))
    # record outputs, then
    exit()
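
For convenience, the queries from steps 1 and 3 as one copy-paste block (to be run inside cryosparcm icli, where the db and cli objects are already available; replace P99/J199 with the actual project and job IDs):

import datetime

# jobs the scheduler currently considers active, with their resource allocations
query = {'status': {'$in': ['launched', 'started', 'running', 'waiting']}, 'deleted': False}
fields = {'project_uid': 1, 'uid': 1, 'resources_allocated': 1}

print(list(db.jobs.find(query, fields)))
print(datetime.datetime.now())

# ... queue the GPU-accelerated job to lane worker2, then:

print(datetime.datetime.now())
print(cli.get_job('P99', 'J199', 'instance_information', 'params_spec', 'job_type', 'status', 'version'))
print(list(db.jobs.find(query, fields)))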