CryoSPARC worker issues


Re-running the CryoSPARC worker environment checks now appears to succeed:

[zahodnbd@em504-02 ~]$ eval $(/ibex/scratch/projects/c2121/cryosparc/cryosparc_worker/bin/cryosparcw env)
[zahodnbd@em504-02 ~]$ echo $CRYOSPARC_CUDA_PATH
/sw/csgv/cuda/11.2.2/el7.9_binary
[zahodnbd@em504-02 ~]$ ${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
[zahodnbd@em504-02 ~]$ python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 2, 0)
[zahodnbd@em504-02 ~]$ uname -a && free -g && nvidia-smi
Linux em504-02 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:            376           8         357           7          10         358
Swap:            29           0          29
Fri Dec  9 07:01:02 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 29%   25C    P8    10W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:1C:00.0 Off |                  N/A |
| 30%   28C    P8    10W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:1D:00.0 Off |                  N/A |
| 30%   27C    P8    17W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:1E:00.0 Off |                  N/A |
| 30%   27C    P8    18W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:3D:00.0 Off |                  N/A |
| 30%   24C    P8     4W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:3F:00.0 Off |                  N/A |
| 30%   26C    P8    26W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:40:00.0 Off |                  N/A |
| 31%   24C    P8     3W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 30%   26C    P8    12W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
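
The output above shows two different CUDA numbers: nvcc reports toolkit release 11.2, while nvidia-smi reports CUDA Version 11.8, which is the newest CUDA runtime the installed driver supports. A toolkit that is not newer than that value is normally fine. The sketch below checks this from the pasted text; the function names are hypothetical helpers for illustration and are not part of CryoSPARC.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) from nvcc text such as 'release 11.2, V11.2.152'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_driver_cuda(smi_output):
    """Return (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_supported(nvcc_output, smi_output):
    """True when the toolkit release is not newer than what the driver supports."""
    return parse_nvcc_release(nvcc_output) <= parse_driver_cuda(smi_output)


# Lines copied from the transcript above.
nvcc_text = "Cuda compilation tools, release 11.2, V11.2.152"
smi_text = "| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |"

print(toolkit_supported(nvcc_text, smi_text))
```

For this node the check passes (11.2 is not newer than 11.8), so the version pairing itself does not look like the cause of the failure.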

Testing the CryoSPARC workers still fails, though:

[zahodnbd@em504-02 ~]$ cryosparcm test workers P10
Using project P10
Running worker tests...
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL | Worker test results
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL | em504-02.ibex.kaust.edu.sa
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |   ✕ LAUNCH
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |     Error: 
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |     See P10 J7 for more information
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |   ⚠ SSD
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |     Did not run: Launch test failed
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |   ⚠ GPU
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |     Did not run: Launch test failed
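
To make the result easier to scan, the test output above can be reduced to one status per test. This is a minimal sketch that relies only on the layout visible in this paste (a status symbol before the test name, with detail lines indented further). The ✕ and ⚠ symbols come from the output above; the ✓ symbol for a passing test is an assumption, as is the function name. It handles a single worker only.

```python
# Status symbols: ✕ and ⚠ appear in the output above; ✓ for a pass is an assumption.
STATUS = {"✕": "failed", "⚠": "skipped", "✓": "passed"}


def summarize_worker_tests(log_text):
    """Map each test name to its status and any indented detail lines."""
    results = {}
    current = None
    for line in log_text.splitlines():
        if "|" not in line:
            continue
        message = line.split("|", 1)[1]  # text after the log-level column
        stripped = message.strip()
        if not stripped:
            continue
        if stripped[0] in STATUS:
            # Drop the symbol (and an optional emoji variation selector).
            name = stripped[1:].lstrip("\ufe0f ").strip()
            results[name] = {"status": STATUS[stripped[0]], "details": []}
            current = name
        elif current is not None and message.startswith("     "):
            # Detail lines are indented deeper than the test name.
            results[current]["details"].append(stripped)
    return results


sample = """\
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL | Worker test results
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL | em504-02.ibex.kaust.edu.sa
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |   ✕ LAUNCH
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |     Error:
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |     See P10 J7 for more information
2022-12-09 07:14:41,042 WORKER_TEST          log                  CRITICAL |   ⚠ SSD
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |     Did not run: Launch test failed
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |   ⚠ GPU
2022-12-09 07:14:41,043 WORKER_TEST          log                  CRITICAL |     Did not run: Launch test failed
"""

results = summarize_worker_tests(sample)
for name, info in results.items():
    print(name, info["status"], info["details"])
```

Only LAUNCH actually failed here; SSD and GPU were skipped because the launch test did not pass, so the launch failure is the one to investigate.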

P10 J7 Event Log from the dashboard:

License is valid.

Launching job on lane default target em504-02.ibex.kaust.edu.sa ...

Running job on master node hostname em504-02.ibex.kaust.edu.sa

**** Kill signal sent by unknown user ****

[CPU: 84.2 MB]
Job J7 Started

[CPU: 84.3 MB]
Master running v4.0.3, worker running v4.0.3

[CPU: 84.3 MB]
Working in directory: /ibex/scratch/projects/c2121/Brandon/cryosparc_datasets/CS-test-project/J7

[CPU: 84.3 MB]
Running on lane default

[CPU: 84.3 MB]
Resources allocated: 

[CPU: 84.3 MB]
  Worker:  em504-02.ibex.kaust.edu.sa

[CPU: 84.3 MB]
  CPU   :  [4]

[CPU: 84.3 MB]
  GPU   :  []

[CPU: 84.3 MB]
  RAM   :  [1]

[CPU: 84.3 MB]
  SSD   :  False

[CPU: 84.3 MB]
--------------------------------------------------------------

[CPU: 84.3 MB]
Importing job module for job type instance_launch_test...

[CPU: 190.6 MB]
Job ready to run

[CPU: 190.7 MB]
***************************************************************

[CPU: 190.7 MB]
Job successfully running

[CPU: 190.7 MB]
--------------------------------------------------------------

[CPU: 190.7 MB]
Compiling job outputs...

[CPU: 190.7 MB]
Updating job size...

[CPU: 190.7 MB]
Exporting job and creating csg files...

[CPU: 190.7 MB]
***************************************************************

[CPU: 190.7 MB]
Job complete. Total time 0.44s
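
The event log above is mostly routine, which makes the one odd line ("Kill signal sent by unknown user") easy to miss. The sketch below strips the "[CPU: ...]" memory tags and flags messages containing a few keywords. The keyword list and function names are illustrative assumptions, not anything defined by CryoSPARC.

```python
import re

# Matches memory tags such as "[CPU: 84.2 MB]" that precede each message.
CPU_TAG = re.compile(r"^\[CPU:\s*[\d.]+\s*[KMG]B\]$")


def event_messages(event_log_text):
    """Drop blank lines and '[CPU: ...]' tags, keeping only the messages."""
    messages = []
    for line in event_log_text.splitlines():
        text = line.strip()
        if text and not CPU_TAG.match(text):
            messages.append(text)
    return messages


def flag_messages(messages, keywords=("kill signal", "error", "traceback")):
    """Return the messages that contain any keyword (case-insensitive)."""
    return [m for m in messages if any(k in m.lower() for k in keywords)]


# Excerpt copied from the event log above.
sample_events = """\
Running job on master node hostname em504-02.ibex.kaust.edu.sa

**** Kill signal sent by unknown user ****

[CPU: 84.2 MB]
Job J7 Started

[CPU: 84.3 MB]
Master running v4.0.3, worker running v4.0.3
"""

flagged = flag_messages(event_messages(sample_events))
print(flagged)
```

On this excerpt the only flagged message is the kill signal, which was logged before "Job J7 Started" even though the job then ran to completion.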

P10 J7 job.log file:

[zahodnbd@em504-02 J7]$ cat job.log


================= CRYOSPARCW =======  2022-12-09 07:14:48.234614  =========
Project P10 Job J7
Master em504-02.ibex.kaust.edu.sa Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 58822
========= monitor process now waiting for main process
MAIN PID 58822
instance_testing.run cryosparc_compute.jobs.jobregister
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
***************************************************************
***************************************************************
========= main process now complete.
========= monitor process now complete.