Re-running the CryoSPARC worker environment checks now appears to succeed:
[zahodnbd@em504-02 ~]$ eval $(/ibex/scratch/projects/c2121/cryosparc/cryosparc_worker/bin/cryosparcw env)
[zahodnbd@em504-02 ~]$ echo $CRYOSPARC_CUDA_PATH
/sw/csgv/cuda/11.2.2/el7.9_binary
[zahodnbd@em504-02 ~]$ ${CRYOSPARC_CUDA_PATH}/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
[zahodnbd@em504-02 ~]$ python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 2, 0)
[zahodnbd@em504-02 ~]$ uname -a && free -g && nvidia-smi
Linux em504-02 3.10.0-1160.76.1.el7.x86_64 #1 SMP Wed Aug 10 16:21:17 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
              total        used        free      shared  buff/cache   available
Mem:            376           8         357           7          10         358
Swap:            29           0          29
Fri Dec 9 07:01:02 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:1A:00.0 Off |                  N/A |
| 29%   25C    P8    10W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:1C:00.0 Off |                  N/A |
| 30%   28C    P8    10W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:1D:00.0 Off |                  N/A |
| 30%   27C    P8    17W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:1E:00.0 Off |                  N/A |
| 30%   27C    P8    18W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  On   | 00000000:3D:00.0 Off |                  N/A |
| 30%   24C    P8     4W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  On   | 00000000:3F:00.0 Off |                  N/A |
| 30%   26C    P8    26W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  On   | 00000000:40:00.0 Off |                  N/A |
| 31%   24C    P8     3W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  On   | 00000000:41:00.0 Off |                  N/A |
| 30%   26C    P8    12W / 250W |      1MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
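For reference, the version check implied by the output above can be sketched in Python. This is only an illustration, not part of CryoSPARC: the function names `toolkit_release`, `driver_cuda_version`, and `toolkit_fits_driver` are made up here, and the sketch assumes the "CUDA Version" field in the `nvidia-smi` header is the newest CUDA release the installed driver supports, so a toolkit at or below it should be usable.

```python
import re
from typing import Tuple


def toolkit_release(nvcc_output: str) -> Tuple[int, int]:
    """Return (major, minor) from the 'release X.Y' text of `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def driver_cuda_version(smi_output: str) -> Tuple[int, int]:
    """Return (major, minor) from the 'CUDA Version: X.Y' field of `nvidia-smi`."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version: X.Y' found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def toolkit_fits_driver(nvcc_output: str, smi_output: str) -> bool:
    """True when the toolkit release is not newer than what the driver reports."""
    return toolkit_release(nvcc_output) <= driver_cuda_version(smi_output)


# Lines copied from the output in this post.
nvcc_text = "Cuda compilation tools, release 11.2, V11.2.152"
smi_text = "| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |"
print(toolkit_fits_driver(nvcc_text, smi_text))  # True
```

With the values shown here (toolkit 11.2, driver reporting 11.8) the check passes, which is consistent with the environment itself looking fine.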
Testing the CryoSPARC workers still fails, though:
[zahodnbd@em504-02 ~]$ cryosparcm test workers P10
Using project P10
Running worker tests...
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | Worker test results
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | em504-02.ibex.kaust.edu.sa
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | ✕ LAUNCH
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | Error:
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | See P10 J7 for more information
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | ⚠ SSD
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | Did not run: Launch test failed
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | ⚠ GPU
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | Did not run: Launch test failed
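To make the result above easier to scan, here is a small Python sketch that reduces the test log to one status per test. It is a hypothetical helper, not a CryoSPARC tool. The meanings of `✕` and `⚠` are read off the log itself; treating `✓` as "passed" is an assumption, since no passing test appears in this output.

```python
from typing import Dict

# "✕" and "⚠" are taken from the log in this post; "✓" is an assumed marker
# for a passing test and may not match what CryoSPARC actually prints.
MARKERS = {"✕": "failed", "⚠": "did not run", "✓": "passed"}


def summarize_worker_tests(log_text: str) -> Dict[str, str]:
    """Map each test name (LAUNCH, SSD, GPU, ...) to its status."""
    results: Dict[str, str] = {}
    for line in log_text.splitlines():
        # The message is everything after the first "|" separator.
        _, sep, message = line.partition("|")
        if not sep:
            continue
        tokens = message.split()
        # A result line is exactly "<marker> <TESTNAME>".
        if len(tokens) == 2 and tokens[0][:1] in MARKERS:
            results[tokens[1]] = MARKERS[tokens[0][:1]]
    return results


sample = """\
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | Worker test results
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | em504-02.ibex.kaust.edu.sa
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | ✕ LAUNCH
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | Error:
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | See P10 J7 for more information
2022-12-09 07:14:41,042 WORKER_TEST log CRITICAL | ⚠ SSD
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | Did not run: Launch test failed
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | ⚠ GPU
2022-12-09 07:14:41,043 WORKER_TEST log CRITICAL | Did not run: Launch test failed
"""
print(summarize_worker_tests(sample))
# {'LAUNCH': 'failed', 'SSD': 'did not run', 'GPU': 'did not run'}
```

Only the LAUNCH test actually failed; SSD and GPU were skipped as a consequence, so the launch step is the one to investigate.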
P10 J7 Event Log from the dashboard:
License is valid.
Launching job on lane default target em504-02.ibex.kaust.edu.sa ...
Running job on master node hostname em504-02.ibex.kaust.edu.sa
**** Kill signal sent by unknown user ****
[CPU: 84.2 MB]
Job J7 Started
[CPU: 84.3 MB]
Master running v4.0.3, worker running v4.0.3
[CPU: 84.3 MB]
Working in directory: /ibex/scratch/projects/c2121/Brandon/cryosparc_datasets/CS-test-project/J7
[CPU: 84.3 MB]
Running on lane default
[CPU: 84.3 MB]
Resources allocated:
[CPU: 84.3 MB]
Worker: em504-02.ibex.kaust.edu.sa
[CPU: 84.3 MB]
CPU : [4]
[CPU: 84.3 MB]
GPU : []
[CPU: 84.3 MB]
RAM : [1]
[CPU: 84.3 MB]
SSD : False
[CPU: 84.3 MB]
--------------------------------------------------------------
[CPU: 84.3 MB]
Importing job module for job type instance_launch_test...
[CPU: 190.6 MB]
Job ready to run
[CPU: 190.7 MB]
***************************************************************
[CPU: 190.7 MB]
Job successfully running
[CPU: 190.7 MB]
--------------------------------------------------------------
[CPU: 190.7 MB]
Compiling job outputs...
[CPU: 190.7 MB]
Updating job size...
[CPU: 190.7 MB]
Exporting job and creating csg files...
[CPU: 190.7 MB]
***************************************************************
[CPU: 190.7 MB]
Job complete. Total time 0.44s
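The event log above interleaves `[CPU: ... MB]` memory markers with the messages, which makes it hard to read. A minimal Python sketch for pairing them up follows. It assumes, based on the order of lines in this paste, that each marker belongs to the message on the next line; `pair_events` is a hypothetical helper name.

```python
import re
from typing import List, Optional, Tuple

# Matches a standalone memory marker such as "[CPU: 84.3 MB]".
CPU_MARKER = re.compile(r"^\[CPU:\s*([\d.]+)\s*MB\]$")


def pair_events(log_text: str) -> List[Tuple[Optional[float], str]]:
    """Return (memory_mb, message) pairs; memory_mb is None when no marker preceded."""
    events: List[Tuple[Optional[float], str]] = []
    pending: Optional[float] = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = CPU_MARKER.match(line)
        if marker:
            # Remember the memory reading for the message that follows.
            pending = float(marker.group(1))
            continue
        events.append((pending, line))
        pending = None
    return events


sample = """\
**** Kill signal sent by unknown user ****
[CPU: 84.2 MB]
Job J7 Started
[CPU: 190.7 MB]
Job complete. Total time 0.44s
"""
for memory_mb, message in pair_events(sample):
    print(memory_mb, message)
```

Read this way, the "Kill signal sent by unknown user" line stands out as the only entry without a memory marker, and it appears before the job reports that it started.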
P10 J7 job.log file:
[zahodnbd@em504-02 J7]$ cat job.log
================= CRYOSPARCW ======= 2022-12-09 07:14:48.234614 =========
Project P10 Job J7
Master em504-02.ibex.kaust.edu.sa Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 58822
========= monitor process now waiting for main process
MAIN PID 58822
instance_testing.run cryosparc_compute.jobs.jobregister
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
***************************************************************
***************************************************************
========= main process now complete.
========= monitor process now complete.