License is valid.
Launching job on lane default target cryo10.ourdomain.edu ...
Running job on remote worker node hostname cryo10.ourdomain.edu
Failed to launch! 255
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
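In case it matters, my understanding is that the master launches jobs on a standalone worker over SSH as the connecting unix user, so the check below is how that path can be tested from the master. Hostnames are the ones from the logs above; the key path ~/.ssh/id_rsa is just an example, not necessarily what this setup uses:

```shell
# Run on the master (cryo11) as the unix user that runs cryosparcm.
# BatchMode=yes forbids password prompts, so this fails immediately
# if key-based auth from master to worker is not working.
ssh -o BatchMode=yes root@cryo10.ourdomain.edu hostname

# If the test fails: create a key if none exists, then install it on the worker.
# ~/.ssh/id_rsa is an assumed path; use whichever key the master user has.
test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
ssh-copy-id root@cryo10.ourdomain.edu
```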
I already looked at this thread with the same permission error, but that one appears to cover a worker and master on the same server. There are a few other threads with this error, but I haven't found one that matches this setup.
The worker registers without errors:
bin/cryosparcw connect --worker cryo10.ourdomain.edu --master cryo11.ourdomain.edu --port 39000 --nossd --update
---------------------------------------------------------------
CRYOSPARC CONNECT --------------------------------------------
---------------------------------------------------------------
Attempting to register worker cryo10.ourdomain.edu to command cryo11.ourdomain.edu:39002
Connecting as unix user root
Will register using ssh string: root@cryo10.ourdomain.edu
If this is incorrect, you should re-run this command with the flag --sshstr <ssh string>
---------------------------------------------------------------
Connected to master.
---------------------------------------------------------------
Current connected workers:
sn46xxx
cryo10.ourdomain.edu
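If it helps, the master's view of the registered targets, including the ssh string it will use to launch jobs (which is the connection failing above), can be dumped with cryosparcm cli. I believe this is the call usually suggested on the forum:

```shell
# On the master: dump the scheduler's registered lanes/targets.
# Each worker entry includes the ssh string used to launch jobs on it.
cryosparcm cli "get_scheduler_targets()"
```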
Here are the results of the suggested commands in the FAQ:
eval bin/cryosparcw env
export "CRYOSPARC_USE_GPU=true"
export "CRYOSPARC_CONDA_ENV=cryosparc_worker_env"
export "CRYOSPARC_DEVELOP=false"
export "CRYOSPARC_LICENSE_ID=bae9edd6-54dd-11ef-93a3-7b0d1eadc7e2"
export "CRYOSPARC_ROOT_DIR=/opt/cryosparc_worker"
export "CRYOSPARC_PATH=/opt/cryosparc_worker/bin"
export "PATH=/opt/cryosparc_worker/bin:/opt/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/opt/cryosparc_worker/deps/anaconda/condabin:/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin"
export "LD_LIBRARY_PATH="
export "LD_PRELOAD=/opt/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libpython3.10.so"
export "PYTHONPATH=/opt/cryosparc_worker"
export "PYTHONNOUSERSITE=true"
export "CONDA_SHLVL=1"
export "CONDA_PROMPT_MODIFIER=(cryosparc_worker_env)"
export "CONDA_EXE=/opt/cryosparc_worker/deps/anaconda/bin/conda"
export "CONDA_PREFIX=/opt/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env"
export "CONDA_PYTHON_EXE=/opt/cryosparc_worker/deps/anaconda/bin/python"
export "CONDA_DEFAULT_ENV=cryosparc_worker_env"
export "NUMBA_CUDA_MAX_PENDING_DEALLOCS_COUNT=0"
export "NUMBA_CUDA_INCLUDE_PATH=/opt/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include"
export "NUMBA_CUDA_USE_NVIDIA_BINDING=1"
env | grep PATH
PATH=/usr/lib/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cryosparc_worker/bin:/root/bin
/sbin/ldconfig -p | grep -I cuda
libicudata.so.50 (libc6,x86-64) => /lib64/libicudata.so.50
libcudadebugger.so.1 (libc6,x86-64) => /lib64/libcudadebugger.so.1
libcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1
libcuda.so.1 (libc6) => /lib/libcuda.so.1
libcuda.so (libc6,x86-64) => /lib64/libcuda.so
libcuda.so (libc6) => /lib/libcuda.so
uname -a
Linux sn4622115934 3.10.0-1160.88.1.el7.x86_64 #1 SMP Tue Mar 7 15:41:52 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi
Thu Aug 22 13:00:14 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA RTX A6000 Off | 00000000:4F:00.0 Off | Off |
| 30% 32C P8 8W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A6000 Off | 00000000:52:00.0 Off | Off |
| 30% 36C P8 14W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA RTX A6000 Off | 00000000:56:00.0 Off | Off |
| 30% 33C P8 16W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA RTX A6000 Off | 00000000:57:00.0 Off | Off |
| 30% 35C P8 14W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 NVIDIA RTX A6000 Off | 00000000:D1:00.0 Off | Off |
| 30% 33C P8 30W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 NVIDIA RTX A6000 Off | 00000000:D2:00.0 Off | Off |
| 30% 34C P8 24W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 NVIDIA RTX A6000 Off | 00000000:D5:00.0 Off | Off |
| 30% 37C P8 28W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 NVIDIA RTX A6000 Off | 00000000:D6:00.0 Off | Off |
| 30% 38C P0 78W / 300W | 0MiB / 49140MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
nvcc is available via
/opt/sbgrid/x86_64-linux/diffdock/73ef67f/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Also, on the installation instructions page ("Connect a Cluster to CryoSPARC") there's a mention of registering with the master process:
"Once the cryosparc_worker package is installed, the cluster must be registered with the master process. This requires a template for job submission commands and scripts that the master process will use to submit jobs to the cluster scheduler. To register the cluster, provide CryoSPARC with the following two files and call the cryosparcm cluster connect command."
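For reference, the "following two files" that section refers to are cluster_info.json and cluster_script.sh. A rough SLURM-style sketch of cluster_info.json is below; the name, paths, and command templates are placeholders, and as I understand it none of this applies to a plain standalone worker:

```json
{
    "name": "slurmcluster",
    "worker_bin_path": "/opt/cryosparc_worker/bin/cryosparcw",
    "cache_path": "/scratch/cryosparc_cache",
    "send_cmd_tpl": "{{ command }}",
    "qsub_cmd_tpl": "sbatch {{ script_path_abs }}",
    "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
    "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
    "qinfo_cmd_tpl": "sinfo"
}
```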
Is a job scheduler required to get a worker going?
Pardon the light obfuscation of the hostnames…