Hi,
I am trying to get cryosparc2 working on a cluster with Slurm. The job gets submitted, but it never actually runs. This is what the launch shows:
Launching job on lane TEST_CLUSTER target TEST_CLUSTER ...
License is valid.
Launching job on cluster TEST_CLUSTER
====================== Cluster submission script: ========================
==========================================================================
#!/bin/bash
#SBATCH --job-name=cryosparc_P1_J8
#SBATCH --partition=sbatch
#SBATCH --output=/data/cryosparc_user/projects/cryosparc2/example/T20S/P1/J8/job.log
#SBATCH --error=/data/cryosparc_user/projects/cryosparc2/example/T20S/P1/J8/job.log
#SBATCH --nodes=1
#SBATCH --mem=16000M
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=6
#SBATCH --gres=gpu:1
#SBATCH --gres-flags=enforce-binding
srun /data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/bin/cryosparcw run --project P1 --job J8 --master_hostname cryosparc --master_command_core_port 39002 > /data/cryosparc_user/projects/cryosparc2/example/T20S/P1/J8/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
sbatch /data/cryosparc_user/projects/cryosparc2/example/T20S/P1/J8/queue_sub_script.sh
-------- Cluster Job ID:
5991
-------- Queued at 2019-06-18 15:32:19.844864
-------- Job status at 2019-06-18 15:32:19.866773
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
5991 sbatch cryospar cryosparc_user PD 0:00 1 (None)
Here is the out.log:
================= CRYOSPARCW ======= 2019-06-18 15:32:20.445241 =========
Project P1 Job J8
Master cryosparc Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 17654
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
File "<string>", line 1, in <module>
Process Process-1:
Traceback (most recent call last):
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 267, in _bootstrap
File "cryosparc2_worker/cryosparc2_compute/run.py", line 148, in cryosparc2_compute.run.run (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/run.c:5181)
self.run()
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "cryosparc2_worker/cryosparc2_compute/run.py", line 31, in cryosparc2_compute.run.main (/home/installtest/deps_manage/cryosparc2_package/deploy/stage/cryosparc2_worker/cryosparc2_compute/run.c:2121)
File "cryosparc2_compute/jobs/runcommon.py", line 70, in connect
File "cryosparc2_compute/jobs/runcommon.py", line 70, in connect
cli = client.CommandClient(master_hostname, int(master_command_core_port))
File "cryosparc2_compute/client.py", line 33, in __init__
self._reload()
File "cryosparc2_compute/client.py", line 61, in _reload
system = self._get_callable('system.describe')()
File "cryosparc2_compute/client.py", line 49, in func
r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
cli = client.CommandClient(master_hostname, int(master_command_core_port))
File "cryosparc2_compute/client.py", line 33, in __init__
return request('post', url, data=data, json=json, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
self._reload()
File "cryosparc2_compute/client.py", line 61, in _reload
return session.request(method=method, url=url, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
system = self._get_callable('system.describe')()
File "cryosparc2_compute/client.py", line 49, in func
r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
resp = self.send(prep, **send_kwargs)
return request('post', url, data=data, json=json, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
r = adapter.send(request, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
resp = self.send(prep, **send_kwargs)
raise ConnectionError(e, request=request)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
requests.exceptions.ConnectionError: HTTPConnectionPool(host='cryosparc', port=39002): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ae02da26150>: Failed to establish a new connection: [Errno 113] No route to host',))
r = adapter.send(request, **kwargs)
File "/data/cryosparc_user/progs/cryosparc2/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='cryosparc', port=39002): Max retries exceeded with url: /api (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x2ae02da27190>: Failed to establish a new connection: [Errno 113] No route to host',))
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://cryosparc:39002/api) did not reply within timeout of 300 seconds, attempt 3 of 3
srun: error: gpu2: task 0: Exited with exit code 1
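The final error is `[Errno 113] No route to host` for `cryosparc:39002`, raised on the compute node (`gpu2`), so it may help to test that path separately from cryoSPARC. Below is a minimal sketch, assuming Python 3 is available on the compute node; `check_master_reachable` is a hypothetical helper name, not part of cryoSPARC. It resolves the hostname and then attempts a plain TCP connection, which separates a DNS problem from a routing or firewall problem.

```python
import socket


def check_master_reachable(host, port, timeout=5.0):
    """Resolve host, then try a TCP connect. Returns (ok, message)."""
    try:
        # Step 1: name resolution as seen from this node.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, "DNS lookup failed for %s: %s" % (host, exc)

    address = infos[0][4][0]
    try:
        # Step 2: plain TCP connection to the command port.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected to %s (%s) port %d" % (host, address, port)
    except OSError as exc:
        # Covers refused connections, timeouts and "No route to host".
        return False, "resolved %s to %s but connect failed: %s" % (
            host, address, exc)
```

Calling `check_master_reachable("cryosparc", 39002)` from a shell on `gpu2` (for example inside an interactive `srun` session) and printing the returned message would show whether the name resolves to the expected address and whether the port accepts connections. If it resolves but the connect step fails with the same errno, a firewall on the master rejecting the port, or the name resolving to an address that is not routable from the compute network, would be the usual suspects, though that is only a guess from the log.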