Sarulthasan,
The master and worker are on the same node, and the firewall is disabled.
Users are reporting that their jobs are running.
cryosparc_user@tulasi:~> telnet 127.0.0.1 39005
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
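The same check can be scripted so that it probes both the loopback address and the hostname that appears in the command_vis log. This is only a minimal sketch, assuming Python 3 is available on the node: the helper name `port_is_open` and the two-second timeout are illustrative assumptions, and the host and port values are simply the ones shown in this post.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper for illustration; not part of cryoSPARC.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Port 39005 and the hostname are taken from the output in this post.
    for host in ("127.0.0.1", "tulasi.wadsworth.org"):
        state = "open" if port_is_open(host, 39005) else "closed or unreachable"
        print(f"{host}:39005 is {state}")
```

If the loopback probe is refused as well, the service behind that port is most likely not listening at all, rather than being blocked by a firewall.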
Please let me know what other information I can gather.
Thanks,
Brian
tulasi:~ # su - cryosparc_user
cryosparc_user@tulasi:~> cryosparcm log command_vis
return session.request(method=method, url=url, **kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='tulasi.wadsworth.org', port=39005): Max retries exceeded with url: /api (Caused by)
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 3 of 3
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cryosparc2_command/command_vis/__init__.py", line 63, in <module>
rtp = CommandClient(os.environ['CRYOSPARC_MASTER_HOSTNAME'], int(os.environ['CRYOSPARC_COMMAND_RTP_PORT']))
File "cryosparc2_compute/client.py", line 33, in __init__
self._reload()
File "cryosparc2_compute/client.py", line 61, in _reload
system = self._get_callable('system.describe')()
File "cryosparc2_compute/client.py", line 49, in func
r = requests.post(self.url, data = json.dumps(data, cls=NumpyEncoder), headers = header, timeout=self.timeout)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 116, in post
return request('post', url, data=data, json=json, **kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/cryosparc_user/cryosparc_master/deps/anaconda/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='tulasi.wadsworth.org', port=39005): Max retries exceeded with url: /api (Caused by)
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 1 of 3
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 2 of 3
*** client.py: command (http://tulasi.wadsworth.org:39005/api) did not reply within timeout of 300 seconds, attempt 3 of 3
I just restarted a browser to authenticate through the site firewall, so if this partial command_core log shows license issues, I think they are safe to ignore.
failed to connect link
failed to connect link
[EXPORT_JOB] : Request to export P15 J241
[EXPORT_JOB] : Exporting job to /usr16/data/rzk01/cryos2/P2/P15/J241
[EXPORT_JOB] : Exporting all of job's images in the database to /usr16/data/rzk01/cryos2/P2/P15/J241/gridfs_data...
[EXPORT_JOB] : Writing 153 database images to /usr16/data/rzk01/cryos2/P2/P15/J241/gridfs_data/gridfsdata_0
[EXPORT_JOB] : Done. Exported 153 images in 0.42s
[EXPORT_JOB] : Exporting all job's streamlog events...
[EXPORT_JOB] : Done. Exported 1 files in 0.01s
[EXPORT_JOB] : Exporting job metafile...
[EXPORT_JOB] : Creating .csg file for particles
[EXPORT_JOB] : Creating .csg file for volume
[EXPORT_JOB] : Creating .csg file for mask
[EXPORT_JOB] : Done. Exported in 0.04s
[EXPORT_JOB] : Updating job manifest...
[EXPORT_JOB] : Done. Updated in 0.00s
[EXPORT_JOB] : Exported P15 J241 in 0.48s
Changed job P15.J241 status completed
---------- Scheduler running ---------------
Lane default node : Jobs Queued (nonpaused, inputs ready): [u'J242']
Total slots: {u'tulasi': {u'GPU': set([0, 1, 2, 3]), u'RAM': set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), u'CPU': se}
Available slots: {u'tulasi': {u'GPU': set([0, 1, 2, 3]), u'RAM': set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]), u'CPU'}
Available licen: 10000
Now trying to schedule J242
Need slots : {u'GPU': 1, u'RAM': 3, u'CPU': 4}
Need fixed : {u'SSD': True}
Need licen : True
Master direct : False
Trying to schedule on tulasi
Launchable: True
Alloc slots : {u'GPU': [0], u'RAM': [0, 1, 2], u'CPU': [0, 1, 2, 3]}
Alloc fixed : {u'SSD': True}
Alloc licen : True
-- Launchable! -- Launching.
---- Running project UID P15 job UID J242
failed to connect link
Error connecting to cryoSPARC license server. Checking local license file.
License Data: {"token": "xxxxxxx", "token_valid": true, "request_date": 1571774898, "license_valid": true}
License Signature:
Running job on worker type node
Running job using: /home/cryosparc_user/cryosparc_worker/bin/cryosparcw
Running job on remote worker node hostname tulasi
cmd: bash -c "nohup /home/cryosparc_user/cryosparc_worker/bin/cryosparcw run --project P15 --job J242 --master_hostname tulasi.wadsworth.org --master_command_core_port 39002 > /usr16/data/rzk0"
Changed job P15.J242 status launched
---------- Scheduler done ------------------
Changed job P15.J242 status started
Changed job P15.J242 status running
failed to connect link
failed to connect link
failed to connect link
[EXPORT_JOB] : Request to export P15 J242
[EXPORT_JOB] : Exporting job to /usr16/data/rzk01/cryos2/P2/P15/J242
[EXPORT_JOB] : Exporting all of job's images in the database to /usr16/data/rzk01/cryos2/P2/P15/J242/gridfs_data...
[EXPORT_JOB] : Writing 109 database images to /usr16/data/rzk01/cryos2/P2/P15/J242/gridfs_data/gridfsdata_0
[EXPORT_JOB] : Done. Exported 109 images in 0.34s
[EXPORT_JOB] : Exporting all job's streamlog events...
[EXPORT_JOB] : Done. Exported 1 files in 0.01s
[EXPORT_JOB] : Exporting job metafile...
[EXPORT_JOB] : Creating .csg file for particles
[EXPORT_JOB] : Creating .csg file for volume
[EXPORT_JOB] : Creating .csg file for mask
[EXPORT_JOB] : Done. Exported in 0.04s
[EXPORT_JOB] : Updating job manifest...
[EXPORT_JOB] : Done. Updated in 0.07s
[EXPORT_JOB] : Exported P15 J242 in 0.46s
Changed job P15.J242 status completed
failed to connect link
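To see at a glance that jobs still progress despite the repeated connection messages, an excerpt like the one above can be summarized with a short script. This is a minimal sketch, assuming the line formats are exactly those shown in this excerpt; `summarize_log` and `STATUS_RE` are hypothetical names for illustration, not cryoSPARC tooling.

```python
import re
from collections import OrderedDict

# Matches lines such as "Changed job P15.J242 status launched".
STATUS_RE = re.compile(r"^Changed job (P\d+)\.(J\d+) status (\w+)$")


def summarize_log(lines):
    """Collect per-job status transitions and count link failures."""
    transitions = OrderedDict()
    link_failures = 0
    for raw in lines:
        line = raw.strip()
        if line == "failed to connect link":
            link_failures += 1
            continue
        match = STATUS_RE.match(line)
        if match:
            project, job, status = match.groups()
            # Keep statuses in the order they appear in the log.
            transitions.setdefault((project, job), []).append(status)
    return transitions, link_failures


if __name__ == "__main__":
    # Sample lines copied from the command_core excerpt in this post.
    sample = [
        "failed to connect link",
        "Changed job P15.J242 status launched",
        "Changed job P15.J242 status started",
        "Changed job P15.J242 status running",
        "failed to connect link",
        "Changed job P15.J242 status completed",
    ]
    jobs, failures = summarize_log(sample)
    for (project, job), statuses in jobs.items():
        print(f"{project} {job}: {' -> '.join(statuses)}")
    print(f"'failed to connect link' lines: {failures}")
```

Feeding it the full log file line by line would show whether every launched job reaches a completed status while the link failures accumulate.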