Hey Stephan, thank you for replying. Unfortunately, it still doesn't work.
Let me describe my process first:
I want to install the software on a remote cluster, graham.computecanada.ca, which is one of the Compute Canada clusters. The first problem is that every time I log in as user@graham.computecanada.ca, the hostname the system reports is one of gra-login1.graham.sharcnet, gra-login2.graham.sharcnet, or gra-login3.graham.sharcnet, and which one I get seems to be random. Therefore, I still use graham.computecanada.ca as both the worker hostname and the master hostname.
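To see which name a given login node actually reports, I put together a small Python 3 sketch (standard library only). The configured value is just the one from my setup, and the rule for counting a name as a match (exact match, or same short first label) is my own assumption for illustration, not necessarily how CryoSPARC compares hostnames.

```python
import socket


def local_hostnames():
    """Names this machine reports for itself (short hostname and FQDN)."""
    return {socket.gethostname(), socket.getfqdn()}


def matches_master(configured, local_names):
    """True if `configured` equals a local name or its short (first-label) form.

    ASSUMPTION: this matching rule is for illustration only.
    """
    configured = configured.strip().lower()
    for name in local_names:
        name = name.lower()
        if configured in (name, name.split(".")[0]):
            return True
    return False


if __name__ == "__main__":
    configured = "graham.computecanada.ca"  # value I used for the master hostname
    names = local_hostnames()
    print("this node reports:", sorted(names))
    print("matches configured master:", matches_master(configured, names))
```

Running it on different logins should show whether the reported name changes between gra-loginN nodes.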
This is my second installation. In the first installation, cryosparcm start worked well at first, but when I executed cryosparcm cluster dump, it reported that no such cluster existed. I tried every name for the cluster that I could find, but none of them worked. So I ran cryosparcm cluster connect directly and manually edited the .json file. After that, I couldn't restart cryosparcm, so I deleted all the cryoSPARC files and reinstalled the software.
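A quick sanity check of a hand-edited cluster_info.json before running cryosparcm cluster connect would at least rule out a syntax error. Below is a rough Python 3 sketch. The list of required keys is my assumption, based on the example cluster_info.json in the CryoSPARC installation guide, so it may not match exactly what v2.14.2 requires; the curly-quote check is there because text copied from a web page can silently replace straight quotes.

```python
import json
import sys

# ASSUMPTION: keys taken from the example cluster_info.json in the CryoSPARC
# guide; verify against the documentation for your version.
ASSUMED_REQUIRED_KEYS = ("name", "worker_bin_path", "send_cmd_tpl",
                         "qsub_cmd_tpl", "qstat_cmd_tpl", "qdel_cmd_tpl",
                         "qinfo_cmd_tpl")


def check_cluster_info(text):
    """Return a list of problems found in the contents of a cluster_info.json file."""
    problems = []
    if "\u201c" in text or "\u201d" in text:
        problems.append("contains curly quotes; JSON needs straight double quotes")
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        problems.append(f"invalid JSON: {exc}")
        return problems
    if not isinstance(data, dict):
        problems.append("top level must be a JSON object")
        return problems
    problems.extend(f"missing key: {key}" for key in ASSUMED_REQUIRED_KEYS if key not in data)
    return problems


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "cluster_info.json"
    with open(path, encoding="utf-8") as fh:
        found = check_cluster_info(fh.read())
    print("\n".join(found) if found else "no problems found")
```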
Now, when I run cryosparcm restart in the worker folder, I get this error:
```
CryoSPARC is running.
Stopping cryosparc.
unix:///tmp/cryosparc-supervisor-f97bde01964489ba6e140782f612b326.sock refused connection
ERROR: unix:///tmp/cryosparc-supervisor-f97bde01964489ba6e140782f612b326.sock refused connection (already shut down?)
Starting cryoSPARC System master process..
CryoSPARC is already running.
If you would like to restart, use cryosparcm restart
```
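The output says CryoSPARC is running, yet the supervisor socket refuses connections. To tell a missing socket file apart from a leftover one that nothing is listening on, a minimal Python 3 sketch like this could be used. The path is copied from the error above; the three labels ("missing", "accepting", "refused") are my own naming.

```python
import os
import socket


def supervisor_socket_state(path):
    """Return 'missing', 'accepting', or 'refused' for a unix socket path."""
    if not os.path.exists(path):
        return "missing"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        return "accepting"
    except OSError:
        # e.g. ConnectionRefusedError: the file exists but nothing is listening
        return "refused"
    finally:
        sock.close()


if __name__ == "__main__":
    path = "/tmp/cryosparc-supervisor-f97bde01964489ba6e140782f612b326.sock"
    print(path, "->", supervisor_socket_state(path))
```

If it reports "refused", the socket file is probably left over from an earlier run, which would fit the "(already shut down?)" hint in the error message.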
Here's the output of cryosparcm status:
```
Current cryoSPARC version: v2.14.2
cryosparcm process status:
unix:///tmp/cryosparc-supervisor-f97bde01964489ba6e140782f612b326.sock refused connection
global config variables:
export CRYOSPARC_LICENSE_ID="xxxx"
export CRYOSPARC_MASTER_HOSTNAME="xxxx"
export CRYOSPARC_DB_PATH="xxxx"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false
export CRYOSPARC_CLICK_WRAP=true
```
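To double-check the values in config.sh, here is a small Python 3 sketch that reads the export lines and flags curly quotes, which bash would keep as literal characters in the value. It is not a full shell parser. The port layout in assumed_ports (database on CRYOSPARC_BASE_PORT + 1, command_core on + 2) is my assumption about CryoSPARC v2, not something I have confirmed.

```python
import re

# Matches lines of the form: export NAME=value
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
CURLY_QUOTES = ("\u201c", "\u201d", "\u2018", "\u2019")


def parse_config(text):
    """Parse `export KEY=value` lines into (values, warnings)."""
    values, warnings = {}, []
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2).strip()
        if any(q in value for q in CURLY_QUOTES):
            # bash would keep curly quotes as literal characters in the value
            warnings.append(f"{key} contains curly quotes")
        # Strip one pair of matching straight quotes around the value.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values, warnings


def assumed_ports(values):
    """ASSUMPTION: web app on base, MongoDB on base + 1, command_core on base + 2."""
    base = int(values.get("CRYOSPARC_BASE_PORT", "39000"))
    return {"web": base, "database": base + 1, "command_core": base + 2}
```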
Here's the output of cryosparcm log command_core:
```
Scheduler Failed
Heartbeat check failed
[JSONRPC ERROR 2020-05-03 12:10:43.656686 at get_num_active_licenses ]
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 114, in wrapper
    res = func(*args, **kwargs)
  File "cryosparc2_command/command_core/__init__.py", line 1421, in get_num_active_licenses
    for j in jobs_running:
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1036, in _refresh
    self.__collation))
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 928, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 210, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: node is not in primary or recovering state
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 198, in background_worker
    concurrent_job_monitor()
  File "cryosparc2_command/command_core/__init__.py", line 1428, in concurrent_job_monitor
    current_concurrent_licenses_deque.append(get_num_active_licenses())
  File "cryosparc2_command/command_core/__init__.py", line 123, in wrapper
    raise e
OperationFailure: node is not in primary or recovering state
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 203, in background_worker
    heartbeat_manager()
  File "cryosparc2_command/command_core/__init__.py", line 1472, in heartbeat_manager
    active_jobs = get_active_licenses()
  File "cryosparc2_command/command_core/__init__.py", line 1437, in get_active_licenses
    for j in jobs_running:
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1036, in _refresh
    self.__collation))
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 928, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/home/pangguot/cryosparc/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 210, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: node is not in primary or recovering state
```
After that, the same output keeps repeating.
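As far as I understand, "node is not in primary or recovering state" is a MongoDB error meaning the mongod that answered is a replica-set member that is not currently able to serve the query (for example, it is not primary). The Python 3 sketch below asks the database for its isMaster status. It needs PyMongo 3.11 or newer (for directConnection), and the host and port (localhost, CRYOSPARC_BASE_PORT + 1) are assumptions about my setup, not confirmed values.

```python
def describe_is_master(reply):
    """Summarize the replica-set role reported in a MongoDB isMaster reply."""
    set_name = reply.get("setName")
    if reply.get("ismaster"):
        # A standalone mongod also reports ismaster=True, but without setName.
        return f"primary of replica set {set_name!r}" if set_name else "standalone (no replica set)"
    if reply.get("secondary"):
        return f"secondary of replica set {set_name!r}"
    if reply.get("isreplicaset"):
        # Started with --replSet but has no valid replica-set config yet.
        return "replica set member without a valid config"
    return "neither primary nor secondary"


def check_database(host, port):
    """Connect to a single mongod and describe its role (needs PyMongo >= 3.11)."""
    from pymongo import MongoClient  # imported here so describe_is_master works without PyMongo

    client = MongoClient(host, port, directConnection=True, serverSelectionTimeoutMS=5000)
    try:
        return describe_is_master(client.admin.command("isMaster"))
    finally:
        client.close()


if __name__ == "__main__":
    # ASSUMPTION: the database runs on this host at CRYOSPARC_BASE_PORT + 1.
    print(check_database("localhost", 39001))
```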
Here's the output of cryosparcm log database:
```
2020-05-03T12:17:10.461-0400 I NETWORK [thread1] connection accepted from 199.241.166.2:37002 #5561 (6 connections now open)
2020-05-03T12:17:10.461-0400 I NETWORK [conn5561] received client metadata from 199.241.166.2:37002 conn5561: { driver: { name: "nodejs", version: "2.2.34" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-957.12.2.el7.x86_64" }, platform: "Node.js v8.9.4, LE, mongodb-core: 2.1.18" }
2020-05-03T12:17:10.485-0400 I - [conn5559] end connection 199.241.166.2:36998 (6 connections now open)
2020-05-03T12:17:10.485-0400 I - [conn5560] end connection 199.241.166.2:37000 (6 connections now open)
2020-05-03T12:17:10.485-0400 I - [conn5561] end connection 199.241.166.2:37002 (6 connections now open)
2020-05-03T12:17:11.883-0400 I NETWORK [thread1] connection accepted from 199.241.166.2:37004 #5562 (4 connections now open)
2020-05-03T12:17:11.883-0400 I NETWORK [conn5562] received client metadata from 199.241.166.2:37004 conn5562: { driver: { name: "PyMongo", version: "3.4.0" }, os: { type: "Linux", name: "CentOS Linux 7.5.1804 Core", architecture: "x86_64", version: "3.10.0-957.12.2.el7.x86_64" }, platform: "CPython 2.7.15.final.0" }
2020-05-03T12:17:11.948-0400 I - [conn5562] end connection 199.241.166.2:37004 (4 connections now open)
2020-05-03T12:17:12.026-0400 I NETWORK [thread1] connection accepted from 199.241.166.2:37014 #5563 (4 connections now open)
2020-05-03T12:17:12.031-0400 I NETWORK [conn5563] received client metadata from 199.241.166.2:37014 conn5563: { driver: { name: "nodejs", version: "2.2.34" }, os: { type: "Linux", name: "linux", architecture: "x64", version: "3.10.0-957.12.2.el7.x86_64" }, platform: "Node.js v8.9.4, LE, mongodb-core: 2.1.18" }
```