MongoDB: "OperationFailure: node is not in primary or recovering state" when changing port

I need to create a new instance of cryoSPARC (on a different port) that uses a copy of my current database.
When I copy the database and start the new instance on a new port, I get the following error:

[JSONRPC ERROR  2021-04-27 14:31:44.724015  at  get_num_active_licenses ]
-----------------------------------------------------
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 115, in wrapper
    res = func(*args, **kwargs)
  File "cryosparc2_command/command_core/__init__.py", line 1520, in get_num_active_licenses
    for j in jobs_running:
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1036, in _refresh
    self.__collation))
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 928, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 210, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: node is not in primary or recovering state
-----------------------------------------------------
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 200, in background_worker
    concurrent_job_monitor()
  File "cryosparc2_command/command_core/__init__.py", line 1527, in concurrent_job_monitor
    current_concurrent_licenses_deque.append(get_num_active_licenses())
  File "cryosparc2_command/command_core/__init__.py", line 124, in wrapper
    raise e
OperationFailure: node is not in primary or recovering state
Traceback (most recent call last):
  File "cryosparc2_command/command_core/__init__.py", line 205, in background_worker
    heartbeat_manager()
  File "cryosparc2_command/command_core/__init__.py", line 1571, in heartbeat_manager
    active_jobs = get_active_licenses()
  File "cryosparc2_command/command_core/__init__.py", line 1536, in get_active_licenses
    for j in jobs_running:
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1114, in next
    if len(self.__data) or self._refresh():
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 1036, in _refresh
    self.__collation))
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/cursor.py", line 928, in __send_message
    helpers._check_command_response(doc['data'][0])
  File "/stornext/System/data/structbio/cryosparc/cryosparc-v2/cryosparc2_master/deps/anaconda/lib/python2.7/site-packages/pymongo/helpers.py", line 210, in _check_command_response
    raise OperationFailure(msg % errmsg, code, response)
OperationFailure: node is not in primary or recovering state
****** Concurrent job monitor failed ****
****** Instance heartbeat failed ****

Hi @jiskander,

The first thing you should do is keep a backup (copy) of your database at a different location.
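For example, a rough sketch of making that copy from the shell (both paths below are placeholders; point them at your actual database directory and your backup location, and stop cryoSPARC first so the files are not being written to):

    cryosparcm stop
    cp -a /path/to/cryosparc_database /path/to/backup/cryosparc_database_backup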

The next thing is to re-configure the replica set:

  1. First, make sure there are no other cryoSPARC instances running on the system
  2. Navigate to the new cryoSPARC instance’s cryosparc_master folder
  3. Then, start the new instance’s database: ./bin/cryosparcm start database
  4. Then, open up a mongo shell and get information about the replica set by running the commands:
    ./bin/cryosparcm mongo
    rs.conf()
    
    You should see something like this:
    "members" : [
                {
                        "_id" : 0,
                        "host" : "localhost:63501",
                        "arbiterOnly" : false,
                        "buildIndexes" : true,
                        "hidden" : false,
                        .
                        .
                        .
    
    If that port number is the old port, run the following command:
    rs.reconfig({
        "_id": "meteor",
        "members": [
            {
                "_id": 0,
                "host": "localhost:<NEW_BASE_PORT_NUMBER+1>"
            }
        ]
    })
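    If the reconfig returns { "ok" : 1 }, a rough way to verify (a sketch, still inside the same mongo shell):
    rs.conf()   // "host" should now read "localhost:<NEW_BASE_PORT_NUMBER+1>"
    exit
    
    Then, back in the terminal, restart the full instance from the cryosparc_master folder:
    ./bin/cryosparcm stop
    ./bin/cryosparcm start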
    
    

Hi @stephan,
The following is my error:

Attempt 3/3 to GET http://svlpcryosparc01.stjude.org:39402/startup failed with exception: 500 Server Error: INTERNAL SERVER ERROR for url: http://svlpcryosparc01.stjude.org:39402/startup
Failed to GET http://svlpcryosparc01.stjude.org:39402/startup

When I tried reconfiguring the replica set as you mentioned above I got the following error message:

rs.reconfig({ "_id": "meteor", "members": [ { "_id": 0, "host": "localhost:39400" } ] })
{
    "ok" : 0,
    "errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is REMOVED; use the \"force\" argument to override",
    "code" : 10107,
    "codeName" : "NotMaster"
}

rs.reconfig({ "_id": "meteor", "members": [ { "_id": 0, "host": "svlpcryosparc01.stjude.org:39400"}]},{ 
"force" : true })
{
"ok" : 0,
"errmsg" : "No host described in new configuration 48847 for replica set meteor maps to this 
node",
"code" : 103,
"codeName" : "NewReplicaSetConfigurationIncompatible"
}

Please let me know how I can solve this issue. I’m trying to reuse an older database with a newer installation on a new host and change the ports to 39400-39410 because I’m running multiple instances on the same host. Thanks

Hi @shockacone,

First, turn off cryoSPARC: cryosparcm stop
Then, if possible, restart your workstation.
If restarting your workstation is not possible, ensure all cryoSPARC processes are killed:

ps -ax | grep "supervisord" (kill only the process that is running from your cryosparc install)
ps -ax | grep "cryosparc" (kill all the matching processes related to your cryosparc instance)
ps -ax | grep "mongod" (kill only the process running your cryosparc database)

e.g. kill 82681

Once that’s done, ensure the CRYOSPARC_BASE_PORT value in cryosparc_master/config.sh is correct (39400 in your case); a sketch of that line is shown below.
Then, turn on cryoSPARC: cryosparcm start.
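For reference, a minimal sketch of what the relevant line in cryosparc_master/config.sh usually looks like (the export form here is an assumption; keep whatever syntax your file already uses):

    # base port for this cryoSPARC instance; the database listens on base port + 1
    export CRYOSPARC_BASE_PORT=39400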

If there is an error, post the output of cryosparcm log command_core and cryosparcm log database.


Hi @stephan, my issue was resolved using the following solution:
