Startup error after changing ip

Greetings,

We had to change our head node's IP address due to some organizational changes at our university. Everything else is working fine, but we are having trouble starting CryoSPARC.

After looking at other threads, I tried adding the following to the master config.sh:

export CRYOSPARC_MASTER_HOSTNAME="MY HOSTNAME HERE"
export CRYOSPARC_FORCE_HOSTNAME=true

However, I still get these errors:

Starting cryoSPARC System master process...
CryoSPARC is not already running.
configuring database
configuration complete
database: started
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
checkdb error - could not get replica set status; please reconfigure the database with cryosparcm configuredb
Traceback (most recent call last):
  File "", line 1, in
  File "/opt/cryoem/cryosparc/cryosparc2_master/cryosparc_compute/database_management.py", line 268, in check_mongo
    admin_db = try_get_pymongo_admin_db(mongo_client)
  File "/opt/cryoem/cryosparc/cryosparc2_master/cryosparc_compute/database_management.py", line 249, in try_get_pymongo_admin_db
    admin_db.command(({'serverStatus': 1}))
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/database.py", line 827, in command
    with self.__client._socket_for_reads(read_preference, session) as (sock_info, secondary_ok):
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1478, in _socket_for_reads
    server = self._select_server(read_preference, session)
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1436, in _select_server
    server = topology.select_server(server_selector)
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 250, in select_server
    return random.choice(self.select_servers(selector, server_selection_timeout, address))
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 211, in select_servers
    server_descriptions = self._select_servers_loop(selector, server_timeout, address)
  File "/opt/cryoem/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/pymongo/topology.py", line 226, in _select_servers_loop
    raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: vision.structbio.pitt.edu:39001: timed out, Timeout: 20.0s, Topology Description: <TopologyDescription id: 653a6ef9ec0ad35fb2339b76, topology_type: Unknown, servers: [<ServerDescription ('vision.structbio.pitt.edu', 39001) server_type: Unknown, rtt: None, error=NetworkTimeout('vision.structbio.pitt.edu:39001: timed out')>]>
[2023-10-26T09:53:04-0400] Error checking database. Most recent database log lines:
2023-10-26T09:51:51.667-0400 I REPL [replexec-0] Starting replication reporter thread
2023-10-26T09:51:51.667-0400 I REPL [rsSync] transition to SECONDARY from RECOVERING
2023-10-26T09:51:51.668-0400 I REPL [rsSync] conducting a dry run election to see if we could be elected. current term: 79
2023-10-26T09:51:51.668-0400 I REPL [replexec-0] dry election run succeeded, running for election in term 80
2023-10-26T09:51:51.680-0400 I REPL [replexec-1] election succeeded, assuming primary role in term 80
2023-10-26T09:51:51.680-0400 I REPL [replexec-1] transition to PRIMARY from SECONDARY
2023-10-26T09:51:51.680-0400 I REPL [replexec-1] Resetting sync source to empty, which was :27017
2023-10-26T09:51:51.680-0400 I REPL [replexec-1] Entering primary catch-up mode.
2023-10-26T09:51:51.680-0400 I REPL [replexec-1] Exited primary catch-up mode.
2023-10-26T09:51:53.669-0400 I REPL [rsSync] transition to primary complete; database writes are now permitted

Can you successfully SSH into the master from the worker, as the user that runs CryoSPARC? Often after an IP address change you’ll need to update known_hosts before SSH connections can proceed.
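The usual fix for the stale-entry problem is simply `ssh-keygen -R <hostname>` run as the CryoSPARC user. As a rough sketch of what that does to a plain-text known_hosts file (hashed entries need the real tool; `forget_host` is a hypothetical helper, not part of OpenSSH or CryoSPARC):

```python
from pathlib import Path

def forget_host(known_hosts: Path, host: str) -> None:
    """Drop every key entry for `host`, roughly mimicking `ssh-keygen -R host`."""
    kept = []
    for line in known_hosts.read_text().splitlines():
        # The first field of a known_hosts line is a comma-separated host list.
        hosts = line.split(None, 1)[0].split(",") if line.strip() else []
        if host not in hosts:
            kept.append(line)
    known_hosts.write_text("\n".join(kept) + ("\n" if kept else ""))
```

After clearing the entry, the next SSH connection will prompt to accept the master's key under its new IP.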

@yodamoppet Please post the output of these commands in a fresh shell (on the CryoSPARC master):

eval $(cryosparcm env)
ps -eo pid,ppid,start,cmd | grep -e cryosparc -e mongo 
curl 127.0.0.1:39001
curl ${CRYOSPARC_MASTER_HOSTNAME}:39001
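The two curl probes above amount to TCP reachability checks against the database port (39001 here): one via the loopback address, one via the configured hostname. A minimal stdlib sketch of the same check, with nothing CryoSPARC-specific assumed:

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host unresolvable
        return False
```

If `port_reachable("127.0.0.1", 39001)` succeeds but the same check against `CRYOSPARC_MASTER_HOSTNAME` fails, the hostname is likely still resolving to the old IP.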


@wtempel

Appreciate your response.

I believe I got this sorted…

It looks like the old DNS record hadn't expired yet, so nslookup was still returning the old IP. Now that the record has refreshed, CryoSPARC starts up as expected.
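A stale record like this can be confirmed directly by comparing what the resolver currently returns for the master hostname against the new IP. A small stdlib sketch (the helper name is mine, not from any CryoSPARC tooling):

```python
import socket

def resolved_ips(host: str) -> set:
    """All addresses the system resolver currently returns for `host`."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}
```

If the new IP is absent from `resolved_ips(CRYOSPARC_MASTER_HOSTNAME)`, the DNS record (or a cached copy of it) is still stale.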


@wtempel

This isn’t quite resolved yet.

So, CryoSPARC starts up now, but submitted jobs don't enter the Slurm queue.

Error looks like:

-------- Submission command:
sbatch /tank/conwaylab/conway/cryosparc/CS-2023-09-10-mcv-lt-multi-krsf4ecc250ef165kx-50ea2-eer/J15/queue_sub_script.sh
-------- Cluster Job ID:
48756
-------- Queued on cluster at 2023-10-26 14:29:01.887048
-------- Cluster job status at 2023-10-26 14:35:47.890631 (40 retries)
JOBID PARTITION NAME USER STATE TIME TIME_LIMI NODES NODELIST(REASON)

The job doesn't enter the queue, and CryoSPARC keeps retrying.
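The retry loop is essentially asking whether the submitted job ID still appears in `squeue` output; in the paste above only the header line comes back, so the job has already left the queue. A sketch of that kind of check (a guess at the shape of the logic, not CryoSPARC's actual code):

```python
def job_in_queue(squeue_output: str, jobid: str) -> bool:
    """Return True if `jobid` appears in the JOBID column of squeue output.

    Assumes the standard layout: one header line, then one row per job.
    """
    rows = squeue_output.strip().splitlines()[1:]  # skip the header
    return any(row.split() and row.split()[0] == jobid for row in rows)
```

Run against the output shown above with job ID 48756, this returns False, which is why the status check keeps retrying.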

What information can I provide to help troubleshoot?

It seems the job is bounced out of the queue soon after being submitted with sbatch. Is there any information in job.log, or in any alternative stdout/stderr files that may have been configured with #SBATCH -o or #SBATCH -e?

@wtempel

It seems that this has started working. I'm not sure why; I didn't change anything over the weekend, but CryoSPARC jobs started entering the queue and running normally.

Thanks for your advice though, we appreciate it.