PyMongo server selection issues after update

Hi,

Running cryoSPARC on RHEL 6.8 (Santiago) with CUDA 9.2. I updated to v3.0.0 this morning and the update seemed to go smoothly. When I run cryosparcm restart, I get:

CryoSPARC is running.
Stopping cryosparc 
database: stopped
Shut down
Starting cryoSPARC System master process..
CryoSPARC is not already running.
database: started
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1915, in list_database_names
    for doc in self.list_databases(session, nameOnly=True)]
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1896, in list_databases
    res = admin._retryable_read_command(cmd, session=session)
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/database.py", line 756, in _retryable_read_command
    _cmd, read_preference, session)
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1461, in _retryable_read
    read_pref, session, address=address)
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1278, in _select_server
    server = topology.select_server(server_selector)
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 243, in select_server
    address))
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 200, in select_servers
    selector, server_timeout, address)
  File "/home/bio21em1/cryosparc2/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
    (self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: No servers found yet, Timeout: 20.0s, Topology Description: <TopologyDescription id: 5fd1827596b1c64f92c416fe, topology_type: Single, servers: [<ServerDescription ('localhost', 39001) server_type: Unknown, rtt: None>]>

Running cryosparcm status tells me that the database is running, but no other cryosparcm processes have started.

I’ve tried deleting mongod.lock in the database folder and the .sock file in /tmp/, as suggested in posts on similar issues, but this doesn’t work.

After running cryosparcm restart, lsof -i -P -n indicates that mongod is listening on port 39001, and running cryosparcm stop successfully frees up the port.
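
As an extra check, something like the following minimal Python snippet (a sketch, assuming the default database port 39001) confirms from within the cryoSPARC Python environment that the port is accepting TCP connections, independent of lsof:

import socket

# Minimal connectivity check (a sketch): confirms that something is
# accepting TCP connections on the cryoSPARC database port (assumes
# the default port 39001).
with socket.create_connection(("localhost", 39001), timeout=5):
    print("port 39001 is accepting TCP connections")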

Any help with this issue would be much appreciated.

Hi @HamishGBrown,

Thanks for reporting. Could you send us the database logs? You can view them with cryosparcm log database

Thanks for your response @stephan. I restarted cryoSPARC again this morning (8:24 AM, December 11), got the same pymongo error as before, and have included the log file below. There was nothing in it pertaining to the pymongo crash, only messages from the database process running overnight, from when I previously started cryoSPARC (when I also got the same pymongo error).

I get the impression that the database is starting successfully, but that a follow-up process then calls a pymongo routine which crashes. I can start many of the subsequent processes manually by executing commands like cryosparcm start command_core etc., but even when I get many of them running, a properly working cryoSPARC instance does not eventuate.

If you can offer any insight that would be much appreciated. It’s worth mentioning that our Linux distribution is an older version and I’ve had issues installing newer software on it: since RHEL 6.8 doesn’t come with GLIBC 2.14 out of the box, I had to manually add it to the path variables to get command_core running, for example. (A quick way to confirm which glibc version is actually being picked up is sketched after the log below.)

2020-12-10T21:22:53.876+0000 I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
2020-12-10T21:22:53.876+0000 I CONTROL  [signalProcessingThread] now exiting
2020-12-10T21:22:53.876+0000 I CONTROL  [signalProcessingThread] shutting down with code:0
2020-12-10T21:22:53.876+0000 I CONTROL  [initandlisten] shutting down with code:0
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] MongoDB starting : pid=27931 port=39001 dbpath=/home/bio21em1/cryosparc2/cryosparc2_database 64-bit host=thor
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] db version v3.4.10
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] git version: 078f28920cb24de0dd479b5ea6c66c644f6326e9
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] modules: none
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] build environment:
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten]     distarch: x86_64
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2020-12-10T21:22:55.824+0000 I CONTROL  [initandlisten] options: { net: { port: 39001 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/home/bio21em1/cryosparc2/cryosparc2_database", journal: { enabled: false }, wiredTiger: { engineConfig: { cacheSizeGB: 4.0 } } } }
2020-12-10T21:22:55.844+0000 I -        [initandlisten] Detected data files in /home/bio21em1/cryosparc2/cryosparc2_database created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-12-10T21:22:55.844+0000 I STORAGE  [initandlisten] 
2020-12-10T21:22:55.844+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-12-10T21:22:55.845+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-12-10T21:22:55.845+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=4096M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2020-12-10T21:22:56.155+0000 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2020-12-10T21:22:56.156+0000 I STORAGE  [initandlisten] The size storer reports that the oplog contains 7048 records totaling to 676562 bytes
2020-12-10T21:22:56.156+0000 I STORAGE  [initandlisten] Sampling from the oplog between Dec 10 00:22:26:1 and Dec 10 21:22:46:1 to determine where to place markers for truncation
2020-12-10T21:22:56.156+0000 I STORAGE  [initandlisten] Taking 1 samples and assuming that each section of oplog contains approximately 69910 records totaling to 6710903 bytes
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] 
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] 
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] 
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] 
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2020-12-10T21:22:56.221+0000 I CONTROL  [initandlisten] 
2020-12-10T21:22:56.225+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/home/bio21em1/cryosparc2/cryosparc2_database/diagnostic.data'
2020-12-10T21:22:56.226+0000 I NETWORK  [thread1] waiting for connections on port 39001
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] New replica set config in use: { _id: "meteor", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "localhost:39001", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5fd16a41e106ab01ce4fb0d7') } }
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] This node is localhost:39001 in the config
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] transition to STARTUP2
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] Starting replication storage threads
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] Starting replication fetcher thread
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] Starting replication applier thread
2020-12-10T21:22:56.226+0000 I REPL     [replExecDBWorker-0] Starting replication reporter thread
2020-12-10T21:22:56.226+0000 I REPL     [rsSync] transition to RECOVERING
2020-12-10T21:22:56.227+0000 I REPL     [rsSync] transition to SECONDARY
2020-12-10T21:22:56.227+0000 I REPL     [rsSync] conducting a dry run election to see if we could be elected
2020-12-10T21:22:56.227+0000 I REPL     [ReplicationExecutor] dry election run succeeded, running for election
2020-12-10T21:22:56.420+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role in term 19
2020-12-10T21:22:56.420+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
2020-12-10T21:22:56.420+0000 I REPL     [ReplicationExecutor] Entering primary catch-up mode.
2020-12-10T21:22:56.420+0000 I REPL     [ReplicationExecutor] Exited primary catch-up mode.
2020-12-10T21:22:58.228+0000 I REPL     [rsSync] transition to primary complete; database writes are now permitted
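
Regarding the GLIBC point above: a quick way to check which glibc version the cryoSPARC master’s Python environment actually sees is a standard-library one-off like this (a minimal sketch; run it with the interpreter inside cryosparc_master_env):

import ctypes
import platform

# platform.libc_ver() reports the glibc version the Python executable is
# linked against; ctypes asks the libc actually loaded at runtime. The two
# can differ if library search paths have been overridden, as with the
# GLIBC 2.14 workaround mentioned above.
print(platform.libc_ver())

libc = ctypes.CDLL("libc.so.6")
libc.gnu_get_libc_version.restype = ctypes.c_char_p
print(libc.gnu_get_libc_version().decode())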

Hi @stephan, any further suggestions on this issue?

I’ve narrowed it down to this line failing in cryosparcm:

python -c "from pymongo import MongoClient; client=MongoClient(host='localhost', port=$CRYOSPARC_MONGO_PORT, serverSelectionTimeoutMS=20000); client.list_database_names();"
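
For anyone debugging the same failure, that one-liner can be expanded into a short script that surfaces pymongo’s view of the topology when selection times out (a sketch; it substitutes the default port 39001 for $CRYOSPARC_MONGO_PORT):

from pymongo import MongoClient
from pymongo.errors import ServerSelectionTimeoutError

# Mirror the failing call from cryosparcm (assumes port 39001, the default
# value of $CRYOSPARC_MONGO_PORT in this installation).
client = MongoClient(host="localhost", port=39001,
                     serverSelectionTimeoutMS=20000)
try:
    print(client.list_database_names())
except ServerSelectionTimeoutError as exc:
    # The exception message embeds the TopologyDescription pymongo last
    # observed; server_type 'Unknown' means the initial handshake with
    # mongod never completed, even though the port is open.
    print("server selection failed:", exc)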

Hi @HamishGBrown,
Our best guess at the moment is that this issue is also related to RHEL 6.8 compatibility. Since RHEL 6 is end of life, v3 is now compiled and built on CentOS 7, and we won’t be able to test or work on fixes for older OS versions. It may be best to downgrade to v2.15 for now.

Hi @apunjani, to follow up: I’ve successfully installed cryoSPARC v3.0.1 after upgrading the operating system to RHEL 7.9.
