Hi @wtempel, thank you for the prompt reply!
I am using the same CryoSPARC installation and version, so no database migration between versions is involved. When I say “previous session”, I mean a previous, terminated run of CryoSPARC, where all the *.wt, *.lock and *.log files are stored. I want to restart CryoSPARC using this non-empty directory instead of creating a new one. I am not a CryoSPARC user myself, but researchers have told me that they sometimes need to run CryoSPARC from a directory that already contains previously generated database and configuration files, so that those files can be reused.
After I start CryoSPARC from the preexisting directory, this is the output in database.log:
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] MongoDB starting : pid=2927759 port=18884 dbpath=/home/npavlovikj/cryosparc 64-bit host=2420
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] db version v3.6.23
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] git version: d352e6a4764659e0d0350ce77279de3c1f243e5c
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] allocator: tcmalloc
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] modules: none
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] build environment:
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] distarch: x86_64
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] target_arch: x86_64
2024-04-04T15:46:20.470-0500 I CONTROL [initandlisten] options: { net: { port: 18884 }, replication: { oplogSizeMB: 64, replSet: "meteor" }, storage: { dbPath: "/home/npavlovikj/cryosparc" } }
2024-04-04T15:46:20.470-0500 W - [initandlisten] Detected unclean shutdown - /home/npavlovikj/cryosparc/mongod.lock is not empty.
2024-04-04T15:46:20.471-0500 I - [initandlisten] Detected data files in /home/npavlovikj/cryosparc created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2024-04-04T15:46:20.471-0500 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2024-04-04T15:46:20.472-0500 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=95258M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2024-04-04T15:46:21.220-0500 I STORAGE [initandlisten] WiredTiger message [1712263581:220095][2927759:0x1540141de500], txn-recover: Main recovery loop: starting at 4/777600
2024-04-04T15:46:21.220-0500 I STORAGE [initandlisten] WiredTiger message [1712263581:220754][2927759:0x1540141de500], txn-recover: Recovering log 4 through 5
2024-04-04T15:46:21.271-0500 I STORAGE [initandlisten] WiredTiger message [1712263581:271054][2927759:0x1540141de500], file:collection-10-922672420683722901.wt, txn-recover: Recovering log 5 through 5
2024-04-04T15:46:21.312-0500 I STORAGE [initandlisten] WiredTiger message [1712263581:312758][2927759:0x1540141de500], file:collection-10-922672420683722901.wt, txn-recover: Set global recovery timestamp: 0
2024-04-04T15:46:21.323-0500 I STORAGE [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2024-04-04T15:46:21.323-0500 I STORAGE [initandlisten] The size storer reports that the oplog contains 364 records totaling to 2560655 bytes
2024-04-04T15:46:21.323-0500 I STORAGE [initandlisten] Scanning the oplog to determine where to place markers for truncation
2024-04-04T15:46:21.328-0500 I STORAGE [initandlisten] WiredTiger record store oplog processing took 4ms
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** WARNING: Access control is not enabled for the database.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** Read and write access to data and configuration is unrestricted.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** WARNING: This server is bound to localhost.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** Remote systems will be unable to connect to this server.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** Start the server with --bind_ip <address> to specify which IP
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** addresses it should serve responses from, or with --bind_ip_all to
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** bind to all interfaces. If this behavior is desired, start the
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** server with --bind_ip 127.0.0.1 to disable this warning.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 4096 processes, 65536 files. Number of processes should be at least 32768 : 0.5 times number of files.
2024-04-04T15:46:21.329-0500 I CONTROL [initandlisten]
2024-04-04T15:46:21.354-0500 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/home/npavlovikj/cryosparc/diagnostic.data'
2024-04-04T15:46:21.360-0500 I REPL [initandlisten] Rollback ID is 1
2024-04-04T15:46:21.363-0500 I REPL [initandlisten] No oplog entries to apply for recovery. appliedThrough and checkpointTimestamp are both null.
2024-04-04T15:46:21.363-0500 I NETWORK [initandlisten] listening via socket bound to 127.0.0.1
2024-04-04T15:46:21.363-0500 I NETWORK [initandlisten] listening via socket bound to /tmp/mongodb-18884.sock
2024-04-04T15:46:21.363-0500 I NETWORK [initandlisten] waiting for connections on port 18884
2024-04-04T15:46:21.363-0500 W NETWORK [replexec-0] Failed to connect to 127.0.0.1:17142, in(checking socket for error after poll), reason: Connection refused
2024-04-04T15:46:21.363-0500 W REPL [replexec-0] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 1 for replica set meteor maps to this node" while validating { _id: "meteor", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "localhost:17142", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('660f10355478bad9946224a4') } }
2024-04-04T15:46:21.363-0500 I REPL [replexec-0] New replica set config in use: { _id: "meteor", version: 1, protocolVersion: 1, members: [ { _id: 0, host: "localhost:17142", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: -1, catchUpTakeoverDelayMillis: 30000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('660f10355478bad9946224a4') } }
2024-04-04T15:46:21.363-0500 I REPL [replexec-0] This node is not a member of the config
2024-04-04T15:46:21.363-0500 I REPL [replexec-0] transition to REMOVED from STARTUP
2024-04-04T15:46:21.364-0500 I NETWORK [LogicalSessionCacheRefresh] Starting new replica set monitor for meteor/localhost:17142
2024-04-04T15:46:21.364-0500 W NETWORK [LogicalSessionCacheRefresh] Failed to connect to 127.0.0.1:17142, in(checking socket for error after poll), reason: Connection refused
2024-04-04T15:46:21.364-0500 I CONTROL [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: config.system.sessions does not exist
2024-04-04T15:46:21.373-0500 W NETWORK [LogicalSessionCacheRefresh] Unable to reach primary for set meteor
2024-04-04T15:46:21.373-0500 I NETWORK [LogicalSessionCacheRefresh] Cannot reach any nodes for set meteor. Please check network connectivity and the status of the set. This has happened for 1 checks in a row.
Each CryoSPARC job on our cluster is terminated via Slurm, so no Mongo/CryoSPARC processes from the earlier run are left on the node. When I restart CryoSPARC from the preexisting directory, this is the ps/grep output I see:
[npavlovikj@2420~]$ ps -eo user,pid,ppid,start,cmd | grep -e mongo -e cryosparc_
npavlov+ 2927723 1 15:46:18 python /opt/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /opt/cryosparc/cryosparc_master/supervisord.conf
npavlov+ 2931613 2927094 15:50:01 bash /opt/cryosparc/cryosparc_worker/bin/cryosparcw connect --worker 2420 --master 2420 --port 18883 --ssdpath /tmp --gpus 0 --rams 1 --cpus 1
npavlov+ 2932443 2928900 15:51:01 grep --color=auto -e mongo -e cryosparc_
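In my current setup, the node stays the same; only the port number changes (the log above shows mongod listening on port 18884, while the stored replica set configuration still points to localhost:17142).
Once I can reuse the directory on the same node, I would like to be able to do the same across different nodes as well.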
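In case it helps with comparing runs, here is a minimal Python sketch of how I could pull the two ports out of database.log. It is only an illustration: the function name extract_ports and both regular expressions are my own assumptions, written against the MongoDB 3.6 log lines quoted above, and the sample text is an abbreviated copy of two of those lines.

```python
import re

# Both patterns are assumptions modelled on the log lines quoted in this post.
LISTEN_RE = re.compile(r"waiting for connections on port (\d+)")
MEMBER_RE = re.compile(r'host: "([^":]+):(\d+)"')


def extract_ports(log_text):
    """Return (listening_port, set of replica set member ports) found in log_text."""
    listening = None
    member_ports = set()
    for line in log_text.splitlines():
        match = LISTEN_RE.search(line)
        if match:
            listening = int(match.group(1))
        # A replica set config line may list several members.
        for _host, port in MEMBER_RE.findall(line):
            member_ports.add(int(port))
    return listening, member_ports


# Abbreviated sample taken from the log above (timestamps and most fields omitted).
sample = (
    'I NETWORK [initandlisten] waiting for connections on port 18884\n'
    'I REPL [replexec-0] New replica set config in use: '
    '{ _id: "meteor", members: [ { _id: 0, host: "localhost:17142" } ] }\n'
)

listening, members = extract_ports(sample)
print(listening, members)  # prints: 18884 {17142}
if listening is not None and listening not in members:
    print("mongod port is not among the stored replica set members")
```

This only reports the mismatch; it does not change the stored configuration.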
Please let me know if you need any additional information.
Thank you,
Natasha