Database spawn error ver 3.3.2

Sorry, this seems to be a recurring event. We tried repairing the database but failed. Unfortunately we don’t have a MongoDB backup and would really appreciate your help in getting CS running again. The output of cryosparcm log database and the mongod repair messages are very long; I can send them by email for troubleshooting. The repair log seems to point to an error from when we prematurely removed a job before the next job had completed, as indicated below.

2023-06-16T14:12:15.932-0400 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.salvage 3200
2023-06-16T14:12:15.950-0400 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.salvage 3300
2023-06-16T14:12:16.120-0400 E STORAGE  [initandlisten] WiredTiger error (5) [1686939136:120935][32270:0x7fd2a46ccd40], file:collection-81--4522566990421786315.wt, WT_SESSION.salvage: .//collection-81--4522566990421786315.wt: handle-read: pread: failed to read 1593344 bytes at offset 532168704: Input/output error
2023-06-16T14:12:16.204-0400 E STORAGE  [initandlisten] WiredTiger error (5) [1686939136:204395][32270:0x7fd2a46ccd40], file:collection-81--4522566990421786315.wt, WT_SESSION.salvage: .//collection-81--4522566990421786315.wt: handle-read: pread: failed to read 4096 bytes at offset 532168704: Input/output error
2023-06-16T14:12:16.212-0400 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195
2023-06-16T14:12:16.212-0400 I -        [initandlisten]

***aborting after invariant() failure


2023-06-16T14:12:16.226-0400 F -        [initandlisten] Got signal: 6 (Aborted).

 0x5571d1954ac1 0x5571d1953cd9 0x5571d19541bd 0x7fd2a380b630 0x7fd2a3464387 0x7fd2a3465a78 0x5571d0c28bd8 0x5571d0e077d0 0x5571d0e0e1ef 0x5571d0e11e54 0x5571d130bfee 0x5571d0c13195 0x5571d0c15e70 0x5571d0c3407b 0x7fd2a3450555 0x5571d0c8ec41
----- BEGIN BACKTRACE -----

After identifying and eliminating the cause (hardware error? unsuitable filesystem choice?) of the Input/output error, you may want to try the recovery procedure outlined in Could not get database status - #13 by wtempel.
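
To narrow down the cause, it may help to check whether the read failure can be reproduced outside of mongod. The following is a minimal Python sketch, not a CryoSPARC or MongoDB tool: it reads a file chunk by chunk and records the offsets at which the operating system reports an error. The function name find_unreadable_chunks and the default chunk size are illustrative assumptions, and the path in the usage comment is a placeholder for the affected .wt file in your database directory. If the Input/output error shows up here as well, the problem most likely sits in the storage layer (disk, controller, or filesystem) rather than in CryoSPARC.

```python
import os


def find_unreadable_chunks(path, chunk_size=4096):
    """Return a list of (offset, errno) for chunks the OS fails to read.

    Uses os.pread, which is available on Unix-like systems only.
    """
    bad = []
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while offset < size:
            try:
                # Read at an explicit offset, like the pread call in the mongod log.
                os.pread(fd, min(chunk_size, size - offset), offset)
            except OSError as exc:
                bad.append((offset, exc.errno))
            offset += chunk_size
    finally:
        os.close(fd)
    return bad


# Hypothetical usage (replace the placeholder path with the real file):
# for offset, err in find_unreadable_chunks("/path/to/database/collection-81-....wt"):
#     print(f"read failed at offset {offset}: errno {err}")
```

An empty result only means that every chunk could be read at the time of the check; it does not prove the hardware is healthy, so system logs and disk diagnostics are still worth reviewing.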

Thanks for your instructions. They are very helpful. But it was not clear to me how to attach the project in CS 3.3.2. The instructions you sent work in CS 4.0+. Do I need to update to CS 4.0+ in order to attach the projects created in CS 3.3.2? Many thanks.

I apologize for my mistake. You are correct: projects can be imported, not attached, in v 3.3.2.
The downtime imposed by the database error may be a good opportunity to update and patch your CryoSPARC instance. Before you go ahead, please ensure you understand what caused the database error and how you can avoid such errors in the future.

The import function in CS 3.3.2 seems to import exported jobs only. But we did not export any jobs before CS crashed. Is there any way to get the projects and jobs back without exported jobs?

We now have CS 3.3.2 running with a new mongod database. We can update our current CS 3.3.2 to the latest version. Would it then be possible to attach the projects created in CS 3.3.2 to CS 4.0+?
Many thanks!

The project was imported but contained blank workspaces; no jobs were actually imported. Any advice on what went wrong and how to troubleshoot? Thanks.

This is not the expected/documented behavior. Please can you inspect
/path/to/cryosparc_master/run/command_core.log for events and failures during the import. The log can be accessed and browsed with the command
cryosparcm log command_core

This should be possible, but I need to confirm…

My statement “The import function in CS 3.3.2 seems to import the exported jobs only” was incorrect. CS 3.3.2 did import projects. The progress bar showed that the import had completed, but the workspaces showed no jobs. I’ve sent you the command_core.log file by email. The log file shows several errors.

This email has not (yet?) found its way into my inbox. Please can you post relevant/representative errors in this forum topic (after concealing confidential information as needed).

The command_core.log for the import showed one warning (see below). For the most part it worked ok.

2023-06-21 12:08:24,525 COMMAND.DATA         import_project_run   INFO     | Done. Inserted 0 streamlogs in 0.01s...
2023-06-21 12:08:24,525 COMMAND.DATA         import_project_run   INFO     | Imported J140 into P6 in 0.06s...
2023-06-21 12:08:24,543 COMMAND.DATA         import_project_run   WARNING  | Unable to locate exported job document for P6 J141
2023-06-21 12:08:24,543 COMMAND.DATA         import_project_run   INFO     | Uploading project image data...
2023-06-21 12:08:24,852 COMMAND.DATA         import_project_run   INFO     | Done. Uploaded 109 files in 0.31s

.......
2023-06-21 12:10:49,455 COMMAND.DATA         import_project_run   INFO     | Imported J98 into P6 in 0.01s...
2023-06-21 12:10:49,456 COMMAND.DATA         import_project_run   INFO     | Uploading project image data...
2023-06-21 12:10:49,481 COMMAND.DATA         import_project_run   INFO     | Done. Uploaded 0 files in 0.03s
2023-06-21 12:10:49,513 COMMAND.DATA         import_project_run   INFO     | Inserted job document in 0.06s...
2023-06-21 12:10:49,513 COMMAND.DATA         import_project_run   INFO     | Inserting streamlogs into jobs...
2023-06-21 12:10:49,518 COMMAND.DATA         import_project_run   INFO     | Done. Inserted 0 streamlogs in 0.00s...
2023-06-21 12:10:49,518 COMMAND.DATA         import_project_run   INFO     | Imported J99 into P6 in 0.06s...
2023-06-21 12:10:50,943 COMMAND.DATA         import_project_run   WARNING  | Imported project from /DATA01/cryosparc_hz/P46 as P6 in 163.29s with errors.

Thanks @haomingz for posting the log entries.
Please can you post the output of this command:

grep -e ERROR -e WARNING /app/apps/rhel7/cryosparc/cryosparc2_master/run/command_core.log | grep import_project_run

Please let us know if the output is too long for posting.
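
If the grep output is hard to read, the same filter can be sketched in Python. This is only an illustration based on the log lines posted in this topic: the regular expression assumes the "timestamp, logger, function, level, | message" layout seen above, and the function name import_issues is made up for this example. Lines that do not match the layout (for example traceback continuation lines) are skipped.

```python
import re

# Layout assumed from the command_core.log lines posted in this topic.
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<logger>\S+)\s+(?P<func>\S+)\s+(?P<level>[A-Z]+)\s+\|\s?(?P<msg>.*)$"
)


def import_issues(lines, levels=("ERROR", "WARNING"), keyword="import_project_run"):
    """Return (time, level, message) for matching log lines.

    A line matches if its level is in `levels` and `keyword` appears in
    either the function column or the message, like the two chained greps.
    """
    issues = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a regular log line, e.g. part of a traceback
        if m["level"] in levels and (keyword in m["func"] or keyword in m["msg"]):
            issues.append((m["time"], m["level"], m["msg"].strip()))
    return issues


# Hypothetical usage:
# with open("command_core.log", errors="replace") as fh:
#     for time, level, msg in import_issues(fh):
#         print(time, level, msg)
```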

[cryosparc@lnx00013 run]$ grep -e ERROR -e WARNING command_core.log | grep import_project_run
2023-06-20 15:12:06,554 COMMAND.MAIN         run                  ERROR    | POST-RESPONSE-THREAD ERROR at import_project_run
2023-06-20 15:12:06,554 COMMAND.MAIN         run                  ERROR    |   File "/app/apps/rhel7/cryosparc/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 3406, in import_project_run
2023-06-20 15:24:55,956 COMMAND.DATA         import_project_run   WARNING  | Unable to locate exported workspaces document. Importing as an empty project.
2023-06-21 06:58:11,823 COMMAND.MAIN         run                  ERROR    | POST-RESPONSE-THREAD ERROR at import_project_run
2023-06-21 06:58:11,823 COMMAND.MAIN         run                  ERROR    |   File "/app/apps/rhel7/cryosparc/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 3416, in import_project_run
2023-06-21 12:08:24,543 COMMAND.DATA         import_project_run   WARNING  | Unable to locate exported job document for P6 J141
2023-06-21 12:10:50,943 COMMAND.DATA         import_project_run   WARNING  | Imported project from /DATA01/cryosparc_hz/P46 as P6 in 163.29s with errors.

This one is pretty short actually. Thanks.

You may want to explore the context of the more interesting warnings/errors
for hints about the project directories involved, like:

grep -A 10 -B 10 "Unable to locate exported workspaces document." /app/apps/rhel7/cryosparc/cryosparc2_master/run/command_core.log

and

grep -A 10 -B 10 "ERROR    |   File" /app/apps/rhel7/cryosparc/cryosparc2_master/run/command_core.log

Neither of the grep commands returned anything.

Have logs recently rotated? Do you get any output with these commands:

grep -A 10 -B 10 "Unable to locate exported workspaces document." /app/apps/rhel7/cryosparc/cryosparc2_master/run/command_core.log.1
grep -A 10 -B 10 "ERROR    |   File" /app/apps/rhel7/cryosparc/cryosparc2_master/run/command_core.log.1

command_core.log is the most recent one. command_core.log.1 was created five days ago, when CS was not set up properly. I don’t think the log.1 file would help with troubleshooting unless you think otherwise.

Maybe the grep commands I provided are not quite accurate. Alternatively, you could manually browse command_core.log and inspect the lines surrounding those output by your earlier grep search (Database spawn error ver 3.3.2 - #15 by haomingz). Look for specific file or directory paths, project or job identifiers, and other information that could help define the problem more precisely.
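
It is not clear from this topic why the context greps came back empty. One possible reason (an assumption, not a diagnosis) is a mismatch in the number of spaces between the search pattern and the log line. A small Python sketch that ignores such whitespace differences and returns the surrounding lines could look like this; the function names are made up for this example.

```python
import re


def normalize(text):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def lines_with_context(lines, needle, before=10, after=10):
    """Return one list of lines per match, including surrounding context.

    Matching is done on whitespace-normalized text, so "ERROR | File"
    also finds "ERROR    |   File".
    """
    lines = list(lines)
    wanted = normalize(needle)
    blocks = []
    for i, line in enumerate(lines):
        if wanted in normalize(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            blocks.append(lines[start:end])
    return blocks


# Hypothetical usage:
# with open("command_core.log", errors="replace") as fh:
#     for block in lines_with_context(fh, "Unable to locate exported workspaces document."):
#         print("".join(block))
#         print("-----")
```

Overlapping matches produce overlapping blocks; that keeps the sketch short at the cost of some repeated output.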

I combed through the log file but didn’t find anything meaningful. Thanks to your help, I recovered most of what I need. We’ll keep daily backups after updating CS to the latest version. Your help is very much appreciated!!
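
For the daily backups mentioned above, a minimal Python sketch follows. It assumes the cryosparcm backup command with its --dir and --file options as described in the CryoSPARC guide; please verify both against the documentation for your installed version. The function names, the backup directory, and the file naming scheme are hypothetical choices for this example.

```python
import datetime
import subprocess


def backup_command(backup_dir, today=None):
    """Build the cryosparcm backup command line with a dated file name."""
    today = today or datetime.date.today()
    filename = f"cryosparc_backup_{today:%Y_%m_%d}.archive"
    return ["cryosparcm", "backup", f"--dir={backup_dir}", f"--file={filename}"]


def run_backup(backup_dir):
    """Run the backup; raises CalledProcessError if cryosparcm exits non-zero."""
    subprocess.run(backup_command(backup_dir), check=True)


# Hypothetical usage, e.g. from a script started once a day by cron
# under the account that owns the CryoSPARC installation:
# run_backup("/path/to/backups")
```

Keeping the backup directory on a different storage device than the database would help protect the backups from the kind of Input/output error seen at the start of this topic.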