Gridfsdata_0 Restoration after MongoDB Corruption

Hello,

Recently we had dual failures on our primary and backup drives storing our instance’s database. While we were able to recover the DB using mongod --repair, all of the file links (PNG, PDF, etc.) from the last 6 months are broken: the files no longer display or download through the interface. We noticed that the images are still in the gridfs folders within the job folders, so the corresponding records must have been corrupted in the DB. Is there a way to mass reload this data back into the database?

I can provide logs if needed. I wasn’t sure which to include up front. Please let me know.

Thanks,

Charlie

[Edited 2022-10-25 to specifically point to an older version of the script, which was written for CryoSPARC v3]
Hi @bowman,

This indicates something could be seriously wrong with the database, well beyond broken file links. Your best bet may be to re-install CryoSPARC with a new database directory and import each project directory into this new instance. In any case, I wrote a script you can run to do a limited restore from gridfs files, but I cannot guarantee that it will fully restore the database to a functional state. Download it here.

Script usage:

cryosparcm call python restore_gridfs_files.py --project <puid> [ --job <juid> ] [ --dump]

You’ll have to run it separately for each project whose image data you’d like to restore.

The --dump flag resets the gridfs files in the job directory after files are restored without errors. I recommend trying it out on one job without the --dump flag, e.g.,

cryosparcm call python restore_gridfs_files.py --project PX --job JY

where PX is the project UID (e.g., P42) and JY is the job UID.

If that restores the job’s images, try it on the whole project with the --dump flag:

cryosparcm call python restore_gridfs_files.py --project PX --dump

If you run into errors at any point, please send me the output. If the script doesn’t work on the first job, I’d recommend the reinstallation strategy.
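
The one-job-at-a-time approach above can be scripted so that a failure on one job does not stop the rest. The following is only a rough sketch, not part of the restore script: the helper names `find_job_uids` and `restore_each_job` are made up for illustration, and it assumes that job directories are named `J<number>`, that restorable jobs contain `gridfs_data/gridfsdata_0`, and that a failed restore shows up either as a non-zero exit code or as an `ERROR:` line in the output. It uses only the `--project` and `--job` flags described above.

```python
import subprocess
from pathlib import Path


def find_job_uids(project_dir):
    """Return job UIDs (e.g. "J180") that have a gridfs_data/gridfsdata_0 entry."""
    uids = {
        p.parent.parent.name
        for p in Path(project_dir).glob("J*/gridfs_data/gridfsdata_0")
    }
    # Keep only names of the form J<number>, sorted numerically (J96 before J180).
    numeric = [uid for uid in uids if uid[1:].isdigit()]
    return sorted(numeric, key=lambda uid: int(uid[1:]))


def restore_each_job(project_uid, job_uids,
                     script="restore_gridfs_files.py", run=subprocess.run):
    """Run the restore script once per job; return (restored, failed) UID lists.

    `run` is injectable so the loop can be exercised without a CryoSPARC install.
    """
    restored, failed = [], []
    for juid in job_uids:
        cmd = ["cryosparcm", "call", "python", script,
               "--project", project_uid, "--job", juid]
        result = run(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        # Assumption: failure means a non-zero exit code or an "ERROR:" line.
        if result.returncode != 0 or "ERROR:" in output:
            failed.append(juid)
        else:
            restored.append(juid)
    return restored, failed
```

For example, `restore_each_job("PX", find_job_uids("/path/to/PX"))` would return the jobs that restored cleanly and the ones that did not, so the failed ones can be inspected individually. Because this sketch does not pass --dump, each restored job would still need to be dumped before the script is run on it again.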

Thank you. I will attempt this and report back here. So far everything is still working except for this, but I will look out for further issues.

Hi @nfrasser, we are testing this command and ran into the following error:

cryosparcm call python /tmp/restore_gridfs_files.py --project P198 --job  J836
Found 1 to restore
** WARNING: do not attempt re-run this command until you run the following command for each job:

    cryosparcm cli 'dump_job_database("P198", "<juid>")'

To avoid this warning, provide the --dump flag

Continuing in 5 seconds...
  ERROR: Could not restore J836: there are no users authenticated, full error: {'operationTime': Timestamp(1666386530, 21), 'ok': 0.0, 'errmsg': 'there are no users authenticated', 'code': 13, 'codeName': 'Unauthorized',

Is there a way to authenticate through the CLI so that this can run?

My colleague has posted an update to the script that should work with CryoSPARC version 4.

Excellent, that got us running!

We are running into another error though. Based on the message, I’m thinking it is unrecoverable and being caused by corrupted or missing data that got us here in the first place.

Is there a way to run this command on the project so that it skips over broken J numbers and continues? I’m assuming there will be some unrecoverable data since there was a drive loss, but running through job-by-job whenever there is a broken J could be time-consuming.

$ cryosparcm call python ./restore_gridfs_files.py --project P249 --dump
Found 206 to restore
** WARNING: --dump is enabled, gridfs files in the job directory will be reset after successful restore.
Continuing in 5 seconds...
Importing gridfs files for J191...
  No files uploaded for J191
Importing gridfs files for J168...
  No files uploaded for J168
Importing gridfs files for J96...

... progresses as normal through many J up to 180 ...

Importing gridfs files for J180...
  Uploading files for J180 from /gpfs/group/em/cryosparc/rdepaiva/P249/J180/gridfs_data/gridfsdata_0 ...
    FILE 62648f4cff95dca12232b8fb J180 P249_J180_1_of_1596_gpfsgroupemcryosparcrdepaivap249j42imported013193768214028661205_21sep14g_00001sq_v01_00002hl1250_00003ed_a_dwmrc.png
    FILE 62648f4cff95dca12232b8fe J180 image.png
    FILE 62648f4cff95dca12232b900 J180 image.png
    FILE 62648f4dff95dca12232b902 J180 P249_J180_1_of_1596_extracted_particles.png
    FILE 62648f4dff95dca12232b904 J180 P249_J180_1_of_1596_extracted_particles.pdf
    FILE 62648f4dff95dca12232b907 J180 image.png
    FILE 62648f4eff95dca12232b909 J180 image.png
    FILE 62648f74ff95dca12232b93f J180 P249_J180_lowpass_filtered_images.png
    FILE 62648f74ff95dca12232b941 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62648f9cff95dca12232b979 J180 P249_J180_lowpass_filtered_images.png
    FILE 62648f9cff95dca12232b97b J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62648fc3ff95dca12232b9b3 J180 P249_J180_lowpass_filtered_images.png
    FILE 62648fc3ff95dca12232b9b5 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62648feaff95dca12232b9ed J180 P249_J180_lowpass_filtered_images.png
    FILE 62648feaff95dca12232b9ef J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649011ff95dca12232ba27 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649011ff95dca12232ba29 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649037ff95dca12232ba61 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649037ff95dca12232ba63 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649060ff95dca12232ba9b J180 P249_J180_lowpass_filtered_images.png
    FILE 62649060ff95dca12232ba9d J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649094ff95dca12232bad5 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649094ff95dca12232bad7 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626490caff95dca12232bb0f J180 P249_J180_lowpass_filtered_images.png
    FILE 626490caff95dca12232bb11 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626490feff95dca12232bb49 J180 P249_J180_lowpass_filtered_images.png
    FILE 626490feff95dca12232bb4b J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649135ff95dca12232bb83 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649135ff95dca12232bb85 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649169ff95dca12232bbbd J180 P249_J180_lowpass_filtered_images.png
    FILE 62649169ff95dca12232bbbf J180 P249_J180_lowpass_filtered_images.pdf
    FILE 6264919eff95dca12232bbf7 J180 P249_J180_lowpass_filtered_images.png
    FILE 6264919eff95dca12232bbf9 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626491deff95dca12232bc31 J180 P249_J180_lowpass_filtered_images.png
    FILE 626491deff95dca12232bc33 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649211ff95dca12232bc6b J180 P249_J180_lowpass_filtered_images.png
    FILE 62649211ff95dca12232bc6d J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649241ff95dca12232bca5 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649241ff95dca12232bca7 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649272ff95dca12232bcdf J180 P249_J180_lowpass_filtered_images.png
    FILE 62649272ff95dca12232bce1 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 6264929dff95dca12232bd19 J180 P249_J180_lowpass_filtered_images.png
    FILE 6264929dff95dca12232bd1b J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626492c8ff95dca12232bd53 J180 P249_J180_lowpass_filtered_images.png
    FILE 626492c8ff95dca12232bd55 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626492feff95dca12232bd8d J180 P249_J180_lowpass_filtered_images.png
    FILE 626492feff95dca12232bd8f J180 P249_J180_lowpass_filtered_images.pdf
    FILE 6264933bff95dca12232bdc7 J180 P249_J180_lowpass_filtered_images.png
    FILE 6264933bff95dca12232bdc9 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649374ff95dca12232be01 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649374ff95dca12232be03 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626493a9ff95dca12232be3b J180 P249_J180_lowpass_filtered_images.png
    FILE 626493a9ff95dca12232be3d J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626493d4ff95dca12232be75 J180 P249_J180_lowpass_filtered_images.png
    FILE 626493d4ff95dca12232be77 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 6264940fff95dca12232beaf J180 P249_J180_lowpass_filtered_images.png
    FILE 6264940fff95dca12232beb1 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649450ff95dca12232bee9 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649450ff95dca12232beeb J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649486ff95dca12232bf23 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649486ff95dca12232bf25 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626494bdff95dca12232bf5d J180 P249_J180_lowpass_filtered_images.png
    FILE 626494bdff95dca12232bf5f J180 P249_J180_lowpass_filtered_images.pdf
    FILE 626494f3ff95dca12232bf97 J180 P249_J180_lowpass_filtered_images.png
    FILE 626494f3ff95dca12232bf99 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649529ff95dca12232bfd1 J180 P249_J180_lowpass_filtered_images.png
    FILE 62649529ff95dca12232bfd3 J180 P249_J180_lowpass_filtered_images.pdf
    FILE 62649564ff95dca12232c00b J180 P249_J180_lowpass_filtered_images.png
    FILE 62649564ff95dca12232c00d J180 P249_J180_lowpass_filtered_images.pdf
 Uploaded 69 files for J180 in 0.56s
  Updating database events ...
  Updating tile images...
  Updating output group images...
  Updated 69 event files for J180 in 0.11s
  Attempting to clean up old files...
  Deleted 69 old files
  Dumping job...
    WARNING: gridfs files in the job directory will be reset
  ERROR: Could not restore J180: Encountered error for method "dump_job_database" with params {'project_uid': 'P249', 'job_uid': 'J180'}:
ServerError: no chunk #0
Traceback (most recent call last):
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 755, in next
    chunk = self._next_with_retry()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 747, in _next_with_retry
    return self._cursor.next()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1246, in next
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
    res = func(*args, **kwargs)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/commandcommon.py", line 260, in wrapper
    return func(*args, **kwargs)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 3549, in dump_job_database
    rc.dump_job_database(project_uid = project_uid, job_uid = job_uid, job_completed = job_completed, migration = migration, abs_export_dir = abs_export_dir, logger = logger)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_compute/jobs/runcommon.py", line 356, in dump_job_database
    file_object = gridfs.get(objectid.ObjectId(object_id)).read()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 565, in read
    chunk_data = self.readchunk()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 528, in readchunk
    chunk = self.__chunk_iter.next()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 759, in next
    raise CorruptGridFile("no chunk #%d" % self._next_chunk)
gridfs.errors.CorruptGridFile: no chunk #0

Traceback (most recent call last):
  File "./restore_gridfs_files.py", line 226, in <module>
    restore_job_dir(args.project, job_dir, args.dump)
  File "./restore_gridfs_files.py", line 183, in restore_job_dir
    cli.dump_job_database(project_uid=puid, job_uid=juid)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_compute/client.py", line 66, in func
    + self._format_server_error(res['error'])
AssertionError: Encountered error for method "dump_job_database" with params {'project_uid': 'P249', 'job_uid': 'J180'}:
ServerError: no chunk #0
Traceback (most recent call last):
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 755, in next
    chunk = self._next_with_retry()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 747, in _next_with_retry
    return self._cursor.next()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/pymongo/cursor.py", line 1246, in next
    raise StopIteration
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
    res = func(*args, **kwargs)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/commandcommon.py", line 260, in wrapper
    return func(*args, **kwargs)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 3549, in dump_job_database
    rc.dump_job_database(project_uid = project_uid, job_uid = job_uid, job_completed = job_completed, migration = migration, abs_export_dir = abs_export_dir, logger = logger)
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/cryosparc_compute/jobs/runcommon.py", line 356, in dump_job_database
    file_object = gridfs.get(objectid.ObjectId(object_id)).read()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 565, in read
    chunk_data = self.readchunk()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 528, in readchunk
    chunk = self.__chunk_iter.next()
  File "/home_local/hpc/software/cryosparc/cryosparc2_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/site-packages/gridfs/grid_file.py", line 759, in next
    raise CorruptGridFile("no chunk #%d" % self._next_chunk)
gridfs.errors.CorruptGridFile: no chunk #0

These messages suggest the following:

  1. Figure/image data are uploaded from the job directory to the database seemingly without error
  2. The subsequent “dump”, which is supposed to overwrite information in the job directory with newly restored database records, fails, apparently because the restoration of database records has silently failed
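
For context on the "no chunk #0" message: GridFS keeps one metadata document per file in a files collection and the file contents in a separate chunks collection, where each chunk document carries a `files_id` and a sequence number `n`. The error is raised when a non-empty file has a metadata document but its first chunk is absent. A minimal sketch of that consistency check, written against plain lists of documents so it runs without a database (the function name is hypothetical, and fetching the documents from the live CryoSPARC database is not shown):

```python
def files_missing_first_chunk(file_docs, chunk_docs):
    """Return the _id of every non-empty file document that has no chunk with n == 0.

    file_docs:  documents shaped like GridFS files entries ({"_id": ..., "length": ...})
    chunk_docs: documents shaped like GridFS chunks entries ({"files_id": ..., "n": ...})
    """
    # Collect the file ids for which a first chunk exists.
    have_chunk0 = {c["files_id"] for c in chunk_docs if c.get("n") == 0}
    return [
        f["_id"]
        for f in file_docs
        # Zero-length files legitimately have no chunks, so skip them.
        if f.get("length", 0) > 0 and f["_id"] not in have_chunk0
    ]
```

Any id returned by such a check would correspond to a file that looks present in the database but cannot be read back, which matches the behaviour seen in the dump step.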

This failure might imply:

  1. The database has been corrupted in such a way that it cannot be reliably restored with the image upload procedure.
  2. The underlying cause of database corruption has not been conclusively identified and therefore may not have been corrected.
  3. A failed dump may destroy previously intact information in the job directory because the seemingly restored database records that are supposed to replace the information in the job directory are in fact corrupt.

I therefore suggest starting with a fresh database, attaching existing project directories as needed.

  1. cryosparcm shutdown
  2. Ensure all processes related to this CryoSPARC instance have been terminated:
    ps axuww | grep -e cryosparc -e mongo
  3. Edit the line of /path/to/cryosparc_master/config.sh that starts with
    export CRYOSPARC_DB_PATH=
    so that it points to a directory on reliable storage
  4. Delete cs.lock files from project directories that you wish to attach later. Manual deletion of cs.lock files is generally discouraged, but is warranted under this particular recovery scenario.
  5. cryosparcm start
  6. Attach individual project directories
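
Step 4 can be tedious with many projects. As a cautious sketch only, assuming each cs.lock file sits at the top level of its project directory, the following hypothetical helpers list the lock files first and delete them only when asked explicitly. Run it only while CryoSPARC is shut down, as in the steps above.

```python
from pathlib import Path


def find_lock_files(project_dirs):
    """Return the cs.lock paths that exist directly inside the given project directories."""
    locks = []
    for project_dir in project_dirs:
        lock = Path(project_dir) / "cs.lock"
        if lock.is_file():
            locks.append(lock)
    return locks


def remove_lock_files(project_dirs, dry_run=True):
    """List cs.lock files and delete them only when dry_run is False.

    Returns the lock files that were found (and, if not a dry run, removed).
    """
    locks = find_lock_files(project_dirs)
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```

Calling `remove_lock_files(dirs)` first and reviewing the returned paths before repeating the call with `dry_run=False` keeps the deletion limited to the projects you actually intend to attach.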