Workspaces.json incomplete

Hello,
On a single-workstation installation (v4.4.1) we have the problem that the workspaces.json file is incomplete and no longer seems to be updated (its last modification date is December 1st). The database is fine and we can use CryoSPARC without problems.
We found the problem when we tried to copy and re-attach a project and some workspaces had no jobs. We inspected the original workspaces.json and these workspaces had no job entries. Restarting CryoSPARC (on the original installation) and running new jobs didn’t help; the workspaces.json was not updated. I tried the solution from here: Repopulating workspaces.json - #6 by boggild, but got the following error: "pymongo.errors.OperationFailure: there are no users authenticated, full error: {'operationTime': Timestamp(1703010257, 90), 'ok': 0.0, 'errmsg': 'there are no users authenticated', 'code': 13, 'codeName': 'Unauthorized', '$clusterTime': {'clusterTime': Timestamp(1703010257, 90), 'signature': {'hash': b'\xd0\xca_n\xcc\x87\x85?;\xf4xV\xaa~A\xc9t\xf3\x95U', 'keyId': 7280834173701455876}}}".
Any help would be appreciated.
I should mention that this is a huge project, with 33 workspaces and around 20,000 jobs, which we wanted to move and clean when we discovered the problem; its size might have contributed to the problem.
Thank you in advance,
Dirk

The main issue, and the cause of the problem, is that the workspaces.json file is not being updated. I am now seeing the same behavior in unrelated projects on a second instance. Can anybody clarify at what interval workspaces.json should be updated in active projects? I see problems for all future data migrations if this file isn’t updated. Time-wise it might be linked to the update to version 4.4.1, but I am not sure. Is there still a way to create this file manually from the database?

Hi @DMR.

This is not unexpected. Associations with specific workspaces are stored in the job documents, which are exported as job.json to the job directories.
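If workspace–job associations live in the per-job documents, a workspace’s job list can in principle be reassembled from them. A minimal sketch, assuming hypothetical field names (not the verified CryoSPARC schema):

```python
# Hypothetical job documents; the field names here are assumptions
# for illustration, not the verified CryoSPARC database schema.
jobs = [
    {"uid": "J1391", "project_uid": "P21", "workspace_uids": ["W33"]},
    {"uid": "J1392", "project_uid": "P21", "workspace_uids": ["W1", "W33"]},
]

def jobs_in_workspace(jobs, workspace_uid):
    """Recover a workspace's job list from the per-job documents."""
    return [j["uid"] for j in jobs
            if workspace_uid in j.get("workspace_uids", [])]
```

This is only meant to show why a stale workspaces.json is recoverable: the authoritative association is stored per job, not in the file.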

That solution may not apply to the current problem and/or version of CryoSPARC. To get to the bottom of the current problem, please

  1. describe the history of the project (directory) before you copied it:
    i. what was the latest version of CryoSPARC that interacted with the original directory from which the copy was made?
    ii. was/is the original of the copy still connected to a CryoSPARC instance?
  2. show the command(s) with which you copied the project
  3. review the command_core log(s) for errors related to the project attachment.
  4. email us a compressed copy of the cryosparc_master/run/command_core.log file. If that file does not include information for the time when the project was attached, please identify which of the older command_core.log.[0-9] files contains that information, compress it and send it to us also. I will let you know the e-mail address in a private message.

i. what was the latest version of CryoSPARC that interacted with the original directory from which the copy was made?

4.4.1

ii. was/is the original of the copy still connected to a CryoSPARC instance?

Yes, both are connected to the same instance. The user wants to clean the project (it is 100 TB) but is afraid of accidentally deleting/losing important jobs/data. The copy was made as a backup; the cs.lock file was removed and the (copied) project was re-attached to ensure it would be a viable backup.

show the command(s) with which you copied the project

rsync -av P21/ P21-backup/ (executed as the cryosparc user, and run a second time to make sure all files had been copied)
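One caveat: by default rsync compares only size and modification time, so a second pass will not catch a file whose contents were corrupted in transit but whose size matches (rsync’s --checksum option, ideally with --dry-run, does a content-level comparison). As a rough stdlib sketch of what a size-level check can and cannot see, using the directory names from the command above as placeholders:

```python
import os

def inventory(root):
    """Map each file's path (relative to root) to its size in bytes."""
    sizes = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes

def compare_trees(src, dst):
    """Return (files missing from dst, files whose sizes differ).

    Note: equal sizes do NOT guarantee equal contents.
    """
    a, b = inventory(src), inventory(dst)
    missing = sorted(set(a) - set(b))
    mismatched = sorted(rel for rel in a if rel in b and a[rel] != b[rel])
    return missing, mismatched

# Hypothetical usage with the directories from the rsync command above:
# missing, mismatched = compare_trees("P21", "P21-backup")
```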

review the command_core log(s) for errors related to the project attachment.

I don’t see anything obvious. The workspace with the most missing jobs is W33 in P33 (originally P21); the log just states that it was created.

email us a compressed copy of the cryosparc_master/run/command_core.log file. If that file does not include information for the time when the project was attached, please identify which of the older command_core.log.[0-9] files contains that information, compress it and send it to us also. I will let you know the e-mail address in a private message.

The log files have been sent via email.

Thanks for sending over the files. command_core.log.4 indicates a problem during the import of P33-J1391. That problem is expected to also cause the imports of additional jobs to fail during the attachment of project P33. Some data inside the J1391 job directory are likely corrupted.
To determine whether the source of the copy is corrupted or the corruption is limited to the second copy, you may try

  1. exporting J1391 from the original project directory
  2. then linking or copying the exported job directory (found under the exports/jobs/ subdirectory of the original project directory) to a destination project directory
  3. importing the exported job into the destination project

Does this sequence also fail?

Hi,
That sequence works fine and J1391 was imported successfully. What would be the best way to import all the other missing jobs? Or would it be best to repeat the import of the whole project? There are still several hundred jobs missing. I compared all files in the original and the copy of J1391 and the sizes match perfectly, so there are no apparent signs of corruption in the copy.

In that case, I would suggest that you:

  1. detach the project whose import failed, including the Delete project from database action.
  2. confirm, if possible, that the project copy’s J1391 job directory is in fact corrupted, and establish a root cause for the corruption, such as a disrupted rsync operation or faulty storage media at the rsync destination.
  3. make another copy of the source project directory, with adjustments based on the identified cause of the previous copy’s failure, such as confirming uninterrupted completion of rsync and/or using a different storage device, as appropriate.
  4. attempt attachment of the new project copy.

I can’t find anything wrong with the copy of J1391. All files are present and have the same size in the original and the copy. I also used the Linux diff command to compare the files in J1391 between the original and the copy, and they are identical. It is a Topaz job; I did not run diff on all .mrc files in the preprocessed and gridfs_data subfolders, but the total sizes of these folders match as well.

Could the issue be somewhere outside of the J1391 directory?
We did run a couple of jobs in the original project since we made the copy, so comparing files outside of specific jobs is a bit more difficult, as they might have changed.

@DMR Given the J1391-related UnpicklingError during the P33 attachment attempt and given the successful separate import of J1391 from the original copy of the P33 project directory, corruption of a file or files inside the copied J1391/gridfs_data/ directory is very likely. Comparing checksums between copies of files may be more reliable than comparing their sizes, but would not be meaningful if one copy has been legitimately modified in the meantime.
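A checksum comparison of two project trees can be sketched with the standard library alone; the paths are placeholders, and streaming the hash keeps memory bounded for large .mrc files:

```python
import hashlib
import os

def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large .mrc files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def checksum_mismatches(src, dst):
    """Relative paths of files present in both trees whose contents differ."""
    bad = []
    for dirpath, _, filenames in os.walk(src):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), src)
            twin = os.path.join(dst, rel)
            if os.path.isfile(twin) and sha256_of(os.path.join(src, rel)) != sha256_of(twin):
                bad.append(rel)
    return sorted(bad)
```

Unlike a size comparison, this catches same-size corruption, but as noted above it is only meaningful for files that have not been legitimately modified since the copy was made.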

Given these choices and the large number of jobs still missing, I would recommend exporting and importing individual jobs on an as-needed basis.