My archived projects are huge

Hi folks,

I’m trying to download all my cryoSPARC projects ahead of a full wipe of the drive they’re currently on. Checking the directory sizes, the smaller projects are over 4 TB and the larger ones are around 14 TB, even after archiving each project in the cryoSPARC GUI. This seems crazy huge to me. Ideally, I’d like to preserve the projects so that I can pick them up later and, best case, continue work on some of them, or, worst case, still see a detailed record of what I did by accessing the metadata and/or images produced by the jobs themselves. I have also considered using tar to compress the project files, but was warned that this could further damage the drive and was advised against it.

What am I missing that I should be doing to make this process feasible?

Thank you all so much,
Kate

Hi Kate,

I think we are from the same institution. I will explain what I did in the past few weeks.
You should use the CryoSPARC cleanup tool to clear all motion correction and extraction jobs. That may save around 70% of the total size. Clearing does not delete the job itself (only its outputs), so you can re-run it afterwards. You should also clear the intermediate results.
Archiving does not mean compression, so an archived project will have the same size.
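Before clearing anything, it can help to see which job directories actually hold the bulk of the data. Here is a rough Python sketch for that. It assumes each job lives in its own subdirectory of the project directory (for example J1, J2, ...), and the function names are just mine for illustration. It does not follow symlinks, so linked raw data outside the project should not be counted.

```python
import os
import sys


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_subdirs(project_dir, top=10):
    """Return (name, size_in_bytes) for immediate subdirectories, largest first."""
    sizes = [
        (entry.name, dir_size_bytes(entry.path))
        for entry in os.scandir(project_dir)
        if entry.is_dir(follow_symlinks=False)
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


# Usage: python project_sizes.py /path/to/project_dir
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, size in largest_subdirs(sys.argv[1]):
        print(f"{size / 1e12:8.3f} TB  {name}")
```

Running it once before and once after the cleanup gives a direct measure of how much space was freed.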

Here’s the documentation: Guide: Data Cleanup (v4.3+) | CryoSPARC Guide

If you want to preserve the projects, ideally you should detach them from the CryoSPARC instance you are working on (using the “detach project” option), then bundle each one with the tar command (only to make the transfer to cold storage easier). Later, once a CryoSPARC instance is available again, you can re-attach the projects using the “attach project” option. Don’t forget to back up the raw data as well, and keep in mind that this is a time-consuming process.
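For the bundling step, here is a minimal sketch using Python's standard tarfile module in plain "w" mode, which writes an uncompressed tar (the same idea as tar -cf). The function name and paths are placeholders of mine. Writing the archive to a different drive means the source drive is only read, much like an ordinary copy.

```python
import tarfile
from pathlib import Path


def archive_project(project_dir, tar_path):
    """Bundle project_dir into an uncompressed tar and return the member count.

    tar_path should sit outside project_dir (ideally on the destination
    drive) so the archive does not try to include itself.
    """
    project_dir = Path(project_dir)
    # Mode "w" writes a plain tar with no compression.
    # Symlinks are stored as links, not followed (dereference defaults to False).
    with tarfile.open(tar_path, "w") as tar:
        tar.add(project_dir, arcname=project_dir.name)
    # Re-open the archive and count members as a basic sanity check.
    with tarfile.open(tar_path, "r") as tar:
        return len(tar.getnames())
```

Skipping compression keeps the run time close to that of a plain copy. Whether compression would save much on this kind of data is not something I have measured.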

I personally did that, and I didn’t see any damage after extracting one project from its tar archive and attaching it to another CryoSPARC instance.
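If you want more than a visual check that nothing was damaged, one option is to compare checksums between the original project directory and the extracted copy. This is only a sketch with names I made up, and it has to read every file on both sides, so expect it to be slow on multi-TB projects.

```python
import hashlib
import os


def checksum_manifest(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # links are skipped; only real file contents are hashed
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files are never loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def differing_files(original_root, restored_root):
    """Return sorted relative paths that are missing or differ between the trees."""
    a = checksum_manifest(original_root)
    b = checksum_manifest(restored_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from differing_files means every regular file matched.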

Here’s some documentation for that: Guide: Data Management in CryoSPARC (v4.0+) | CryoSPARC Guide

Good luck!

Thank you so much Kevin!! You’re a legend. This is amazing. Thankfully, the raw data is all backed up - I just couldn’t figure out the best way to manage my projects. This is great and I’ll take care of it immediately.

Happy to help!
Feel free to contact me at kmartin@caltech.edu if you have further questions.

Please allow me to re-emphasize the following points related to the Guide: Data Cleanup (v4.3+) | CryoSPARC Guide link:

  1. The Clear preprocessing jobs option does delete outputs of certain jobs. After clearing, applicable jobs need to be re-run to make their outputs available for further analysis, as @KevinM mentioned.
  2. When considering the Clear non-final jobs option, be sure to understand the difference between jobs that ran to completion and jobs to which Mark Job as Final has been applied.
  3. A hint for future processing: CryoSPARC v4.4 includes the option to save certain results using the float16 data type.
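
To give a feel for why float16 matters for storage, here is a back-of-the-envelope sketch. A float16 value takes 2 bytes versus 4 for float32, so the pixel data is halved. The particle count and box size below are made-up numbers, file headers are ignored, and which CryoSPARC outputs support float16 is not covered here.

```python
import struct

# Size in bytes of one value of each type ("e" is IEEE 754 half precision).
BYTES_FLOAT32 = struct.calcsize("<f")
BYTES_FLOAT16 = struct.calcsize("<e")


def stack_bytes(n_particles, box_size, bytes_per_value):
    """Approximate pixel-data size of a square particle stack, ignoring headers."""
    return n_particles * box_size * box_size * bytes_per_value


# Hypothetical dataset: one million particles in 256 x 256 pixel boxes.
size_f32 = stack_bytes(1_000_000, 256, BYTES_FLOAT32)
size_f16 = stack_bytes(1_000_000, 256, BYTES_FLOAT16)
print(f"float32: {size_f32 / 1e9:.0f} GB, float16: {size_f16 / 1e9:.0f} GB")
```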