Hi,
I am certain this is not a good thing to do, but it might nevertheless be the only feasible way of carrying out the task we currently have at hand.
As a data curator/facility manager, one sometimes faces the situation where the storage server is full of old CryoSPARC (CS) projects that need to be cleaned up and archived. It frequently happens that many of the project users are no longer around to do so.
To deal with this, and since some of the projects are no longer connected to an instance, I thought the most practical approach would be to remove heavy files, such as motion-corrected micrographs, by pointing the rm command at those large files in specific directories: e.g., ‘rm /path/to/cs-projectname/J*/motioncorrected/*’ to remove any motion-corrected micrographs. This could be applied to all the projects we want to archive if we place them all in a shared directory, which is very practical.
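Before deleting anything, it may be safer to do a dry run first. Below is a minimal sketch of the same idea in Python, under a few assumptions: all projects sit directly under one shared directory, the glob pattern is the one from the rm example above, and the names find_candidates, clean and removed_files.txt are hypothetical, made up for this illustration. By default it only reports how many bytes would be freed. With dry_run=False it deletes the files and appends their paths to a manifest, so that a future user can see what has to be regenerated.

```python
from pathlib import Path

# Assumed layout: every project to archive sits directly under one shared root,
# and motion-corrected output lives in <project>/J*/motioncorrected/.
PATTERN = "J*/motioncorrected/*"


def find_candidates(archive_root):
    """Return (path, size_in_bytes) for every regular file matching PATTERN in each project."""
    found = []
    for project in sorted(Path(archive_root).iterdir()):
        if not project.is_dir():
            continue
        for path in sorted(project.glob(PATTERN)):
            # Skip symlinks and subdirectories; only plain files are candidates.
            if path.is_file() and not path.is_symlink():
                found.append((path, path.stat().st_size))
    return found


def clean(archive_root, dry_run=True, manifest_name="removed_files.txt"):
    """List the candidates and, only when dry_run is False, delete them.

    Returns the number of bytes that were (or would be) freed.
    """
    candidates = find_candidates(archive_root)
    total = sum(size for _, size in candidates)
    if not dry_run:
        # Record what is deleted so a future user knows what to regenerate.
        manifest = Path(archive_root) / manifest_name
        with manifest.open("a") as fh:
            for path, size in candidates:
                fh.write(f"{path}\t{size}\n")
                path.unlink()
    return total
```

Calling clean("/path/to/shared-archive-dir") would only report the total size. Passing dry_run=False performs the actual removal, so it is worth inspecting the output of find_candidates first.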
I can already foresee possible difficulties, and the need to provide project recovery instructions for users who might come up with the idea of digging into this old data. For instance, upon attaching the project, the motion correction job would still be marked as completed, but any jobs depending on the motion-corrected micrographs would not be able to find those files. A simple workaround that I think should be feasible is to clear the motion correction job and rerun it before launching any new jobs.
Most likely there will also be other problems, as a project recovered from our archive would probably end up in a completely different path in the file system than its original location… here I imagine the problems would be bigger.
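One thing that could be checked after restoring a project to a new location is whether it contains symbolic links that no longer resolve. This is a sketch under the assumption that some job directories hold symlinks with absolute targets (for example to imported raw data), which I have not verified for every job type. The function name dangling_symlinks is hypothetical.

```python
import os
from pathlib import Path


def dangling_symlinks(project_dir):
    """Return sorted (link, target) pairs for symlinks whose target does not exist."""
    broken = []
    # os.walk does not follow symlinks by default, so each link is reported once.
    for dirpath, dirnames, filenames in os.walk(project_dir):
        for name in dirnames + filenames:
            link = Path(dirpath) / name
            # exists() follows the link, so it is False when the target is missing.
            if link.is_symlink() and not link.exists():
                broken.append((link, os.readlink(link)))
    return sorted(broken)
```

Running this on a restored project directory gives a list of links to repair or re-create before any job that depends on them is rerun.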
I would like to hear and discuss here the implications and follow-up complications of removing data in this way, should any of these projects ever need to be recovered from our storage archive/tape. I am also interested to hear whether anyone is already using cryosparcm cli or cryosparc-tools for such workflows.
Please do not hesitate to share some of your experience!
Thanks,
André