Delete rejected exposures

Hi,
Is there a way to delete just rejected exposures, in either cryoSPARC or cryoSPARC Live? If not, would the team consider adding this feature?

Thanks

Hi @ccgauvin94, the only way to do this right now is with a custom Python script. In CryoSPARC v4.1, you can use the cryosparc-tools Python library on a machine with access to the exposures.

Here’s an example script:

from pathlib import Path
from cryosparc.tools import CryoSPARC

cs = CryoSPARC(license="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", email="nick@example.com", password="password123", host="localhost", base_port=39000)
project = cs.find_project('P#')
job = project.find_job('J#')
exposures_rejected = job.load_output('exposures_rejected')
project_dir = Path(project.dir())

for group in ('movie_blob', 'micrograph_blob', 'micrograph_blob_non_dw', 'background_blob', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x'):
    field = group + "/path"
    if field in exposures_rejected:
        print(f"Removing {len(exposures_rejected)} blobs in field {field}")
        for rel_path in exposures_rejected[field]:
            print(f"Removing {rel_path}...")
            abs_path = project_dir / rel_path
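            # resolve() follows symlinks (e.g. imported movies) and also works for regular files,
            # whereas readlink() raises an error when the path is not a symlink
            abs_path.resolve().unlink()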

Substitute your license, user account credentials, instance hostname and port in the cs = CryoSPARC(... initialization.

Substitute P# and J# with the project and job numbers respectively. J# should be a Curate Exposures job. For Live, use a Live Exposure Export job (create from Session > Details > Actions > Export exposures) and substitute exposures_rejected with rejected_exposures or manual_rejected_exposures.
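
Before deleting anything, it may be worth checking which files the script would touch and how much space they occupy. The following is only a minimal sketch under stated assumptions: summarize_targets is a hypothetical helper (not part of cryosparc-tools), and it assumes you pass it the project directory and one of the path columns used in the script above.

```python
from pathlib import Path


def summarize_targets(project_dir, rel_paths):
    """Dry run: return (resolved existing files, total size in bytes).

    Hypothetical helper for illustration; nothing is deleted.
    """
    project_dir = Path(project_dir)
    targets = []
    total_bytes = 0
    seen = set()
    for rel_path in rel_paths:
        # resolve() follows symlinks, so imported movies map to the raw file
        target = (project_dir / rel_path).resolve()
        # Skip missing files and files already counted via another link
        if target in seen or not target.is_file():
            continue
        seen.add(target)
        targets.append(target)
        total_bytes += target.stat().st_size
    return targets, total_bytes


# Possible use with the names from the script above (not executed here):
# targets, total = summarize_targets(project_dir, exposures_rejected["movie_blob/path"])
# print(f"{len(targets)} files, {total / 1e9:.2f} GB")
```

Because the helper only reads file metadata, it should be safe to run repeatedly while deciding what to remove.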

Hope that helps, let me know if you have any trouble with it.

Great, thanks. I saw the cryosparc-tools announcement and thought that might provide this functionality. I’ll give this a go with a test dataset.

Hi @nfrasser

I am playing around with this script and I’m not sure what the following line is doing exactly:

            abs_path = project_dir / rel_path

It seems to be taking my relative path, and somehow producing the absolute path of the micrograph with it. But I’m not familiar with the operator or the type. Could you point me toward some documentation of this?

Thanks

@ccgauvin94 this is the / operator from Python’s pathlib module, which joins path components. It’s equivalent to the following:

import os.path
abs_path = os.path.join(project_dir, rel_path)
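
To see the two forms side by side, here is a small self-contained sketch. The directory and file names are made up for illustration, and the string comparison assumes a POSIX system such as Linux.

```python
import os.path
from pathlib import Path

# Hypothetical values, for illustration only
project_dir = Path("/data/cryosparc/P1")
rel_path = "J2/imported/movie_001.tif"

# pathlib: the / operator returns a new Path object
abs_path = project_dir / rel_path

# os.path: join() returns a plain string
same_path = os.path.join(project_dir, rel_path)

print(abs_path)
print(str(abs_path) == same_path)
```

The main practical difference is the return type: the pathlib result is a Path object with methods such as resolve() and unlink(), while os.path.join gives back a string.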

Hope that helps!

I came up with a Jupyter notebook based on the above and cryosparc-tools to delete movies based on the rejected exposures from the Curate Exposures job. Just putting it here in case anyone else finds it useful:

If you stop before the last block, it will just print out a list of the filenames, which you can then use to move the files to a different folder or whatever, instead of deleting.
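
For the "move instead of delete" route, something along these lines might work. It is a sketch only: move_to_holding_dir is a hypothetical helper name, the list of file paths is assumed to come from the printed output, and moving a raw movie will leave any CryoSPARC symlink that pointed at it dangling.

```python
import shutil
from pathlib import Path


def move_to_holding_dir(file_paths, holding_dir):
    """Move each existing file into holding_dir and return the new paths.

    Hypothetical helper for illustration; files that are missing, or whose
    name already exists in holding_dir, are skipped rather than overwritten.
    """
    holding_dir = Path(holding_dir)
    holding_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for file_path in file_paths:
        src = Path(file_path)
        dest = holding_dir / src.name
        if not src.exists() or dest.exists():
            print(f"Skipping {src}")
            continue
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Keeping the files in a holding folder for a while makes it possible to restore them if a rejection turns out to be a mistake.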

Dear nfrasser:
Do we have an easier way in the new version to delete the images rejected by the “Curate Exposures” job?
Thanks,
Lan

Are you referring to files that were written by a CryoSPARC job, or to files that were imported to CryoSPARC and subsequently used for further processing, like motion correction, etc.?

Yes, I would like to know if we have an updated method to delete the rejected images (the raw source files and the motion-corrected images) identified by the Curate Exposures step of a normal processing workflow. Thanks,
Lan

I am also interested in whether anything is happening in this direction.

It would also be very useful when we do EMPIAR depositions because then we can deposit only the ‘good’ micrographs and conserve some disk space.

CryoSPARC Tools has relatively good support for this now. Here is a Python script I made that you can run in an environment with CryoSPARC Tools installed to see and delete rejected exposures:

#!/usr/bin/env python3

from cryosparc.tools import CryoSPARC
from pathlib import Path
import argparse

def options():
    parser = argparse.ArgumentParser()
    parser.add_argument("--cryosparc_license", type=str, help="CryoSPARC License")
    parser.add_argument("--cryosparc_host", type=str, help="CryoSPARC Hostname/URL")
    parser.add_argument("--cryosparc_port", type=str, help="CryoSPARC Base Port")
    parser.add_argument("--cryosparc_email", type=str, help="CryoSPARC Account Email")
    parser.add_argument("--cryosparc_password", type=str, help="CryoSPARC Account Password")
    parser.add_argument("--cryosparc_project", type=str, help="CryoSPARC Project")
    parser.add_argument("--cryosparc_job", type=str, help="CryoSPARC Job containing rejected exposures output. Will need --cryosparc_live flag in addition to this if the job is from a Live session exposure export.")
    parser.add_argument("--cryosparc_live", action='store_true', help="This argument indicates that job with rejected exposures output is a CryoSPARC Live Export Exposures job type.")
    parser.add_argument("--delete", action='store_true', help="Delete the rejected exposures instead of just printing a list.")
    return parser.parse_args()

def cryosparc_initialize(cryosparc_license, cryosparc_host, cryosparc_port, cryosparc_email, cryosparc_password, cryosparc_project):
    cryosparc_instance = CryoSPARC(host=str(cryosparc_host), license=str(cryosparc_license), email=str(cryosparc_email), password=str(cryosparc_password), base_port=int(cryosparc_port))
    assert cryosparc_instance.test_connection()
    project = cryosparc_instance.find_project(cryosparc_project)
    project_dir = Path(project.dir())
    return project, project_dir

def cryosparc_exposure_cleanup(cryosparc_project, cryosparc_project_path, cryosparc_job, cryosparc_live, delete):
    job = cryosparc_project.find_job(cryosparc_job)
    if cryosparc_live:
        exposures_rejected = job.load_output("rejected_exposures")
    else:
        exposures_rejected = job.load_output("exposures_rejected")
    
    for exposure in exposures_rejected.rows():
        for group in (
         "path",
         "movie_blob",
         "micrograph_blob",
         "micrograph_blob_non_dw",
         "background_blob",
         "micrograph_thumbnail_blob_1x",
         "micrograph_thumbnail_blob_2x",
         ):
            field = group + "/path"
            if field in exposure:
                rel_path = exposure[field]
                abs_path = cryosparc_project_path / rel_path
                print(abs_path.resolve())
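                if delete:
                    try:
                        # Resolve symlinks so the underlying file is removed
                        abs_path.resolve().unlink()
                    except OSError as err:
                        # e.g. the file was already deleted; report it and move on
                        print(f"Could not remove {abs_path}: {err}")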

def main():
    arguments = options()
    cryosparc_project, cryosparc_project_path = cryosparc_initialize(arguments.cryosparc_license, arguments.cryosparc_host, arguments.cryosparc_port, arguments.cryosparc_email, arguments.cryosparc_password, arguments.cryosparc_project)
    cryosparc_exposure_cleanup(cryosparc_project, cryosparc_project_path, arguments.cryosparc_job, arguments.cryosparc_live, arguments.delete)

if __name__ == "__main__":
    main()

Thank you, a script is a good way to deal with this. But for people who have never used cryosparc-tools and have no idea how or where to run this script, or how to edit it so that it works only on the project they want to clean up, it is not that straightforward. I know that the instructions for cryosparc-tools are available, but my question was also for the CS team: can we expect something integrated in the UI?

A GUI option would be easier for users, I think. I imagine that the cryoSPARC team would like to avoid deleting raw data from hard disks, but with ample warnings and safety steps it should be quite safe.
