Delete excluded micrographs

Hi!

As far as I know, there is no easy way to track the name of each micrograph that ended up in the different groups after manual curation of exposures. It would be quite handy to have a list of the file names of the excluded micrographs, so that one can delete the useless raw micrograph/exposure files. Wouldn’t that be nice?
At the moment, what workarounds do you use for this purpose?

Thanks,
André

Hi André,

You should be able to convert the cs file for the excluded mics to a star file using csparc2star.py, then make a list from that that you can use to delete the junk. I haven’t actually tried that, but I don’t see why it wouldn’t work.

Cheers
Oli

Hi @AndreGraca,

You can do this by first downloading the movie_blob .cs file output from the exposures_rejected output result group of the Manually Curate Exposures job.
You can also just get the path to the .cs file on the master node by navigating to the Outputs tab and pressing the “copy path” button on the movie_blob output.

Once you have the .cs file, you can then open it up using the instructions from our guide on manipulating .cs files here:

https://guide.cryosparc.com/processing-data/manipulating-.cs-files-created-by-cryosparc

You can then find the filenames, and either delete the files manually, or use python to do this for you.

For example, open a shell on the master node and run cryosparcm icli to start an interactive python session, then run the following:

from cryosparc2_compute import dataset
exposure_dset = dataset.Dataset() #initialize the dataset object
dataset_path = "<path_to_cs_file_here>"
exposure_dset.from_file(dataset_path) #load the .cs file
exposure_dset.data['movie_blob/path'] #will print out a sample of all the values in this field

# the following will write all the filenames to a text file that 
# can be piped to a unix delete command
with open("exposures_to_delete.txt", 'w') as openfile:
    for file_to_delete in exposure_dset.data['movie_blob/path']:
        openfile.write(file_to_delete + '\n')

# the following will delete the files sequentially
import os
for file_to_delete in exposure_dset.data['movie_blob/path']:
    try:
        os.remove(file_to_delete)
    except OSError as err:
        # report the failure (missing file, permissions, ...) and move on
        print("Unable to delete {}: {}".format(file_to_delete, err))
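
Before piping the text file to a delete command, it may be worth checking what it actually points at. The following is only a minimal sketch using the Python standard library; the helper name summarize_deletion_list is hypothetical, and it assumes the list holds one path per line, as written by the loop above.

```python
import os


def summarize_deletion_list(list_path):
    """Return (existing, missing, total_bytes) for paths listed one per line.

    Hypothetical helper, not part of CryoSPARC: it only inspects the files
    and never deletes anything.
    """
    existing, missing, total_bytes = [], [], 0
    with open(list_path) as handle:
        for line in handle:
            path = line.strip()
            if not path:
                continue  # ignore blank lines
            if os.path.isfile(path):
                existing.append(path)
                total_bytes += os.path.getsize(path)
            else:
                missing.append(path)
    return existing, missing, total_bytes


# Example use (assumes exposures_to_delete.txt exists in the current directory):
# existing, missing, total_bytes = summarize_deletion_list("exposures_to_delete.txt")
# print(len(existing), "files found,", len(missing), "missing,", total_bytes, "bytes")
```

If many paths show up as missing, the list was probably read from a different working directory than the one the paths are relative to.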

I had to do this recently and found some changes in the cryosparc_compute package (e.g., some differences in the properties of the Dataset object). Here’s an updated example for future searchers.

import os
from cryosparc_compute import dataset

dpath = "<path_to_cs_file_here>"
dset = dataset.Dataset.load(dpath)  # load the .cs file
for file_to_delete in dset['micrograph_blob/path']:
    try:
        os.remove(file_to_delete)
    except OSError as err:
        # report the failure (missing file, permissions, ...) and move on
        print("unable to delete {}: {}".format(file_to_delete, err))

(Note: run this from the CryoSPARC project directory, where the relevant job directories are found, since the paths in the .cs file are typically relative to it.)
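
For anyone who prefers not to depend on the current working directory, here is a hedged sketch of the same loop with an explicit project directory and a dry-run switch. The function name delete_listed_files and its parameters are hypothetical, not part of cryosparc_compute; the paths argument is assumed to be whatever iterable of path strings you read from the .cs file.

```python
import os


def delete_listed_files(paths, project_dir, dry_run=True):
    """Delete the listed files, resolving relative paths against project_dir.

    Hypothetical helper. With dry_run=True (the default) nothing is removed;
    the function only reports what it would delete.
    Returns (targets, failures): files deleted (or that would be deleted),
    and files that were missing or could not be removed.
    """
    targets, failures = [], []
    for path in paths:
        full_path = path if os.path.isabs(path) else os.path.join(project_dir, path)
        if not os.path.isfile(full_path):
            failures.append(full_path)  # missing, or not a regular file
            continue
        if not dry_run:
            try:
                os.remove(full_path)
            except OSError:
                failures.append(full_path)  # e.g. permission denied
                continue
        targets.append(full_path)
    return targets, failures


# Example use (dset as loaded above; the project path is a placeholder):
# targets, failures = delete_listed_files(dset['micrograph_blob/path'], "<path_to_project_dir>")
# print(len(targets), "files would be deleted;", len(failures), "not found")
```

Running once with the default dry_run=True and checking the counts before passing dry_run=False gives a chance to catch a wrong directory. Note also that if a listed path is a symbolic link, os.remove deletes the link rather than the file it points to.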


Hi,

Just wondering: has any CryoSPARC function been added since then that does the removal? Are these scripts still up to date?

Thanks.

Regards,
qitsweauca

Last I checked, the best way to do this was with cryosparc-tools.


Hi Andre,
Not sure if you have solved this problem. The data management in CryoSPARC looks nice, but it is still not very comfortable to use.
I have some Python code here (GitHub - zhangjuen/CryoEM). A single command line makes it work!
Hope this helps.
J


Thanks for your script! It may help me and others :)

Please see this cryosparc-tools example
https://tools.cryosparc.com/examples/delete-rejected-exposures.html
and heed the warning regarding potential data loss associated with improper use.
