As far as I know, there is no easy way to track the name of each micrograph that ended up in the different groups after manual curation of exposures. I find that it could be quite handy to have a list of file names of the excluded micrographs so that one can delete useless raw data micrograph/exposure files. Wouldn’t that be nice?
At the moment, what workarounds do you use for this purpose?
You should be able to convert the cs file for the excluded mics to a star file using csparc2star.py, then make a list from that that you can use to delete the junk. I haven’t actually tried that, but I don’t see why it wouldn’t work.
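If you go the star file route, the list can be pulled out with a few lines of Python. This is only a minimal sketch: it assumes the converted file has a loop_ block with a `_rlnMicrographName` column (I have not checked what csparc2star.py writes for exposure .cs files) and that the paths contain no spaces. The function name `micrograph_names_from_star` is made up for this example.

```python
def micrograph_names_from_star(star_text, column="_rlnMicrographName"):
    """Collect the values of `column` from every loop_ block that defines it."""
    labels = []      # column labels of the loop_ block currently being read
    in_loop = False  # True once a loop_ keyword has been seen in this data block
    names = []
    for raw_line in star_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("data_"):
            labels, in_loop = [], False  # a new data block starts
        elif line == "loop_":
            labels, in_loop = [], True   # a new table header starts
        elif in_loop and line.startswith("_"):
            labels.append(line.split()[0])  # e.g. "_rlnMicrographName #1"
        elif in_loop and column in labels:
            # data row: pick the field that sits under the requested column
            names.append(line.split()[labels.index(column)])
    return names


example_star = """
data_optics

loop_
_rlnOpticsGroup #1
_rlnVoltage #2
1 300.0

data_micrographs

loop_
_rlnMicrographName #1
_rlnOpticsGroup #2
J1/imported/mic_001.tif 1
J1/imported/mic_002.tif 1
"""

for name in micrograph_names_from_star(example_star):
    print(name)
```

For a real file you would pass `open("excluded.star").read()` (the file name here is a placeholder) and write the returned names to a text file for review before deleting anything.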
You can do this by first downloading the movie_blob .cs file output from the exposures_rejected output result group of the Manually Curate Exposures job.
You can also just get the path to the .cs file on the master node by navigating to the Outputs tab and pressing the “copy path” button on the movie_blob output.
You can then find the filenames, and either delete the files manually, or use python to do this for you.
For example, open a shell on the master node and run cryosparcm icli to start an interactive python session, then run the following:
from cryosparc2_compute import dataset
exposure_dset = dataset.Dataset()  # initialize the dataset object
dataset_path = "<path_to_cs_file_here>"
exposure_dset.from_file(dataset_path)  # load the .cs file
exposure_dset.data['movie_blob/path']  # will print out a sample of all the values in this field
# the following will write all the filenames to a text file that
# can be piped to a unix delete command
with open("exposures_to_delete.txt", 'w') as openfile:
    for file_to_delete in exposure_dset.data['movie_blob/path']:
        openfile.write(file_to_delete + '\n')
# the following will delete the files sequentially
import os
for file_to_delete in exposure_dset.data['movie_blob/path']:
    try:
        os.remove(file_to_delete)
    except OSError:
        # report the failure and move on to the next file
        print("Unable to delete {}".format(file_to_delete))
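Since deletion cannot be undone, it may be worth doing a dry run on the text file first. The following is a minimal sketch, not part of CryoSPARC: `delete_listed_files` is a hypothetical helper that reads a list with one path per line (such as exposures_to_delete.txt above), reports which entries exist, and only deletes when `dry_run=False`.

```python
import os


def delete_listed_files(list_path, dry_run=True):
    """Delete the files named one per line in list_path.

    Returns (handled, missing): the paths that were deleted (or would be
    deleted in a dry run) and the paths that were not found on disk.
    """
    with open(list_path) as listfile:
        paths = [line.strip() for line in listfile if line.strip()]

    handled, missing = [], []
    for path in paths:
        # lexists also counts symlinks whose target is gone
        if not os.path.lexists(path):
            missing.append(path)
            continue
        if not dry_run:
            os.remove(path)
        handled.append(path)
    return handled, missing
```

Call it once with the default `dry_run=True`, check the two returned lists, and then call it again with `dry_run=False` once the list looks right.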
I had to do this recently and found some changes in the cryosparc_compute package (e.g., some differences in the properties of the Dataset object). Here’s an updated example for future searchers.
import os
from cryosparc_compute import dataset
dpath = "<path_to_cs_file_here>"
dset = dataset.Dataset.load(dpath)  # load the .cs file
for file_to_delete in dset['micrograph_blob/path']:
    try:
        os.remove(file_to_delete)
    except OSError:
        # report the failure and move on to the next file
        print("unable to delete {}".format(file_to_delete))
(Note: run this from the CryoSPARC project directory, where the relevant job directories are found.)
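If you would rather not depend on the current working directory, the paths can be joined to the project directory explicitly. This is a rough sketch under two assumptions that you should verify on your own system: that the stored paths are relative to the project directory, and that imported exposures may be symlinks to the raw data (in which case removing the link alone would not free the raw file). `resolve_exposure_paths` is a hypothetical helper, not a CryoSPARC function.

```python
import os


def resolve_exposure_paths(paths, project_dir):
    """Return a list of (absolute_path, real_path) tuples.

    absolute_path is the entry joined to project_dir (already absolute
    entries are kept as they are); real_path is the same location with
    any symlinks followed.
    """
    resolved = []
    for path in paths:
        if isinstance(path, bytes):
            # some dataset versions may store paths as bytes
            path = path.decode("utf-8")
        absolute = os.path.normpath(os.path.join(project_dir, path))
        resolved.append((absolute, os.path.realpath(absolute)))
    return resolved
```

For example, `resolve_exposure_paths(dset['micrograph_blob/path'], "<path_to_project_dir_here>")` gives both locations for each exposure, so you can decide whether to remove the link, the raw file, or both.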
Hi Andre,
Not sure if you have solved this problem yet. Data management in CryoSPARC looks nice, but it is still not very convenient to use.
I have a Python script here (GitHub - zhangjuen/CryoEM). A single command line does the job!
Hope this helps.
J