Trying to separate out two datasets that have been merged

gmperez · January 8, 2024, 9:14pm

Hi! I have an odd problem – I recently collected two datasets (using EPU) of the same sample with the same optics (same dose, calibrated the scope between each dataset, same pixel size, etc.). In trying to get some stuff going (and since I am a bit of a novice), I carried out import, MotionCorr, and CTF estimation with all micrographs at once, instead of doing each dataset alone.

I recently was told that it might be good to independently process each dataset before merging – will I have to re-import, MC, and CTF estimate the datasets individually, or is there another way that I can easily split the mics in cryoSPARC?

One idea I had for this came from putzing around in the CTF estimation folder: it looks like the output has 3 files for every micrograph (a *.mrc file and two *.npy files). Can I possibly separate the CTF estimation outputs using the UIDs that EPU puts on its movies, and then re-import the CTF-estimated mics? I guess my only confusion with this is what I should import, because I don’t really understand the purpose of the *.npy files.

Sorry if the question is wordy and confusing. I appreciate any advice I can get.

olibclarke · January 8, 2024, 9:56pm

Were the datasets collected consecutively? That is, do they have different ranges of dataset indices? If so you could prob separate them out based on the index range in Curate Exposures

leetleyang · January 8, 2024, 10:22pm

In addition to Oli’s suggestion, you could perform a second import of movie_set_1 followed by an intersect operation in Exposure Sets Tool, comparing your existing CTF output (set A) against movie_set_1 (set B) while ignoring UID. With any luck, A_intersect_B and A_minus_B output groups should correspond to your two halves.
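As a rough illustration of the logic behind that intersect step (not of how Exposure Sets Tool is implemented), here is a minimal Python sketch that splits one list of micrograph paths against a reference list by file name only. The function name and the example paths are hypothetical, and matching on the bare file name is an assumption standing in for "ignoring UID".

```python
from pathlib import PurePosixPath


def split_by_reference(all_paths, reference_paths):
    """Split all_paths into (A_intersect_B, A_minus_B), comparing file names only."""

    def key(path):
        # Compare on the bare file name so differing directories do not matter.
        return PurePosixPath(path).name

    reference_names = {key(p) for p in reference_paths}
    intersect = [p for p in all_paths if key(p) in reference_names]
    minus = [p for p in all_paths if key(p) not in reference_names]
    return intersect, minus


# Hypothetical example: set A is the merged CTF output, set B is a fresh
# import of the first dataset only.
set_a = ["J3/ctf/mic_0001.mrc", "J3/ctf/mic_0002.mrc", "J3/ctf/mic_0003.mrc"]
set_b = ["J9/import/mic_0001.mrc", "J9/import/mic_0003.mrc"]
a_intersect_b, a_minus_b = split_by_reference(set_a, set_b)
```

Here `a_intersect_b` would hold the micrographs belonging to the first dataset and `a_minus_b` the rest, which is the split the two output groups are meant to give.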

Cheers,
Yang

gmperez · January 9, 2024, 1:21pm

Trying these out today, will let y’all know if one or the other works!

gmperez · January 10, 2024, 2:59pm

Hi all,

Just wanted to make a note here of what I tried. When I tried to use the index range in Curate Exposures, I was not able to tell the datasets apart: I couldn’t get the UID from EPU to show up, so I couldn’t tell whether they were processed one dataset first and then the second, or whether they were mixed. What I mean is that my import was carried out from a movies directory where I had already put both sets of files, so I don’t know if the files were strictly processed as dataset1 and then dataset2. I haven’t tried Yang’s suggestion yet because I wanted to run other jobs on some of the volumes I have been able to make.

However, someone from my lab had a good recommendation: I am going to move on to Relion after getting some good-looking reconstructions, so it should be easy to split the particles using the .star file there.
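For what it's worth, a split like that could be sketched in plain Python along these lines. This is only a minimal sketch: it assumes a STAR file with a single loop_ block, whitespace-separated rows, and an `_rlnMicrographName` column, and the function name, the example rows, and the `set1_` naming are all made up for illustration. Files with several blocks (such as a separate optics table) would need more careful parsing.

```python
def split_star_rows(star_lines, is_dataset_one):
    """Partition the data rows of one STAR loop by their _rlnMicrographName value.

    Assumes a single loop_ block whose rows are whitespace-separated.
    """
    column = None   # index of the _rlnMicrographName column
    n_labels = 0    # number of column labels seen so far
    first, second = [], []
    for line in star_lines:
        stripped = line.strip()
        if stripped.startswith("_rln"):
            if stripped.split()[0] == "_rlnMicrographName":
                column = n_labels
            n_labels += 1
            continue
        fields = stripped.split()
        if column is None or len(fields) != n_labels:
            continue  # blank lines, data_/loop_ keywords, anything not a full row
        (first if is_dataset_one(fields[column]) else second).append(line)
    return first, second


# Hypothetical example with two particles from differently named micrographs.
example = """data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnMicrographName #3
100.0 200.0 MotionCorr/set1_0001.mrc
150.0 250.0 MotionCorr/set2_0001.mrc
""".splitlines()

set1_rows, set2_rows = split_star_rows(example, lambda name: "set1_" in name)
```

Each list of rows could then be written back out under a copy of the original header to give one .star file per dataset.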

Thank you all for your advice!

olibclarke · January 10, 2024, 4:37pm

Are the filenames different - is there a grid identifier in the filename? If so it should be easy to tell in Curate Exposures if they have been imported consecutively.
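If there is such an identifier, a quick check outside cryoSPARC could look like the sketch below. The `grid1_`/`grid2_` prefixes and the regular expression are hypothetical, so swap in whatever actually distinguishes your two sessions in the file names.

```python
import re
from collections import defaultdict


def group_by_identifier(filenames, pattern):
    """Group file names by the first capture group of pattern.

    Names that do not match the pattern are collected under the key None.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in filenames:
        match = regex.search(name)
        groups[match.group(1) if match else None].append(name)
    return dict(groups)


# Hypothetical file names carrying a grid prefix.
names = [
    "grid1_FoilHole_001.tiff",
    "grid2_FoilHole_001.tiff",
    "grid1_FoilHole_002.tiff",
]
groups = group_by_identifier(names, r"^(grid\d+)_")
```

Listing the groups in import order should then show whether the two datasets came in one after the other or interleaved.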

gmperez · January 17, 2024, 3:49pm

Update here: I recently started using a new cryoSPARC install running v4, and I found that Curate Exposures can show filenames. When I was using v3.2.0 I didn’t see this, but maybe I missed it. That would have helped out a lot and makes me wish I had used v4 earlier!

gmperez · January 17, 2024, 3:50pm

I also will say that I tried Yang’s suggestion and it worked! Thank you everyone for your advice.