Exposure Set Split Question

Does the Exposure Set utility split exposures sequentially by collection date and time if randomize is NOT turned on? For example, if I have 5,000 exposures and split them into one set of 3,554 exposures with a remainder of 1,446, will the first set contain the first 3,554 exposures by collection date and time, or exposures chosen from across the whole 5,000-exposure set?

In the absence of randomization, the order in which the exposures were imported to CryoSPARC is followed. This order may or may not match the order of data collection, depending on circumstances.
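The difference between the two modes can be pictured with a short Python sketch. `split_exposures` is a hypothetical helper, not CryoSPARC code. It only models the behavior described above: without randomization, take the first N exposures in the existing (import) order; with randomization, shuffle first.

```python
import random

def split_exposures(exposures, n_first, randomize=False, seed=None):
    """Hypothetical model of a two-way split.

    Without randomization, the first subset is simply the first n_first
    items in the list's existing (import) order; the rest are the remainder.
    """
    items = list(exposures)
    if randomize:
        random.Random(seed).shuffle(items)  # order no longer follows import order
    return items[:n_first], items[n_first:]

# Hypothetical file names, listed in import order
exposures = [f"exposure_{i:05d}.tif" for i in range(5000)]
first, rest = split_exposures(exposures, 3554)
print(len(first), len(rest))  # 3554 1446
print(first[0], first[-1])    # exposure_00000.tif exposure_03553.tif
```

Whether "first by import order" also means "first by collection time" depends entirely on how the import order was established.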

@wtempel to be clear, does it match the glob listing order followed by the chronological order of appearance (in a live session)?

@klmcguire if you want to separate your data by time I highly recommend you 1) have SerialEM put the true collection timestamp into the file names and 2) set up your exposure groups outside cryosparc using a particles file and then import them.

Dealing with timestamps is tricky, so you really want to convert them to a proper date/time representation and then sort them. Also, it’s easy for filesystem timestamps to get messed up if you aren’t careful with how you copy the files, which is why timestamps recorded during data collection are preferable.
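As a sketch of "convert, then sort": the snippet below assumes a hypothetical file-name convention with an embedded `<YYYY><Mon><DD>_<HH>.<MM>.<SS>` stamp (not necessarily what SerialEM writes). It shows plain alphabetical order putting April before March, while parsing the stamp into a `datetime` recovers the true collection order.

```python
import re
from datetime import datetime

# Hypothetical convention: <prefix>_<YYYY><Mon><DD>_<HH>.<MM>.<SS>.tif
STAMP = re.compile(r"(\d{4}[A-Za-z]{3}\d{2}_\d{2}\.\d{2}\.\d{2})")

def collection_time(filename):
    """Parse the embedded timestamp into a datetime (raises if absent)."""
    match = STAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    # %b expects English month abbreviations (the default C locale)
    return datetime.strptime(match.group(1), "%Y%b%d_%H.%M.%S")

files = [
    "sq01_2023Apr01_00.01.05.tif",
    "sq02_2023Mar31_23.59.50.tif",
    "sq01_2023Mar31_23.59.10.tif",
]

alphabetical = sorted(files)                         # "Apr" sorts before "Mar"
chronological = sorted(files, key=collection_time)   # true collection order
print(alphabetical)
print(chronological)
```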

I want to get this right: I interpret “followed by” as “first sorted in glob listing order, then sorted in order of appearance”, but that probably is not what you implied, is it?


That’s what I meant! It seems like what would happen in a live session where some files already exist and new ones keep appearing. (An offline import would then follow glob listing order only, with no new files.)

I checked with the team and now understand CryoSPARC Live to work like this:

  1. an empty list of imported exposures is initiated
  2. periodically, while running:
    a. a batch of exposures that have newly appeared (but that have not yet been imported) is identified.
    b. exposures from that batch are appended to the list of imported exposures in ascending alphabetical order.

In the end, imported exposures would be a concatenation of exposure batches, alphabetically ordered only within each batch (unless alphabetical ordering is specially “encouraged”, for example by including a time stamp in the filename, as you mentioned earlier).
In the case of “off-line” importation (after data collection and transfer are complete), there would only be a single batch and alphabetical order would apply throughout.
In the case of an “on-line” Live session, checks may occur as frequently as every few seconds, so a batch may contain only a single exposure.
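The steps above can be modeled in a few lines of Python. This is a sketch of the described behavior, not CryoSPARC source: `live_import_order` is a hypothetical function, and each "snapshot" stands for the directory listing seen at one periodic check. A file that shows up late (here `movie_000.tif`) lands at the end in a Live session, but first in a single offline batch.

```python
def live_import_order(snapshots):
    """Model of the described Live behavior.

    For each periodic check, find files not yet imported, sort that batch
    alphabetically, and append it to the running list of imported exposures.
    """
    imported = []
    seen = set()
    for listing in snapshots:
        batch = sorted(f for f in listing if f not in seen)
        imported.extend(batch)
        seen.update(batch)
    return imported

# Hypothetical directory listings at three successive checks
snapshots = [
    ["movie_002.tif", "movie_001.tif"],
    ["movie_001.tif", "movie_002.tif", "movie_003.tif"],
    ["movie_000.tif", "movie_001.tif", "movie_002.tif", "movie_003.tif"],
]

live = live_import_order(snapshots)            # several batches
offline = live_import_order([snapshots[-1]])   # one batch after collection ends
print(live)     # ['movie_001.tif', 'movie_002.tif', 'movie_003.tif', 'movie_000.tif']
print(offline)  # ['movie_000.tif', 'movie_001.tif', 'movie_002.tif', 'movie_003.tif']
```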

Thanks for the details!

Because glob ordering puts, e.g., 10 before 2, and most timestamp formats won’t sort chronologically either, the best approach is to assign exposure groups with a custom script and import them, or to create, e.g., symbolic links with custom file names that sort alphabetically in the desired order.
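A minimal sketch of the symbolic-link approach, assuming a POSIX filesystem and hypothetical names (`natural_key`, `link_plan`, `make_links`, and the `ordered_` prefix are illustrative only). It sorts sources with a "natural" key so that 2 comes before 10, then assigns zero-padded link names whose plain alphabetical order matches. Any other sort key, such as one that parses a timestamp, can be passed as `key`.

```python
import os
import re
from pathlib import Path

def natural_key(name):
    """Split out digit runs so 'movie_2' sorts before 'movie_10'."""
    # re.split with a capture group alternates text/digits, so types line up
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

def link_plan(filenames, key=natural_key, prefix="ordered"):
    """Map each source file to a zero-padded link name.

    Plain alphabetical order of the link names equals the order given by key.
    """
    ordered = sorted(filenames, key=key)
    width = len(str(len(ordered)))
    return [(src, f"{prefix}_{i:0{width}d}{Path(src).suffix}")
            for i, src in enumerate(ordered)]

def make_links(plan, src_dir, link_dir):
    """Create the symbolic links (absolute targets) in link_dir."""
    link_dir = Path(link_dir)
    link_dir.mkdir(parents=True, exist_ok=True)
    for src, link_name in plan:
        os.symlink(Path(src_dir).resolve() / src, link_dir / link_name)

files = ["movie_10.tif", "movie_2.tif", "movie_1.tif"]
print(sorted(files))      # ['movie_1.tif', 'movie_10.tif', 'movie_2.tif']
print(link_plan(files))   # movie_1 -> ordered_0, movie_2 -> ordered_1, movie_10 -> ordered_2
```

The import would then point at the link directory instead of the original files.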

Another possibility would be to use the first part of a filename timestamp with the Exposure Groups utility, and then manually sort the groups after refinement if you need to plot the parameters. For example, if your file names contain something like Mon-22-00-00, you can select the Mon part to group by day or the Mon-22 part to group by hour. It isn’t flexible enough, though, to split by other time periods.
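To make the prefix idea concrete, here is a hypothetical sketch (not the Exposure Groups utility itself) that groups file names by their first few characters and then orders the resulting keys by weekday. Plain alphabetical order would put "Fri" before "Mon". It assumes names start with a `Day-HH-MM-SS` stamp and that the session does not wrap into a second week.

```python
from collections import defaultdict

DAY_ORDER = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def group_by_prefix(filenames, prefix_len):
    """Group names by their first prefix_len characters (e.g. 3 = day, 6 = day-hour)."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name[:prefix_len]].append(name)
    return dict(groups)

def chronological_keys(groups):
    """Order keys like 'Mon' or 'Mon-22' by weekday, then zero-padded hour."""
    return sorted(groups, key=lambda k: (DAY_ORDER.index(k[:3]), k[3:]))

# Hypothetical file names with a leading Day-HH-MM-SS stamp
files = ["Mon-22-15-03_0001.tif", "Mon-23-02-41_0002.tif", "Tue-00-10-09_0003.tif"]
by_day = group_by_prefix(files, 3)    # keys: Mon, Tue
by_hour = group_by_prefix(files, 6)   # keys: Mon-22, Mon-23, Tue-00
print(chronological_keys(by_hour))    # ['Mon-22', 'Mon-23', 'Tue-00']
```

A window such as "22:00 to 23:59" spans two hour prefixes, which is why prefix grouping alone cannot express arbitrary time periods.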