I have a large set of micrographs (e.g., 15,000) and millions of particles. I want to test some parameters, e.g., for 2D classification. Right now, I can create randomized particle sets. The problem is that I have no say in which micrographs the particles come from. So even with sets of 100,000 particles, subsequent jobs always load all the micrographs (I assume the particles are distributed across all of them).
I would like a feature that limits the number of micrographs taken into account when selecting particles, e.g., randomly selecting 1,000 micrographs to draw the particle sets from, so that subsequent jobs have fewer micrographs to load.
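To make the request concrete, here is a minimal plain-Python sketch of the selection logic being asked for: pick a random subset of micrographs first, then sample particles only from those. The `micrograph` and `uid` keys are hypothetical illustration fields, not the actual cryoSPARC dataset schema.

```python
import random

def select_particles_from_micrograph_subset(particles, n_micrographs, n_particles, seed=0):
    """Pick a random subset of micrographs, then sample particles only from them.

    `particles` is a list of dicts with a hypothetical "micrograph" key;
    this illustrates the requested behavior, not cryoSPARC's internal data model.
    """
    rng = random.Random(seed)
    # All distinct micrographs the particles came from
    all_mics = sorted({p["micrograph"] for p in particles})
    # Restrict to a random subset of micrographs...
    chosen_mics = set(rng.sample(all_mics, min(n_micrographs, len(all_mics))))
    pool = [p for p in particles if p["micrograph"] in chosen_mics]
    # ...then draw the randomized particle set only from that pool
    return rng.sample(pool, min(n_particles, len(pool)))

# Example: 5,000 particles spread evenly over 100 micrographs
particles = [{"uid": i, "micrograph": i % 100} for i in range(5000)]
subset = select_particles_from_micrograph_subset(particles, n_micrographs=10, n_particles=200)
```

A downstream job consuming `subset` would then only need to touch the 10 chosen micrographs rather than all 100.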
This is more of a workaround: it is another job that I need to submit to the queue, and it takes some time to finish, while the Particle Sets job could simply select particles across a randomized subset of micrographs in the first place.
It should be possible to do this without re-writing particle images (as the Restack job does), if you would rather avoid that.
One way to accomplish this is to build a Manually Curate Exposures job and connect, as inputs:
- all particles
- only the subset of exposures you want to pull particles from
Then the job will output only the particles that come from the input exposures (it is not necessary to set any thresholds; you can simply launch the job and click "Done").
Both the Restack job and your approach work. However, the Restack job took 1.5 h to finish with 300k particles and 4 CPUs, and the other approach requires at least two jobs (Exposure Sets, Curate Exposures) to get there. It feels like limiting the number of micrographs the Particle Sets job takes into account would do the trick nicely, and in no time.