Hi,
I was wondering if it is possible to simply add new micrographs to an existing project and process only the new files, using the previous steps as a template, similar to the RELION-type “pipeline approach”.
I.e., if I import a whole micrograph folder during data collection, its content will certainly grow over time.
For “on-the-fly” processing it would be nice not to have to re-estimate all defoci, re-pick all particles, re-classify and so on…
At the moment I am simply cloning the jobs from day 1 for the next day. Obviously, all the images that were processed before are processed again, as cryoSPARC doesn’t recognize them as “completed”.
Any suggestions?
Import the new group of micrographs, then replace the input on the cloned jobs with these new micrographs for CTF estimation, template picking, etc. When you finally use the particles, add the particles from all the micrograph groups as inputs to the refinement job. After that, you can use the particles from the refinement/ab initio/2D job as the input for later steps, so you won’t have to drag and drop multiple inputs anymore.
@DanielAsarnow The point is: how can I add only the new mics?
If I select the session folder, the old ones will be imported again. The idea is to import directly during data collection, to get an immediate impression of the data and start processing right away. In the end, batches of selected particles should be combined for the final calculations.
Oh, you can just use a wildcard instead of importing the whole directory. Click the first micrograph, then replace foo_0001_bar.mrc with foo_01*_bar.mrc, to get a block of 100 micrographs (0100 through 0199, with four-digit numbering). I dunno if you can shift-click to select ranges in the file browser pop-up, but it seems like a good feature to add.
That’s one way; however, it requires the filenames to be consecutive, or am I wrong?
Wouldn’t it be easy to add an “already processed” flag in the database?
What would be really nice is to define a scheme, e.g. in the tree view and just feed this template scheme (including pre-defined settings for ctf estimation, template picking, ncc, box size, extraction etc.) with new input images.
How would you add all micrographs created after 20180912_2306_12169104_MC2_DW.mrc using wildcards?
Let’s assume there are 500 images before that were already processed and 500 after.
With bash it’s easy but from within chrome I have no idea how to add all at once.
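For reference, the selection described here can be sketched outside the browser in a few lines of Python. This is only an illustration, not something cryoSPARC provides: the function names are hypothetical, and it assumes the filenames start with a sortable date and time stamp (as in the example above), so that alphabetical order matches acquisition order, at least down to the minute.

```python
from pathlib import Path


def micrographs_after(names, last_processed):
    """Return the names that sort strictly after last_processed.

    Assumption: filenames begin with a sortable timestamp, so plain
    string comparison follows the order in which they were recorded.
    """
    return sorted(n for n in names if n > last_processed)


def list_new_micrographs(folder, last_processed, pattern="*.mrc"):
    """List micrograph filenames in folder that come after last_processed."""
    names = [p.name for p in Path(folder).glob(pattern)]
    return micrographs_after(names, last_processed)
```

Calling list_new_micrographs on the session folder with "20180912_2306_12169104_MC2_DW.mrc" as last_processed would return only the later files, which could then be linked into a separate folder and imported as their own group.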
The file browser is part of cryoSPARC, it’s not the native one from your operating system. I assume it processes the string using glob.glob from the Python standard library (or something similar) - my quick check appears to confirm.
Thus, for example, micrographs/stack_0[4-5]*.mrc will choose all the micrographs from 400 to 599. Indeed, arbitrary numeric ranges aren’t supported, but I don’t think it really matters if a few micrographs are ever reprocessed, like in your example where you had previously imported all of them at first and not fixed ranges from the beginning. If you want to avoid that in the future you could always use a predictable grouping like 0[0-4]*, 0[5-9]*, 1[0-4]*, etc. while only running these imports after that many micrographs have been recorded.
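To check what such a pattern picks up before running an import, the matching can be tried in Python with fnmatch, which is what glob.glob uses for filename matching. This is a sketch under my assumption above that the file browser behaves like glob; the stack_NNNN.mrc names and the select helper are made up for illustration.

```python
import fnmatch

# Hypothetical micrograph names with four-digit numbering, 0000 to 1499.
names = [f"stack_{i:04d}.mrc" for i in range(1500)]


def select(names, pattern):
    """Return the names matching a glob-style pattern, in input order."""
    return fnmatch.filter(names, pattern)


# [4-5] matches one character, so this covers 0400 through 0599.
hits = select(names, "stack_0[4-5]*.mrc")
print(hits[0], hits[-1], len(hits))  # stack_0400.mrc stack_0599.mrc 200

# Predictable, non-overlapping batches of 500 micrographs each.
batch_patterns = ["stack_0[0-4]*.mrc", "stack_0[5-9]*.mrc", "stack_1[0-4]*.mrc"]
batches = [select(names, p) for p in batch_patterns]
print([len(b) for b in batches])  # [500, 500, 500]
```

Since each character class only constrains a single digit, the batch boundaries have to fall on digit boundaries, which is why ranges like 437 to 612 can't be written as one pattern.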
I imagine they’ll eventually add a true incremental on-the-fly feature, probably as its own job type, but until then this is probably the only way.
Thank you Daniel.
That answered my question completely. With your suggestions, the grouping can indeed be done in just a few steps.
However, I appreciate the --do_unfinished option or its equivalents in the MRC-born tools. I would love to have something similar here.