Particles are ignored as an input without notice when combining stacks

Dear all,

we are encountering an issue (or bug) where we cannot combine different particle stacks into one. For example, after different picking and 2D classification approaches, we want to combine the particle stacks from 2x 2D classification and 2x Ab initio jobs in a Remove Duplicates job (no local refinement prior; all particles are extracted at a box size of 424 and cropped to 212, at the same pixel size).

Particle Numbers:

  • Job278 - Ab initio: 32,383 particles
  • Job326 - 2D classification: 56,943 particles
  • Job436 - Ab initio: 182,480 particles
  • Job447 - 2D classification: 1,967 particles

That should make 273,773 particles.

However, the remove duplicates job reports:

[CPU: 289.3 MB] Loaded particle stack with 213808 items

This number of particles is also reported when I re-extract from micrographs, or run an ab initio or 2D classification job with the four stacks together.

There is also no message in the log file about the missing particles. Having the micrographs connected as an input or not does not make a difference. Removing all particle inputs except location does not make a difference either.

I would appreciate help finding the reason why 59,965 particles are lost.

Best,

Max

Hi @mruetter ,

When you connect multiple particle datasets to a single input, CryoSPARC will automatically perform a union operation on the set based on particle UIDs, which are assigned to each particle during picking (i.e., the datasets will be merged together, and all duplicates removed).

In this case, the four upstream jobs likely have 59,965 particles which have a duplicate UID, which is why they are removed before the Remove Duplicates job even sees them. Do the upstream jobs share picks?
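If you want to confirm this, one way is to load the particle .cs files from each upstream job and intersect their UID columns. The sketch below is only illustrative: it assumes the .cs files load as plain NumPy structured arrays with a `uid` field, and the file paths are placeholders you would replace with the actual outputs from J278, J326, J436 and J447.

```python
import numpy as np

# Placeholder paths: point these at the particle .cs files from each upstream job.
paths = [
    "J278_particles.cs",
    "J326_particles.cs",
    "J436_particles.cs",
    "J447_particles.cs",
]

# Assumption: .cs files load as NumPy structured arrays with a 'uid' column.
uid_sets = [set(np.load(p)["uid"].tolist()) for p in paths]

total = sum(len(s) for s in uid_sets)
union = set().union(*uid_sets)

print(f"Total particles across stacks: {total}")
print(f"Unique UIDs after union:       {len(union)}")
print(f"Shared (duplicate) UIDs:       {total - len(union)}")
```

If the "shared" count comes out to 59,965, that would match the difference you are seeing.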

The Remove Duplicates job is designed to remove particles which are nearby but not exactly the same pick (exact duplicates are already handled automatically by this union operation for any downstream jobs).

That said, your post has flagged for us that this ‘union’ operation is relatively opaque to the user, so we’ve created a task to make this more obvious in a future release.

Let me know if all this makes sense.

Valentin

Hi Valentin,

Got it! The upstream jobs indeed share picks, but the particles had different relative alignments in each upstream job, and we wanted to remove duplicate particles based on their alignment error. In any case, this resolves our concern that some particles are just "ignored". Thank you!

Best,

Max

Gotcha! Unfortunately, there is no way to do this in CryoSPARC directly, but you can use external tools for this. Note, however, that in our hands per-particle alignment error from different refinements is rarely a good indicator of particle quality.
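For reference, if you do want to compare per-particle alignments from two refinements outside CryoSPARC, something along the lines of the sketch below is possible. This is only a rough illustration: it assumes the refinement .cs files contain a `uid` column and an `alignments3D/pose` column holding per-particle axis-angle rotation vectors, which you should verify against your own outputs, and the file names are placeholders.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# Placeholders: particle .cs outputs from two different refinements of the same particles.
a = np.load("refine_A_particles.cs")
b = np.load("refine_B_particles.cs")

# Match particles by UID (assumes both files expose a 'uid' column).
common, ia, ib = np.intersect1d(a["uid"], b["uid"], return_indices=True)

# Assumption: 'alignments3D/pose' stores per-particle axis-angle rotation vectors.
rot_a = R.from_rotvec(a["alignments3D/pose"][ia])
rot_b = R.from_rotvec(b["alignments3D/pose"][ib])

# Per-particle angular difference (in degrees) between the two pose assignments.
delta_deg = np.degrees((rot_a.inv() * rot_b).magnitude())

print(f"Matched particles:         {len(common)}")
print(f"Median angular difference: {np.median(delta_deg):.1f} deg")
```

You could then threshold on this difference to select a subset, but as noted above, in our experience this is rarely a reliable proxy for particle quality.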

Thank you for pointing this out. We have a particle with almost perfect symmetry, but one cofactor kind of breaks it, so we used a good stack for seeding during 2D classification. It worked quite well.
