Does cryosparc remove duplicate particles?

hi all

I have an aligned map at pretty good resolution. The particle has symmetry that is not related by a point group, so I used signal subtraction to remove each symmetric region (the regions are at times very close together and even overlapping, depending on the view). This means I ran two signal subtraction jobs, effectively doubling the particles. However, when I load both of the resulting stacks into a new refinement, I only end up with half the particles: loading both stacks gives the same number of particles as loading just one. I'm not sure if this is a bug, or whether the particles are so close together that cryoSPARC is removing duplicates? Not sure what's happening. Thanks.

I’m fairly certain that it does remove duplicate particles, but it would be very helpful for an expert to weigh in and describe exactly when/where/how it does so.

I’ve used this to advantage by running multiple classifications on a particle set then combining the good classes from each run into a single refinement to maximize the number of good particles. The # of particles output by the refinement is always less than the input #, so I assume it’s deleting duplicates. On the other hand, I do not know for sure, nor do I know on what basis it determines duplicates. I suspect (but don’t know) that the particle sets must all come from a single extraction job for this to work; otherwise, you may run into problems of having true duplicates (and incorrect FSC estimation) if they come from different extraction jobs. Again, advice from the experts would be appreciated.

Just occurs to me that I might have stumbled on a solution to your problem, which is to make sure your two sets of particles come from two different extraction jobs. Or maybe you need to back up further and make them come from two different micrograph imports. I’d worry about the believability of the FSC if I were trying this. YMMV.

Here’s how particles are handled in cryoSPARC internally:

When you extract or import a particle stack, cryoSPARC assigns a unique identifier (UID) to each particle. As you process particle stacks through various jobs in cryoSPARC, the UID remains constant for each particle, even as you find alignments for it or perform signal subtraction.

When you combine two “different” particle groups into a job, the job takes the intersection of the particle groups based on their UID: i.e., it only keeps particles that have the same UID across both groups and uses the resulting particles for processing. Therefore the behaviour @orangeboomerang is seeing is intended.
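To make the intersection behaviour concrete, here is a minimal standalone sketch of the UID logic described above (the field names and function are illustrative, not cryoSPARC's actual API). Two signal-subtracted stacks derived from the same extraction carry the same UIDs, so combining them yields the same particle count as either stack alone:

```python
import numpy as np

def intersect_by_uid(group_a, group_b):
    """Keep only particles whose UID appears in both groups."""
    common = np.intersect1d(group_a["uid"], group_b["uid"])
    return group_a[np.isin(group_a["uid"], common)]

# Both subtracted stacks came from the same extraction, so UIDs match.
stack_a = np.array([(1, 0.5), (2, 0.7), (3, 0.1)],
                   dtype=[("uid", "u8"), ("score", "f4")])
stack_b = np.array([(1, 0.6), (2, 0.2), (3, 0.9)],
                   dtype=[("uid", "u8"), ("score", "f4")])

# Combining the two stacks gives 3 particles, not 6.
print(len(intersect_by_uid(stack_a, stack_b)))  # 3
```

This is why loading both subtracted stacks gives the same particle count as loading one.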

In the future we may provide an option to allow reassignment of particle UIDs. In the meantime, you can try the following workaround to manually reassign all the UIDs:

  1. Export one of the Particle Subtraction jobs from the job sidebar.
  2. Identify the project directory, project ID and job ID for the Particle Subtraction job. For example, if you are on project 3 with a subtraction job 42 and your project directory is /home/nick/cryosparc2_projects/P3, the project ID is P3 and the job ID is J42.
  3. In a command line, navigate to the cryoSPARC installation directory, then into the cryosparc2_master directory.
  4. Run ./bin/cryosparcm icli to enter cryoSPARC’s interactive CLI mode
  5. Enter the following commands, substituting the PROJECT_DIRECTORY, PROJECT_ID and JOB_ID declarations according to your setup
    # ==== MODIFY THESE DECLARATIONS ACCORDINGLY ====
    PROJECT_DIRECTORY = '/home/nick/cryosparc2_projects/P3'
    PROJECT_ID = 'P3'
    JOB_ID = 'J42'
    # ===============================================
    from cryosparc2_compute import dataset

    # Path to the exported .cs particle metadata file
    full_job_id = '{}_{}'.format(PROJECT_ID, JOB_ID)
    particles_location = '{}/exports/jobs/{}_particle_subtract/{}_particles/{}_particles_exported.cs'.format(PROJECT_DIRECTORY, full_job_id, full_job_id, full_job_id)

    # Load the dataset, give every particle a fresh UID, and write it back
    particles = dataset.Dataset()
    particles.from_file(particles_location)
    particles.reassign_uids()
    particles.to_file(particles_location)
    
  6. Press control + D to exit and enter y to confirm
  7. Re-import the job. For the above example the job import path is /home/nick/cryosparc2_projects/P3/exports/jobs/P3_J42_particle_subtract
  8. Reconnect the refinement inputs for the newly imported job instead of the previous exported job (J42) and retry the refinement
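Conceptually, reassign_uids() gives every particle a fresh random 64-bit identifier so the two stacks no longer collide. The sketch below mimics that idea with NumPy; it is not the cryosparc2_compute implementation, just an illustration of the effect:

```python
import numpy as np

rng = np.random.default_rng(7)

def reassign_uids(n):
    """Draw n fresh random 64-bit UIDs, redrawing on the (unlikely) collision."""
    new = rng.integers(0, 2**63, size=n, dtype=np.uint64)
    while len(np.unique(new)) != n:
        new = rng.integers(0, 2**63, size=n, dtype=np.uint64)
    return new

old_uids = np.array([1, 2, 3, 4], dtype=np.uint64)
new_uids = reassign_uids(len(old_uids))

# All new UIDs are distinct, so a downstream job that intersects by UID
# will no longer treat the re-exported stack as a duplicate of the other.
print(len(np.unique(new_uids)))  # 4
```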

Let me know if you have any trouble with that.

Nick


Hi @nfrasser, on this topic: does this mean that if you take two Particle Select jobs that contain an overlapping set of particles and use them as input to a 2D Classification job, the classification uses only the unique set of particles, i.e. it gets rid of the duplicates, right? Many thanks for your answer!

@marino-j this only works if the particles from the two selection jobs ultimately came from the same initial picking job.

For example, say you run a single “Template Picker” job (the template picker never generates picks that overlap at the same location). You send that output to two “Inspect Particle Picks” jobs with different filters, so when they finish the two outputs have some overlap. You then send both outputs to 2D Classification. 2D Classification filters out the duplicates based on their unique IDs, because the particles were all generated by the same Template Picker job.

If, instead, you create two different Template Picker jobs with the same exposures, extract and send both outputs to 2D Classification, 2D classification does NOT filter out particles at the same location because they were generated by independent Template Picker jobs and assigned different random unique IDs.
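A short sketch of the second case, with illustrative field names (not cryoSPARC's API): two independent picker jobs assign their own random UIDs, so even picks at the exact same (x, y) location look distinct to UID-based filtering:

```python
import numpy as np

# Two independent Template Picker jobs, each with its own random UIDs.
picks_job1 = {"uid": np.array([101, 102], dtype=np.uint64),
              "xy": np.array([[40.0, 40.0], [80.0, 80.0]])}
picks_job2 = {"uid": np.array([901, 902], dtype=np.uint64),
              "xy": np.array([[40.0, 40.0], [120.0, 120.0]])}  # first pick overlaps job1

combined_uids = np.concatenate([picks_job1["uid"], picks_job2["uid"]])

# UID-based dedup keeps everything: no UIDs are shared between the jobs,
# so the spatial duplicate at (40, 40) survives.
print(len(np.unique(combined_uids)))  # 4
```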

Hope that makes sense, let me know if there’s anything else I can clarify.


@nfrasser Your explanation does help. Thanks for taking the time to clarify.

RJ

@nfrasser thank you for the clarification, indeed that was the case 🙂

Hi all,

Just to provide an update: in the just-released v3.0, we have exposed a standalone Remove Duplicate Particles job, located under the Utilities section. It can be used to filter out particles that were picked too close together, or in more advanced workflows such as safely combining picks from multiple different pickers (e.g. template picks and blob picks from two different jobs). The output particles have duplicates removed and can be used for classification or refinement. A few more details about this job are provided in the guide's job page.
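Unlike the UID-based filtering discussed earlier in this thread, removing spatial duplicates means dropping picks that sit within some distance threshold of an already-kept pick. The greedy scan below is only a minimal sketch of that idea; the actual job's algorithm and parameters are documented in the cryoSPARC guide:

```python
import numpy as np

def remove_duplicates(coords, min_dist):
    """Greedily keep picks that are at least min_dist from every kept pick."""
    kept = []
    for i, c in enumerate(coords):
        if all(np.linalg.norm(c - coords[j]) >= min_dist for j in kept):
            kept.append(i)
    return coords[kept]

# Three picks: the first two are ~2.2 px apart, the third is far away.
picks = np.array([[10.0, 10.0], [12.0, 11.0], [50.0, 50.0]])

# With a 20 px threshold, the near-duplicate second pick is dropped.
print(len(remove_duplicates(picks, min_dist=20.0)))  # 2
```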

Best,
Michael