Extracted particles different in number to imported particles

Hi,

I have a set of 4044 Patch CTF Estimated micrographs, and I imported a set of 81,763 particles from RELION 4.

The import job correctly lists 4044 micrographs and 81,763 particles. However, after running an Extract from Micrographs job, I see that it reports 3866 micrographs and 77,404 particles.

Could anyone advise as to what’s causing this?

The final lines of the job output, which is where the mismatch seems to stem from, are as follows:

[CPU: 1.62 GB] ==== Completed. Extracted 77404 particles in 1337.61s.
[CPU: 1.44 GB] --------------------------------------------------------------
[CPU: 1.44 GB] Compiling job outputs…
[CPU: 1.44 GB] Passing through outputs for output group micrographs from input group micrographs
[CPU: 1.44 GB] This job outputted results [‘micrograph_blob’]
[CPU: 1.44 GB] Loaded output dset with 3866 items
[CPU: 1.44 GB] Passthrough results [‘background_blob’, ‘mscope_params’, ‘ctf’, ‘ctf_stats’, ‘micrograph_blob_non_dw’, ‘micrograph_thumbnail_blob_1x’, ‘micrograph_thumbnail_blob_2x’, ‘rigid_motion’, ‘spline_motion’, ‘movie_blob’, ‘gain_ref_blob’]
[CPU: 1.44 GB] Loaded passthrough dset with 4044 items
[CPU: 1.44 GB] Intersection of output and passthrough has 3866 items
[CPU: 1.44 GB] Passing through outputs for output group particles from input group particles
[CPU: 1.44 GB] This job outputted results [‘blob’]
[CPU: 1.44 GB] Loaded output dset with 77404 items
[CPU: 1.44 GB] Passthrough results [‘location’, ‘ctf’]
[CPU: 1.45 GB] Loaded passthrough dset with 81763 items
[CPU: 1.45 GB] Intersection of output and passthrough has 77404 items
[CPU: 1.44 GB] Checking outputs for output group micrographs
[CPU: 1.44 GB] Checking outputs for output group particles
[CPU: 1.44 GB] Updating job size…
[CPU: 1.44 GB] Exporting job and creating csg files…
[CPU: 1.44 GB] ***************************************************************
[CPU: 1.44 GB] Job complete. Total time 1342.04s

Thanks,

Taha

Did you increase the box size? It's possible that cryoSPARC doesn't include particles that are too close to the border of the images. However, ~95% of the micrographs and ~95% of the particles remain, so it's more likely that some micrographs are dropped entirely. Can you check the cryoSPARC directory for the job and count the .mrc files it references? Are they all there?
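
For the file count suggested above, a minimal sketch could look like the following. The function name `count_mrc` and the example paths are hypothetical placeholders, and the sketch assumes one .mrc file per micrograph somewhere below each job directory, which may not match every cryoSPARC job layout.

```python
from pathlib import Path

def count_mrc(directory):
    """Count .mrc files in `directory` and all of its subdirectories."""
    return sum(1 for p in Path(directory).rglob("*.mrc") if p.is_file())

# Hypothetical usage (replace the placeholder paths with the real job directories):
#   print(count_mrc("/path/to/project/J_extract"))
#   print(count_mrc("/path/to/project/J_patch_ctf"))
```

Comparing the two counts shows whether whole micrographs are missing from the extraction output, as opposed to individual particles.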

Hi @CryoEM1. Thank you for your observations and suggestions.

I didn’t increase the box size, although the box size in either case is quite big (384). So it could be that cryoSPARC is dropping particle images with micrograph edges in them.

However, I do also take the point that the fraction of retained micrographs and particles is suspiciously similar.

The extract folder has only 3886 .mrc files, whereas the CTF estimation and motion correction folders have the full 4044.

So I just finished re-extracting with a smaller box size (320), and this gives me more particles and micrographs albeit still not the full amount. This would support the hypothesis that it has to do with micrograph edges being contained in the particle image cutouts.

It made me realise that I don’t recall ever seeing micrograph edges in cryoSPARC extracted particles, unlike RELION ones.

Thanks once again @CryoEM1.

@tahashahid Particles near edges are indeed rejected by cryoSPARC extraction. A future code patch will ensure an appropriate notification in the log.
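
To illustrate why a smaller box retains more particles, here is a rough sketch of an edge test. It is not cryoSPARC's actual code: it assumes a particle is rejected whenever its square box would extend past the micrograph boundary, that particle centres are given as fractions of the micrograph width and height, and that the micrograph is 4096 x 4096 pixels (a made-up size). The function name `fits_inside` is hypothetical.

```python
import numpy as np

def fits_inside(x_frac, y_frac, mic_shape, box):
    """Return a boolean array: True where a box of side `box` (pixels),
    centred on the particle, lies entirely inside the micrograph.

    x_frac, y_frac : fractional centre coordinates in [0, 1] (assumed convention)
    mic_shape      : (ny, nx) micrograph size in pixels
    """
    ny, nx = mic_shape
    cx = np.asarray(x_frac, dtype=float) * nx
    cy = np.asarray(y_frac, dtype=float) * ny
    half = box / 2.0
    return (cx - half >= 0) & (cx + half <= nx) & (cy - half >= 0) & (cy + half <= ny)

# One particle in the centre, one about 184 px from the left edge.
x = [0.5, 0.045]
y = [0.5, 0.5]
keep_384 = fits_inside(x, y, (4096, 4096), 384)  # second particle fails: 184 < 192
keep_320 = fits_inside(x, y, (4096, 4096), 320)  # second particle passes: 184 >= 160
```

Under these assumptions, a micrograph whose picks all fail the test would contribute no particles at all, which would be consistent with both the particle and micrograph counts dropping.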

Hi @tahashahid,

In Patch 220315, we’ve added better logging to the Extract From Micrographs (GPU) job to show the number of particles being extracted, as well as the number of particles that were rejected because they were too close to the edges of the micrograph.

Thanks @wtempel and @stephan!

I have a related issue. I have two sets of particles being processed in parallel for SSD space reasons. However, when I try to combine them, I end up with fewer total particles than expected: one data set has 137K, the other 135K, but the combined set has only 186K. Here are links to the two blob files.

Could you please describe the upstream workflow that led to the two sets of particles?
Were the raw data preprocessed and the particles picked and extracted together, then partitioned with the Particle Sets Tool?
Were the J123 and J228 particles picked from distinct, non-overlapping sets of micrographs?
Please also describe how you combined the particle sets.
Do the 137k, 135k, and 186k counts all correspond to extracted particles?
Have you already examined the contents of the cs files (see method 1 or method 2) for the smaller sets (J123 and J228, presumably) and checked which particles are “missing” from the combined set (cs file not posted?)?
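
As a rough sketch of that check: if the two sets share particle UIDs, a combined set that keeps each UID once will be smaller than the sum of the two. The code below assumes the cs files can be read with `numpy.load` as structured arrays containing a `uid` field. The file names in the usage comment are hypothetical, and `compare_uids` is an illustrative helper, not a cryoSPARC function.

```python
import numpy as np

def compare_uids(uids_a, uids_b, uids_combined):
    """Summarise overlap between two particle sets and their combination."""
    a = np.unique(uids_a)
    b = np.unique(uids_b)
    c = np.unique(uids_combined)
    return {
        "unique_a": a.size,
        "unique_b": b.size,
        "shared_a_b": np.intersect1d(a, b).size,           # UIDs present in both sets
        "a_missing_from_combined": np.setdiff1d(a, c).size,
        "b_missing_from_combined": np.setdiff1d(b, c).size,
    }

# Hypothetical usage (file names are placeholders):
#   a = np.load("J123_particles.cs")["uid"]
#   b = np.load("J228_particles.cs")["uid"]
#   c = np.load("combined_particles.cs")["uid"]
#   print(compare_uids(a, b, c))
```

A large `shared_a_b` value would point to overlapping UIDs as the reason for the smaller combined count; non-zero "missing" values would suggest particles were dropped during combination instead.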