Only part of the particle stack was loaded in 3D ab initio reconstruction

I would appreciate your help figuring out why the entire particle stack was not loaded in 3D ab initio reconstruction. I have two particle stacks processed under the same conditions: Stack1 contains 426k particles and Stack2 contains 321k, for a total of 747k. When the two were combined in ab initio 3D reconstruction, only 580k particles were loaded and the remaining 167k were not. I found a prior thread from Feb 20 reporting a similar problem, but the solution in that thread did not help in my case. Any advice would be highly appreciated. Thanks.

@haomingz How many classes did you specify for the ab initio 3D reconstruction job?

Hi, thank you for your response. I specified one class for the ab initio 3D reconstruction and planned to run 3D classification afterwards. Is there a limit on the number of particles per 3D class?

In the single-class case specifically, ab initio reconstruction may indeed not output all particles.

For subsequent 3D classification, may I suggest you first run homogeneous 3D refinement with the following inputs:

  • 3D volume from ab initio 3D reconstruction
  • particles that were previously input to that same ab initio 3D reconstruction

This way, you should be able to classify your complete particle stack.

As you suggested, I ran a homogeneous 3D refinement with two inputs: the 3D volume from ab initio reconstruction (built from 580k particles) and the 747k particles from the two stacks. I still ended up with 580k particles, so nothing changed. Below is the output from the homogeneous refinement. I also ran “Check for corrupt particles” but found nothing unusual and no warnings of any kind.

[CPU: 5.43 GB] Full run took 995.595s
[CPU: 2.77 GB] --------------------------------------------------------------
[CPU: 2.77 GB] Compiling job outputs…
[CPU: 2.77 GB] Passing through outputs for output group particles from input group particles
[CPU: 2.80 GB] This job outputted results [‘alignments3D’, ‘ctf’]
[CPU: 2.80 GB] Loaded output dset with 584874 items
[CPU: 2.80 GB] Passthrough results [‘blob’, ‘alignments2D’, ‘location’, ‘pick_stats’, ‘ml_properties’]
[CPU: 3.01 GB] Loaded passthrough dset with 584874 items
[CPU: 3.01 GB] Intersection of output and passthrough has 584874 items
[CPU: 2.77 GB] Checking outputs for output group particles
[CPU: 2.77 GB] Updating job size…
[CPU: 2.77 GB] Exporting job and creating csg files…
[CPU: 2.77 GB] ***************************************************************
[CPU: 2.77 GB] Job complete. Total time 1026.62s

Could you please share the job.json file from the refinement job’s directory and the complete processing log (text from the Overview tab) with us?
If you have applied the 220824 patch, you can retrieve the processing log with the following command (guide), replacing PX and JY with the project and job IDs:
cryosparcm eventlog PX JY > PX_JY_events.log

Are you referring to

The file containing the event log has been sent. Thanks.

Yes. To avoid using different parameters for the two particle stacks in 2D classification, I loaded all particles into one job and redid 2D classification with the same parameters. The job still returned 580k particles, with about 160k missing.

The job.json files of the jobs immediately upstream of homogeneous 3D refinement (the jobs that supplied the particles) may contain additional clues about the missing particles.

Is it possible that the two particle stacks contain some of the same particles? To clarify: are the 426k and 321k stacks from independent picking jobs, and on the same micrographs? Or are they different selections of particles that came from one large picking/extraction job earlier on? If the exact same particle (the same unique ID in the .cs file) is part of both stacks, it will be used only once in all job types. Perhaps perform an intersection (Particle Sets Tool) to make sure there are no “shared” particles between them.
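As a minimal sketch of the same overlap check done outside the GUI: the function below compares two arrays of particle unique IDs and reports how many occur in both, and how many unique particles a combined job would see. It assumes the .cs files can be read with NumPy’s `np.load` as structured arrays with a `uid` field; the file paths in the commented lines are hypothetical placeholders, and the toy IDs are made up.

```python
import numpy as np


def shared_particle_report(uids_a, uids_b):
    """Compare two collections of particle unique IDs.

    Returns (n_a, n_b, n_shared, n_union), where n_union is the number of
    distinct particles a job combining both stacks would actually use.
    Repeated IDs within a single stack are counted once.
    """
    a = np.unique(np.asarray(uids_a))
    b = np.unique(np.asarray(uids_b))
    shared = np.intersect1d(a, b)  # IDs present in both stacks
    union = np.union1d(a, b)       # distinct IDs across both stacks
    return int(a.size), int(b.size), int(shared.size), int(union.size)


# Hypothetical usage (assumption: .cs files load as structured arrays with a 'uid' field):
# stack1 = np.load("/path/to/stack1_particles.cs")
# stack2 = np.load("/path/to/stack2_particles.cs")
# print(shared_particle_report(stack1["uid"], stack2["uid"]))

# Toy demonstration with made-up IDs: 104 and 105 occur in both stacks.
demo_a = np.array([101, 102, 103, 104, 105], dtype=np.uint64)
demo_b = np.array([104, 105, 201, 202], dtype=np.uint64)
print(shared_particle_report(demo_a, demo_b))  # (5, 4, 2, 7)
```

If n_shared is non-zero, the combined count will fall short of n_a + n_b by exactly that amount, which is the pattern described above.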

The datasets were collected on the same microscope on different days, and particles were picked independently. After cleanup by 2D classification, all particles were combined, so each should be unique. However, I am not sure whether some micrographs share the same names. Thanks for the suggestion to perform an intersection.
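For the micrograph-naming question, a quick sketch that lists file names appearing in both datasets. It only compares base names and does not by itself show whether name collisions affect which particles are loaded. It assumes the micrograph paths are available as lists (for example from a per-particle field such as `location/micrograph_path`, which is an assumption here) and handles both `str` and `bytes` entries; the example paths are hypothetical.

```python
import os


def _base_name(path):
    """Return the file name of a path, decoding bytes if necessary."""
    if isinstance(path, bytes):
        path = path.decode()
    return os.path.basename(path)


def shared_micrograph_names(paths_a, paths_b):
    """Return the sorted micrograph file names present in both lists of paths."""
    names_a = {_base_name(p) for p in paths_a}
    names_b = {_base_name(p) for p in paths_b}
    return sorted(names_a & names_b)


# Hypothetical example paths from two collection sessions.
day1 = ["day1/mics/mic_0001.mrc", b"day1/mics/mic_0002.mrc"]
day2 = ["day2/mics/mic_0002.mrc", "day2/mics/mic_0003.mrc"]
print(shared_micrograph_names(day1, day2))  # ['mic_0002.mrc']
```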

Analysis of the metadata for the jobs upstream of ab initio 3D reconstruction suggests that these upstream jobs did indeed output 426k and 321k particles, respectively, but also that nearly 163k particles are present in the outputs of both upstream jobs.
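The numbers are consistent with inclusion-exclusion. Writing A and B for the sets of unique particle IDs in the two upstream outputs (notation introduced here), a combined job uses each shared particle once:

```latex
\begin{aligned}
|A \cup B| &= |A| + |B| - |A \cap B| \\
           &\approx 426\text{k} + 321\text{k} - 163\text{k} = 584\text{k}
\end{aligned}
```

This matches the 584,874 items reported in the refinement log.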
I will email you information about the particles identified as duplicates.