Number of particles to choose

hsosa · November 28, 2024, 5:02pm

Several extraction jobs (e.g. downsampling) have an option to select the number of particles to extract. When this number is set lower than the total available, how is the subset chosen? Is it random, with equal numbers drawn from the two independent half-sets used to calculate the GSFSC?

I have a massive number of particles (several million after symmetry expansion) and would like to use this option to reduce the computational load.

Thanks

Mark-A-Nakasone · November 30, 2024, 6:17am

Use 16-bit floating point for extraction.
Do you need the full box size? You could "Fourier crop", e.g. 512 → 128 or 512 → 64.
You can also extract everything and then use the Particle Sets Tool to randomly split the particles into groups (e.g. 12M can be split into groups of 0.5M).
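
To make the suggestions above concrete, here is a rough Python sketch. It is not CryoSPARC code: `stack_size_tb` and `split_into_random_groups` are hypothetical helper names, the 4 bytes per pixel for a full-precision stack is an assumption, and the random split only illustrates the idea of dividing particles into random groups, not how the Particle Sets Tool does it internally.

```python
import numpy as np

def stack_size_tb(n_particles, box_px, bytes_per_pixel):
    """Approximate particle stack size in terabytes (10**12 bytes)."""
    return n_particles * box_px * box_px * bytes_per_pixel / 1e12

def split_into_random_groups(n_particles, group_size, seed=0):
    """Shuffle particle indices and cut them into groups of at most group_size."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_particles)
    # Cut the shuffled index array every group_size entries
    cut_points = np.arange(group_size, n_particles, group_size)
    return np.split(shuffled, cut_points)

# 12M particles: full box at 32-bit (assumed) vs. Fourier-cropped box at 16-bit
full = stack_size_tb(12_000_000, box_px=512, bytes_per_pixel=4)
cropped = stack_size_tb(12_000_000, box_px=128, bytes_per_pixel=2)
print(f"full: {full:.2f} TB, cropped: {cropped:.2f} TB")

# Scaled-down version of "12M split into groups of 0.5M"
groups = split_into_random_groups(n_particles=12_000, group_size=500)
print(len(groups), "groups")
```

Cropping 512 → 128 cuts the pixel count per particle by a factor of 16, and halving the bytes per pixel doubles that saving.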


kstachowski · December 2, 2024, 6:29pm

Hi @hsosa,

The selected group is chosen by the order of particles in the particles.cs file (i.e. particles 1 → X). Particles are assigned their order in the file when they are first picked/extracted, and this is normally micrograph index order (generally the order in which they are read into CryoSPARC during import).
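
As an illustration of what this ordering implies, the following Python sketch compares taking the first X particles of a micrograph-ordered list with taking a random subset of the same size. The dataset is made up (100 micrographs with 50 particles each), and nothing here reads a real particles.cs file.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: particles stored in micrograph index order
micrograph_id = np.repeat(np.arange(100), 50)  # 5000 particles
n_keep = 1000

# "Particles 1 -> X": the first n_keep entries in file order
first_n = micrograph_id[:n_keep]

# Random subset of the same size, drawn without replacement
random_idx = rng.choice(micrograph_id.size, size=n_keep, replace=False)
random_n = micrograph_id[random_idx]

print("micrographs covered by first-N subset:", np.unique(first_n).size)
print("micrographs covered by random subset: ", np.unique(random_n).size)
```

The first-N subset only covers the earliest micrographs, which is why a random split made beforehand gives a more representative subset.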

In terms of GSFSC splits, picking/extraction jobs are not concerned with maintaining proper splits. Whenever a job requires a split for GSFSC calculations, you will see a message at the top of the log detailing the split. For example:

====== Gold Standard Split ======
  Particles have input alignments3D connected, so reusing pre-existing split
  Set A is greater than set B by 59 particles (0.00604 percent difference relative to the total dataset).
  Split A has 488808 particles 
  Split B has 488749 particles

In the event that the split differs by more than 2%, you will see the following message in the log along with a warning appearing on the job card:

====== Gold Standard Split ======
  Particles have input alignments3D connected, so reusing pre-existing split
  Set A is greater than set B by 49998 particles (100 percent difference relative to the total dataset).
  This is a difference of greater than 2%.
  If equally-sized Gold Standard splits are desired, please use the
  'Balance half-sets' mode in Particle Sets Tools.
  Alternatively, 'Force re-do GS split' may be enabled, but this might not preserve Gold Standard
  independence.
  Split A has 49999 particles
  Split B has 1 particles

In this event, I would recommend using the Particle Sets Tool to rebalance the half-sets.
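
The percentages in the two log excerpts above can be reproduced with a few lines of Python. This sketch assumes the figure is the absolute size difference divided by the total particle count, which matches the numbers shown; `split_imbalance_percent` and `needs_rebalancing` are hypothetical names, not CryoSPARC functions.

```python
def split_imbalance_percent(n_a, n_b):
    """Size difference between half-sets as a percentage of the total dataset."""
    return abs(n_a - n_b) / (n_a + n_b) * 100.0

def needs_rebalancing(n_a, n_b, threshold_percent=2.0):
    """True if the split differs by more than the 2% mentioned in the log."""
    return split_imbalance_percent(n_a, n_b) > threshold_percent

# Counts taken from the two log excerpts
for n_a, n_b in [(488808, 488749), (49999, 1)]:
    pct = split_imbalance_percent(n_a, n_b)
    print(f"A={n_a}, B={n_b}: {pct:.5f} percent, rebalance={needs_rebalancing(n_a, n_b)}")
```

Only the second split crosses the 2% threshold and would trigger the warning.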

Best,
Kye