Particle sign and Scale for Simulate Data

Hi, all
In the Simulate Data job in v4.3.1, I noticed a new parameter named Particle Sign, with a default value of -1. However, it seems that in the legacy Simulate Data job, this parameter can’t be set by users.
Generally, I use Simulate Data to generate particles that help with quick 2D classification of good particles and with recentering. However, when I use Simulate Data in v4.3.1 to generate simulated particles, the generated particles and my real particles always split into different classes. What should the parameters be if I want the simulated data and the real data to fall into the same classes?
Here is the 2D classification result.


This problem is quite urgent. I look forward to any suggestions. Thanks!

I changed the particle sign and scale, and also tried different SNR values (both lower and higher), but the results turned out to be the same.
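To make the three parameters mentioned above concrete, here is a toy sketch of how a sign, a scale, and an SNR could enter a simulated image. Everything in it is an assumption for illustration: the function names are hypothetical, SNR is taken to mean signal variance divided by noise variance, and none of it is claimed to match how CryoSPARC's Simulate Data job defines or applies these parameters.

```python
import math
import random


def simulate_pixels(projection, sign=-1, scale=1.0, snr=0.1, rng=None):
    """Toy model: sign * scale * projection plus Gaussian noise.

    SNR is ASSUMED here to be var(signal) / var(noise); the real job
    may define it differently.
    """
    if snr <= 0:
        raise ValueError("snr must be positive")
    rng = rng or random.Random()
    signal = [sign * scale * p for p in projection]
    mean = sum(signal) / len(signal)
    signal_var = sum((s - mean) ** 2 for s in signal) / len(signal)
    noise_sigma = math.sqrt(signal_var / snr)
    return [s + rng.gauss(0.0, noise_sigma) for s in signal]


def pearson(a, b):
    """Pearson correlation between two equal-length pixel lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical 1D "projection" standing in for a particle image.
projection = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
dark = simulate_pixels(projection, sign=-1, snr=1e9, rng=random.Random(0))
light = simulate_pixels(projection, sign=+1, snr=1e9, rng=random.Random(1))
print(pearson(dark, light))  # opposite signs give a strongly negative correlation
```

In this toy model, two images of the same view with opposite sign are anti-correlated, which is one reason a contrast mismatch between simulated and real particles could keep them in separate classes.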

Can you explain a little more about this workflow? This is not a common workflow, and I would be a little concerned about the possibility of template bias.

Sure. I use a heavily binned volume (usually bin8) to generate simulated particles as seeds for 2D classification. The ratio of simulated particles to real particles usually ranges from 1:10 to 1:2, depending on the case. This has worked quite well for the initial isolation of some good particles from a lousy dataset.

Ok, so I guess Nyquist of ~15 Å for the simulated particles? That’s reasonable - if you were using less binning, I would be concerned about the possibility of an Einstein-from-noise situation, but with sufficient binning you should be fine.
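For reference, the Nyquist estimate follows from the pixel size after binning: the Nyquist limit is twice the effective pixel size. A minimal sketch, assuming a raw pixel size of 0.94 Å (a hypothetical value, since the actual pixel size is not stated in this thread):

```python
def nyquist_resolution(raw_pixel_size_A: float, bin_factor: int) -> float:
    """Return the Nyquist limit in Å after binning by bin_factor."""
    if raw_pixel_size_A <= 0 or bin_factor < 1:
        raise ValueError("pixel size must be > 0 and bin factor >= 1")
    # Binning multiplies the pixel size; Nyquist is two pixels.
    return 2.0 * raw_pixel_size_A * bin_factor


RAW_PIXEL_SIZE_A = 0.94  # hypothetical value, for illustration only

for bin_factor in (1, 2, 4, 8):
    limit = nyquist_resolution(RAW_PIXEL_SIZE_A, bin_factor)
    print(f"bin{bin_factor}: Nyquist = {limit:.2f} Å")
```

With that assumed pixel size, bin8 lands at roughly 15 Å, which is where the estimate comes from.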

Looking here, it seems there ought to be a “Legacy Simulate Data” job that would allow you to replicate the original behavior, but I can’t find it, at least on my v4.4 systems.

I’d still be concerned about it, largely because of how CryoSPARC does 2D classification by default. The combination of tiny subsets, quick default convergence, and a single full pass is fast and works great for really clean, nicely picked data, but it usually works poorly on more difficult data.

If I’m understanding the logs correctly, CryoSPARC aligns all the way to the maximum resolution you set right from the first iteration (the default is 6 Å; for dirty data I often back off to 8 or 10 Å, and for “cleanup” runs of already selected particles I’ll occasionally drop it to 4 Å), so the risk of overfitting with simulated data is a real concern.

Can the OP give more details on the parameters used for 2D classification?

Agreed - one would need to be very aggressive about binning in this situation to rule out template bias (to the point where your 2Ds might not have so many features)

Yes. The simulated particles are only used at the very beginning to discard junk. Once the junk particles have been removed, the bin4 or bin2 particles are classified without simulated data.
I can’t find the legacy Simulate Data job in my v4.3 either. In the tests I ran with different combinations of scale and particle sign, together with different SNR values (since it was reported that the SNR calculation changed in the current version), the simulated and real particles still failed to converge into the same classes.

Hi @pywt901,

Just chiming in: to access legacy jobs, you have to click the “funnel” icon in the job builder, and then hit the toggle labelled “Show legacy jobs”. You should then be able to see or search for the legacy simulator job in the builder search box.

Best,
Michael

Ah! Did not know this, good to know, thanks!

This can be very helpful! Thanks a lot!

Hi, Michael!
One more thing about simulated data: could I request an update that supports simulating particles for filaments? :joy:

Hi @pywt901 ,

It is possible to run the simulate data job on filament datasets, with a few caveats:

  • The pose distribution by default will be uniform, rather than “equatorial” as most filaments are in reality
  • The filament slot won’t be generated, since it requires information that is obtained from particle picking at the micrograph level (i.e. spacing between segments on the filament, in-plane angles of the filament, filament IDs, etc.)

It’s possible to partly work around the first obstacle by supplying an input particle dataset to the simulate job, with alignments3D coming from a previous refinement of a real filament dataset. This works because the simulate data job can pull particle poses from an input set of particles.
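To illustrate the difference between the two pose distributions mentioned in the first caveat, here is a small sketch that samples viewing directions either uniformly on the sphere or concentrated around the equator. It is only an illustration: the function name, the Gaussian tilt spread, and the angle convention (tilt of 90° meaning a side view) are assumptions, and it does not reflect how CryoSPARC generates or stores poses.

```python
import math
import random


def sample_view_direction(equatorial, tilt_spread_deg=10.0, rng=None):
    """Return a hypothetical (rot_deg, tilt_deg) viewing direction.

    equatorial=True  -> tilt drawn near 90 degrees (side views).
    equatorial=False -> direction uniform on the sphere.
    """
    rng = rng or random.Random()
    rot = rng.uniform(-180.0, 180.0)
    if equatorial:
        # ASSUMPTION: Gaussian spread around the equator, clamped to [0, 180].
        tilt = min(180.0, max(0.0, rng.gauss(90.0, tilt_spread_deg)))
    else:
        # Uniform on the sphere: cos(tilt) is uniform in [-1, 1].
        tilt = math.degrees(math.acos(rng.uniform(-1.0, 1.0)))
    return rot, tilt


rng = random.Random(0)
uniform_tilts = [sample_view_direction(False, rng=rng)[1] for _ in range(5000)]
equatorial_tilts = [sample_view_direction(True, rng=rng)[1] for _ in range(5000)]


def near_equator_fraction(tilts, window_deg=10.0):
    """Fraction of tilts within window_deg of 90 degrees."""
    return sum(abs(t - 90.0) < window_deg for t in tilts) / len(tilts)


print("uniform:", near_equator_fraction(uniform_tilts))
print("equatorial:", near_equator_fraction(equatorial_tilts))
```

A uniform distribution puts only a minority of views near the equator, so simulated filament particles drawn that way would include many top-like views that real filament segments rarely show.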

Let me know if this is helpful!
Michael

Hi, Michael!
I couldn’t actually find where to connect the alignments3D information in this job (v4.3.1).

Hi @pywt901,

This feature is only available in the version of Simulate Data in CryoSPARC v4.4+. The job displayed in the attached image has been marked as legacy in that version.

Best,
Michael