Order of particles for half sets

Hi,

I would like to know how cryoSPARC splits the dataset into halves to be refined independently. I would expect a simple odd-even split based on my input .star file when creating the dataset, but that doesn’t seem to be the case. I realized this after I exported a cryoSPARC refinement to RELION with Daniel Asarnow’s script. I am working on a particular application where the order of the particles is important.

Thanks!

The split is given in the cryoSPARC metadata files. I neglected this in previous versions of csparc2star.py, but I pushed an update to my github. The latest version will populate rlnRandomSubset according to the split in the cryoSPARC input files.
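For illustration, here is a minimal sketch of that mapping in Python. It is not the actual csparc2star.py code. It assumes the half-set assignment lives in a field named `alignments3D/split` holding 0 or 1, and that rlnRandomSubset uses 1 and 2; the `particles` array is a made-up stand-in for a loaded cryoSPARC metadata file.

```python
import numpy as np

# Hypothetical stand-in for particle metadata loaded from a cryoSPARC file.
# The field name "alignments3D/split" and its 0/1 values are assumptions.
particles = np.zeros(6, dtype=[("uid", "<u8"), ("alignments3D/split", "<u4")])
particles["uid"] = np.arange(6)
particles["alignments3D/split"] = [0, 1, 1, 0, 1, 0]


def random_subset_from_split(cs):
    """Map an assumed 0/1 split field to 1/2 half-set labels.

    Returns None when the metadata carries no split field.
    """
    if "alignments3D/split" not in cs.dtype.names:
        return None
    return cs["alignments3D/split"].astype(int) + 1


rln_random_subset = random_subset_from_split(particles)
```

The resulting values would then be written into the rlnRandomSubset column of the exported .star file.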

Thanks a lot, Daniel!

It seems, however, that neither cryoSPARC nor RELION splits the dataset the way I thought it would.

So, if I understand correctly:

  • RELION - if the _rlnRandomSubset key is not present, it will be generated with random assignments (not odd-even).

  • cryoSPARC - the N particles will be put in random order, then the first N/2 go to one half-set and the remainder go to the other half-set.

In a standard single-particle setting, both of these approaches, as well as the odd-even split, are of course equivalent.
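To make the three schemes concrete, here is a small Python sketch. It only mirrors the descriptions in this post and is not taken from the RELION or cryoSPARC source; the function names are made up, and the labels 1 and 2 follow the _rlnRandomSubset convention.

```python
import numpy as np


def odd_even_split(n):
    """Alternate particles between half-sets 1 and 2 in input order."""
    return np.arange(n) % 2 + 1


def random_assignment_split(n, rng):
    """Assign each particle independently at random to half-set 1 or 2.

    The two half-sets may end up with slightly different sizes.
    """
    return rng.integers(1, 3, size=n)


def shuffle_then_halve_split(n, rng):
    """Shuffle the particle order, then put the first n // 2 in half-set 1."""
    order = rng.permutation(n)
    subset = np.full(n, 2)
    subset[order[: n // 2]] = 1
    return subset


rng = np.random.default_rng(0)
n = 10
odd_even = odd_even_split(n)
random_assign = random_assignment_split(n, rng)
shuffled = shuffle_then_halve_split(n, rng)
```

Only the odd-even scheme depends on the input order; the other two ignore it, which is why a carefully interleaved input file does not survive them.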

In my case, my particles are extracted from 2D crystals, so I want all particles that come from the same 2D crystal to go to the same half-set; otherwise I get spurious correlations, because they overlap substantially. In a way it is similar to particles extracted from helical filaments. For FREALIGN this is easy: all I do is generate a .par file and .mrcs stack with the particles from each micrograph interleaved, taking care to keep the half-sets balanced. From that, I generate a .star file that I can feed into RELION. But I see now that if I want this particular order to be preserved in RELION, I also need to specify the _rlnRandomSubset key from the very beginning.
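One way to produce such _rlnRandomSubset values is sketched below in Python. This is only an illustration, not my actual script: `split_by_group` is a made-up helper, the crystal labels are toy data, and the greedy balancing (largest crystals first, each going to the currently smaller half-set) is just one simple choice.

```python
from collections import Counter


def split_by_group(group_ids):
    """Return one half-set label (1 or 2) per particle.

    All particles sharing a group id (e.g. the same 2D crystal) get the
    same label. Groups are placed largest first into whichever half-set
    currently holds fewer particles, which keeps the halves roughly balanced.
    """
    counts = Counter(group_ids)
    sizes = {1: 0, 2: 0}
    group_to_half = {}
    # Sort by descending size, then by name so the result is deterministic.
    for group, count in sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0]))):
        half = 1 if sizes[1] <= sizes[2] else 2
        group_to_half[group] = half
        sizes[half] += count
    return [group_to_half[g] for g in group_ids]


# Toy example: one crystal label per particle, in .star file order.
crystals = ["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3", "c3", "c4"]
subsets = split_by_group(crystals)
```

The returned list can be written as the _rlnRandomSubset column, one value per particle row.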

But it’s not clear to me if there’s any way to preserve this order in cryoSPARC. Is there?

I believe that cryoSPARC will always create a new order. Perhaps @apunjani would consider importing rlnRandomSubset assignments.

One thing you can do now is simply refine the two halves completely separately. Then you can align the output maps to compute the FSC and estimate resolution. Finally, average these maps, or replace the files in a Refine3D so you can do a maximum likelihood reconstruction with all the particles. Edit to clarify: this could be done in either cryoSPARC or Relion, except that cryoSPARC doesn’t marginalize poses, so you would only do the align/FSC approach.
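For the FSC step, a bare-bones NumPy sketch could look like the following. It assumes the two maps are already aligned, cubic, and the same size, and it leaves out masking and any conversion of shell index to resolution; the `fsc` function is written for this post and is not from cryoSPARC or Relion.

```python
import numpy as np


def fsc(map1, map2):
    """Fourier shell correlation of two equally sized cubic maps.

    Returns one value per integer-radius shell, from 0 up to n // 2.
    """
    assert map1.shape == map2.shape and len(set(map1.shape)) == 1
    n = map1.shape[0]
    f1 = np.fft.fftn(map1)
    f2 = np.fft.fftn(map2)
    # Integer shell index for every Fourier voxel.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int).ravel()
    nshells = n // 2 + 1
    keep = shell < nshells  # drop the corners beyond Nyquist
    cross = np.bincount(shell[keep], weights=(f1 * np.conj(f2)).real.ravel()[keep], minlength=nshells)
    p1 = np.bincount(shell[keep], weights=(np.abs(f1) ** 2).ravel()[keep], minlength=nshells)
    p2 = np.bincount(shell[keep], weights=(np.abs(f2) ** 2).ravel()[keep], minlength=nshells)
    denom = np.sqrt(p1 * p2)
    # Shells with no power are reported as 0 instead of dividing by zero.
    return np.divide(cross, denom, out=np.zeros(nshells), where=denom > 0)


# Sanity check on synthetic data: a map compared with itself gives FSC = 1.
rng = np.random.default_rng(0)
vol = rng.standard_normal((16, 16, 16))
curve = fsc(vol, vol)
```

With real half maps you would load the two volumes, apply the same mask to both, and read the resolution off the shell where the curve drops below your chosen threshold.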

@apunjani I wonder if cryosparc now honors the split in the input star file? Perhaps a checkbox in the Import job can be used to provide this option to the users. Thanks.

I had assumed that cryosparc was importing the rlnRandomSubset values! If not, I think it definitely should, @apunjani - workflows including both relion and cryosparc are common, and I would definitely like to know that my half sets are kept separate throughout…

It does work the other way - csparc2star.py exports the half set assignments, which Relion then respects.

Hi @DanielAsarnow, @olibclarke,

Currently, as of v2.3.0, the cryoSPARC “import particles” job does not read the half-set split from .star files.
In the next release we will have the import particles job read this field and provide it as an output in the form of particles.alignments3D, and refinement jobs will have an option to use this rather than re-splitting the data, which is the current behaviour. Note that local refinement jobs do not re-split the data; they always use the split from the incoming particles.alignments3D (which probably came from a refinement or nu-refine).

Ali

@apunjani Great! The splits should continue to be the same between classes, though. (It makes sense in context why it goes under alignments3D, but I have always thought it was a little odd to have the separate fields per class with the same values.)

Is this available in the current version (v2.5)? It is not obvious which option in the Refinement jobs controls this choice.

Did this end up changing, @apunjani? Does cryosparc respect imported rlnRandomSubset values in the current version?

Cheers
Oli