Reverting from symmetry expanded downsampled set to full scale?

olibclarke · September 29, 2023, 11:43am

If I perform 3D-VA on a symmetry-expanded version of a downsampled particle set, and then use 3D-VA display to select a subset of particles from a particular mode, is there any way to obtain the equivalent selection of a (separately created) symmetry-expanded set of the unbinned particles (to use for local refinement)?

So far the only way I can come up with is if I symmetry expand the unbinned particles first, and then downsample the symmetry expanded set and perform 3D-VA on that, but this takes up an unnecessarily large amount of disk space and scratch space, depending on the degree of downsampling and the degree of symmetry.

t.laughlin · September 30, 2023, 11:36pm

Are there identifying, corresponding fields in the two particle or respective passthrough files that can be used to subset the unbinned particles with cryosparc.dataset.Dataset.query()? Only recently started playing with cryosparc-tools, so I unfortunately don’t have a surefire strategy for this particular use case.

Would another strategy for the future be to generate the downsampled images, symmetry expand the unbinned metadata, and then swap the blob low-level input? Or are the UIDs not maintained properly this way?

olibclarke · September 30, 2023, 11:51pm

I’ve tried using the lower level inputs approach without success, but haven’t tried cryosparc-tools, that might work, thanks!

For now I am just using 3D-flex data prep on a symmetry-expanded full-scale input, and using the resulting cropped and downsampled particles for 3D-VA - then I can get reconstructions of subsets of the full scale particles by swapping the lower-level inputs. This works, but isn’t terribly space efficient, especially for higher symmetries.

rposert · October 5, 2023, 7:55pm

Hi @olibclarke! As you discovered, the low-level interface won’t work here because symmetry expansion assigns new UIDs to the particles. However, their original particle UID is still there, so we can do what you want using cryosparc-tools. Below is a gist to do it.

A few notes:

It is easiest (and what this gist expects) to symmetry expand both the full-size and downsampled particles. This is quick and doesn’t use much disk space, so that’s the approach we recommend.
This gist keeps all particle information from the full-size, symmetry-expanded stack. In other words, the downsampled set is used only to select particles that you have filtered in another pipeline. If you need poses from that second pipeline, you will need to add a bit more code. I’m happy to help if that’s the case.

https://gist.github.com/PlethoraChutney/269c75ceebf632d028a5e8734545b513

revert_symmexp_particles.py

from cryosparc.tools import CryoSPARC
import numpy as np

cs = CryoSPARC(
  # you'll need env variables here, or just set stuff directly
)
cs.test_connection()

# we will filter fullsize_particles so that only the intersection with
# downsampled particles are retained. ctf, alignments, etc. will all

(The gist preview is truncated here; the full script is at the link above.)
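The selection step described in the notes above can be sketched roughly as follows. This is not the gist's code: it is a minimal illustration on plain NumPy arrays, and it assumes that Symmetry Expansion records the original particle UID and the symmetry copy index in fields named `sym_expand/src_uid` and `sym_expand/idx`, and that both expansions used the same symmetry group so the indices line up. Check the field names in your own datasets before relying on this. The function name `match_expanded` is hypothetical.

```python
import numpy as np

# Assumed field names written by Symmetry Expansion; verify in your dataset.
SRC_UID = "sym_expand/src_uid"
SYM_IDX = "sym_expand/idx"


def match_expanded(fullsize, selected):
    """Return a boolean mask over `fullsize` rows whose
    (source UID, symmetry index) pair also occurs in `selected`."""
    selected_keys = set(
        zip(selected[SRC_UID].tolist(), selected[SYM_IDX].tolist())
    )
    pairs = zip(fullsize[SRC_UID].tolist(), fullsize[SYM_IDX].tolist())
    return np.fromiter(
        (pair in selected_keys for pair in pairs),
        dtype=bool,
        count=len(fullsize[SRC_UID]),
    )


# Toy data: two original particles (UIDs 10 and 20) expanded with C2 symmetry.
fullsize = {
    "uid": np.array([101, 102, 103, 104], dtype=np.uint64),
    SRC_UID: np.array([10, 10, 20, 20], dtype=np.uint64),
    SYM_IDX: np.array([0, 1, 0, 1], dtype=np.uint32),
}
# Copies picked from the downsampled, symmetry-expanded set (e.g. via 3D-VA).
selected = {
    SRC_UID: np.array([10, 20], dtype=np.uint64),
    SYM_IDX: np.array([1, 0], dtype=np.uint32),
}

mask = match_expanded(fullsize, selected)
kept_uids = fullsize["uid"][mask]
```

With real cryosparc-tools datasets, the same mask could presumably be applied to the full-size particles with `Dataset.mask(mask)`, which keeps their CTF, alignment, and blob fields unchanged.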

Please do let me know if you run into any issues!

olibclarke · March 24, 2024, 3:04pm

Hi Rich,

Thanks for this script! Would it be possible to add the “source uid” as opposed to the “current uid” as an option to intersect on in the particle sets tool?

This would allow easy identification of symmetry expanded copies that derive from the same original particle. It would also be great if the Particle Sets tool were able to take an arbitrary number of input particle sets, rather than just two - so that one could select the mutual intersection of e.g. 3 or 4 sets of particles, rather than just two!

Cheers
Oli

rposert · March 25, 2024, 5:25pm

Hi @olibclarke! Glad it was helpful! I’ve recorded intersecting on source UID as a feature request.

For your second request, I think we’re getting into CryoSPARC tools territory. For example, this script will find the intersection of particle outputs from any number of jobs:

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path
with open(Path('~/instance-info.json').expanduser(), 'r') as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

project_number = "P337"
project = cs.find_project(project_number)

job_ids_outputs = {
    # key: value pair of Job ID: output name
    "J18": "split_0",
    "J19": "split_0",
    "J20": "split_0",
}
particle_datasets = [
    project.find_job(job_id).load_output(output)
    for job_id, output in job_ids_outputs.items()
]

uid_sets = [set(p['uid']) for p in particle_datasets]
uid_intersection = set.intersection(*uid_sets)

particles_in_all_sets = particle_datasets[0].query(
    {'uid': list(uid_intersection)}
)

which can then be saved in the usual way with project.save_external_result().
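As a rough sketch of that saving step, and of intersecting on a field other than `uid` (for example the source UID of symmetry-expanded particles), something like the following might work. The helper names `intersect_field` and `save_intersection` are hypothetical, as are the output name and title. The field name `sym_expand/src_uid` mentioned in the comments is an assumption to verify against your own data, and the right `slots` and `passthrough` choices for `save_external_result` depend on your case, so check the cryosparc-tools documentation.

```python
from functools import reduce

import numpy as np


def intersect_field(datasets, field="uid"):
    """Return the values of `field` that occur in every dataset."""
    return reduce(np.intersect1d, (np.asarray(d[field]) for d in datasets))


def save_intersection(project, workspace_uid, datasets,
                      source_job, source_output, field="uid"):
    """Keep rows of the first dataset whose `field` value is shared by all
    datasets, then save them as an external result in `workspace_uid`.

    Passing field="sym_expand/src_uid" (assumed field name) would keep every
    symmetry copy whose original particle appears in all sets.
    """
    shared = intersect_field(datasets, field)
    subset = datasets[0].query({field: shared.tolist()})
    return project.save_external_result(
        workspace_uid,
        subset,
        type="particle",
        name="particles_in_all_sets",            # hypothetical output name
        passthrough=(source_job, source_output),  # inherit remaining slots
        title="Intersection of particle sets",
    )
```

A call such as `save_intersection(project, "W1", particle_datasets, "J18", "split_0")` would then save the intersection, where `"W1"` stands in for whichever workspace you want the result to appear in.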