What is the 3rd dimension in the mrc files that are generated by the Extract from Micrographs job?

ray.berkeley · October 7, 2021, 11:09pm

Hi all,

I’m interested in extracting images of individual picked particles from a 2D classification job. I’m able to access particle location by parsing the .cs file generated by a Select 2D Classes job like so:

from cryosparc_compute import dataset
import mrcfile

# This is the path to the cs file generated by the Select 2D Classes job
cs_path = "path/to/exports/P2_J36_particles_selected_exported.cs"
particle_dataset = dataset.Dataset().from_file(cs_path)

# This points to the mrc file that contains the particle in the cs file above
# This looks at the particle at index 5 in the cs file; the choice of 5 is arbitrary
particle_path = f"/path/to/data/{particle_dataset.data['blob/path'][5][1:]}"

with mrcfile.open(particle_path) as mrc:
    stack = mrc.data

This gives me an object called stack that represents the data in the mrc file. It is a 3D numpy array, with the second and third dimensions matching up with my box size (360 px, Fourier-cropped to 90 px). What, though, is the first dimension?

stack.shape # Output: (144, 90, 90)

Most of the mrc files that I’ve looked at produce 3D arrays with 150-200 entries along that dimension, which I assume represent individual particles. If I plot any slice of the stack object along that first dimension (indices 0-143), I get something like the following:

Taking the mean of the stack along that dimension (np.mean(stack, axis=0)) suggests that each slice might be an individual particle (there is some density in the middle of the image). I also know that there is an index field in the original .cs file (particle_dataset.data['blob/idx']). Is this index the position of my particle of interest along the first dimension of the 3D NumPy array? If so, why are the .mrc files bundled this way, with a seemingly random selection of particles in a single .mrc file?

Thanks!

boggild · October 8, 2021, 4:21pm

I would assume each particle stack contains all the particles extracted from 1 movie. Do you have about 150-200 particles per micrograph?

ray.berkeley · October 8, 2021, 6:26pm

Thanks, yes, I think this is correct.

vperetroukhin · October 15, 2021, 1:02pm

Hi @ray.berkeley,

The first dimension in the NumPy array does indeed span the particles within the stack. As @boggild correctly pointed out, each .mrc file bundles together the particles extracted from one of the movies you’ve imported. These files are typically output by the ‘Extract From Micrographs’ job.
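
To make the indexing concrete, here is a minimal sketch that uses a synthetic array in place of mrc.data, with the same shape as in the question. The helper get_particle is hypothetical (it is not part of mrcfile or cryoSPARC), and the 2D branch is there on the assumption that a file holding a single image may load as a 2D array rather than a 3D one.

```python
import numpy as np

# Synthetic stand-in for `mrc.data`: 144 particles, each a 90x90 image.
rng = np.random.default_rng(0)
stack = rng.normal(size=(144, 90, 90)).astype(np.float32)


def get_particle(stack, idx):
    """Return the 2D image at position `idx` along the particle axis (axis 0)."""
    stack = np.asarray(stack)
    if stack.ndim == 2:
        # Assumption: a single-image file may load as 2D, so add the particle axis.
        stack = stack[np.newaxis]
    return stack[idx]


particle = get_particle(stack, 5)   # one particle image
average = stack.mean(axis=0)        # average over all particles in the file

print(particle.shape)  # (90, 90)
```

With real data, stack would come from mrcfile.open(...) as in the original post, and idx would be the blob/idx value of the row you are interested in.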

Note that the particles within these .mrc files will be a superset of those you’ve selected with the ‘Select 2D’ job. To access the raw data associated with only the selected particles, you could do the following:

from cryosparc_compute.jobs import runcommon as rc
from cryosparc_compute.particles import ParticleStack

# Load the dataset from a Select 2D job (or read in an exported .cs file)
particle_dataset = rc.load_output_group_direct('P167', 'J18', 'particles_selected')

# Create a particle stack and populate it
particles = ParticleStack()
particles.init(particle_dataset)

# Load in the raw particle data
print(f'Reading blobs for a ParticleStack with {len(particle_dataset)} items...')
project_path = '/path/to/P167'
particles.read_blobs(project_path)

# Load a list of Particle objects
particle_spooling_list = particles.get_items()

# Access the raw data
particle_data = particle_spooling_list[0].get_original_real_data()  # Get first image
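
If you only need the images and would rather not depend on cryoSPARC internals, a rough alternative is to group the selected rows by blob/path and index each stack with the matching blob/idx values. The sketch below is only an illustration: the records and stacks are synthetic, the file paths are hypothetical, and it assumes blob/idx is a zero-based position along the first axis of the stack, which you should verify on your own data. In a real .cs file the path field may also be stored as bytes or carry a leading character that needs stripping, as in the original post.

```python
from collections import defaultdict

import numpy as np

# Synthetic stand-in for the selected-particles dataset (hypothetical paths).
selected = np.array(
    [
        ("J30/extract/mic_001_particles.mrc", 3),
        ("J30/extract/mic_002_particles.mrc", 0),
        ("J30/extract/mic_001_particles.mrc", 7),
    ],
    dtype=[("blob/path", "U64"), ("blob/idx", "u4")],
)


def group_indices_by_stack(records):
    """Map each stack path to the sorted particle indices selected from it."""
    groups = defaultdict(list)
    for path, idx in zip(records["blob/path"], records["blob/idx"]):
        groups[str(path)].append(int(idx))
    return {path: sorted(idxs) for path, idxs in groups.items()}


groups = group_indices_by_stack(selected)

# Synthetic stacks; with real data each would be loaded with mrcfile.open(path).
stacks = {path: np.zeros((10, 90, 90), dtype=np.float32) for path in groups}

# Keep only the selected particles from each stack (the stack is a superset).
selected_images = {path: stacks[path][idxs] for path, idxs in groups.items()}
```

Grouping first means each .mrc file is opened once, rather than once per selected particle.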

Hope that’s useful!

Valentin

ray.berkeley · December 20, 2021, 2:05am

This is very useful. Thanks!