Output particles in "intermediates" mode of 3D-variability display?

Hi,

In the release notes for 2.12, it is stated that the “intermediates” mode of 3D-variability display should output particle subsets for further refinement. It doesn’t right now (though cluster mode does) - would it be possible to add?

Also, it is currently not possible to set the “rolling window” in this mode to 0, which would be useful in order to allow for evenly splitting up particles along a given component without overlap between groups.

Cheers
Oli
image

Hi @olibclarke v2.13 now allows for width=0 to use tophat (i.e. non-overlapping) windows now, but the job still doesn’t output the particle subsets like it was supposed to… this will be added.

In a pinch this can be done by manually creating subsets of the particles.cs file from the 3D var job output, and then importing those using the Import Result Group job.

Thanks @apunjani - I’m still not completely clear on how to “manually create subsets of the particles.cs file from the 3D var job output” - sorry for being dim, do you have an easy to follow guide to this?

Cheers
Oli

1 Like

I was able to do it, though it was sort of tricky to figure out. I have a 1E6 particle dataset that’s going to 2.4A, I tried to take the 2**18 (262k) nearest neighbors of the centroid from the embedding and refine those, but it actually got worse. It looks like a big Gaussian so I have to think some more about why this didn’t work.

I can share a script and details if you don’t want to wait for the next update.

Thanks Daniel, a script would be very helpful! In this case I just want to select particles in a selected amplitude range along one eigenvector - more for classification of different confs than improvement of res

Cheers
Oli

Here’s what I did, in your case you can just use idx = (x > lo) & (x < hi) to index the cs and pt array instead of the whole KNN thing I’m doing.

import numpy as np
cs = np.load("P16/J205/cryosparc_P16_J205_particles.cs")
pt = np.load("P16/J205/P16_J205_passthrough_particles.cs")
x, y, z = cs['components_mode_0/value'], cs['components_mode_1/value'], cs['components_mode_2/value']
X = np.array([x, y, z])
mu = np.mean(X, axis=1)
vsmu = sp.spatial.distance.cdist(np.atleast_2d(mu), X.T)
knn_2p18 = np.argsort(vsmu)[0, :2**18]
np.save("P16_J205_particles_muknn_2p18.cs", cs[knn_2p18], allow_pickle=True, fix_imports=True)
np.save("P16_J205_passthrough_particles_muknn_2p18.cs", pt[knn_2p18], allow_pickle=True, fix_imports=True)

Then cp P16_J205_particles.csg P16_J205_particles_muknn_2p18.csg and edit the new file to point to your new .cs files. You can now import this .csg file as a results group.

2 Likes

Fantastic - thank you!

No worries! It took me several tries to get the .csg file edited right, so go back to that if it fails the first time. I just noticed the greater-thans were swapped above and edited that, so also double check those if you copy-and-pasted.

I wanted to follow up on this thread as I’ve been having the same problem. Have you guys found a clear solution to extracting particles that relate to each intermediate? I had some trouble with the script and was wondering if there has been a different way to approach this issue.

With the .csg files? It’s far easier at the moment to filter and re-import a converted .star file than this .csg thing.

from pyem import star
df = star.parse_star("file.star")
star.write_star("file.star", df.loc[<some expression of cs['components_mode_0/value']>])

@apunjani also following up on your 3DVar preprint, would you mind sharing some more details on your 1D VAE embedding, especially the loss function? I’m trying the same approach and I get pretty different results with RMS error alone vs. RMS error + KL divergence.

1 Like

Hi @DanielAsarnow,

We’ll make better docs for using .cs and .csg files soon :slight_smile:

Regarding the VAE: good questions. I mostly did this as a proof of principle to show/easy visualization that the 3DVA embedding captures many discrete clusters, even more than the number of 3DVA dimensions in this case. I wouldn’t recommend it generally for actually separating clusters - if they really are separable, the GMM operating directly in reaction coordinate space works better (that’s what cluster mode of the 3D Var Display job does). But for more details:

  • I also found that tweaking the relative weighting between the RMS error and KL divergence of the VAE latent space prior made a big difference in the results. In this case it’s a low dim latent space (1 or 2 dims) and we don’t really want the VAE to learn to spread out clusters so that the latent space is “full”/approaching a Gaussian distribution. We actually want the distribution of particles in the latent space to be as multimodal as necessary rather than unimodal. So I set the relative weight of the KL term to 0.01, and it mostly just serves to keep the VAE latent embeddings bounded
  • Given the amount of data and very low dim latent space, omitting the KL term entirely also gave good clustering of the conformations but runaway values of the latent embeddings

Here’s the network architecture I used (super simple):

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        
        dim = n_components
        D = 1024
        
        self.fc1 = nn.Linear(dim, D)
        self.fc2_mu = nn.Linear(D, 1)
        self.fc2_logvar = nn.Linear(D, 1)
        self.fc3 = nn.Linear(1, D)
        self.fc4 = nn.Linear(D, dim)
        
    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc2_mu(h1), self.fc2_logvar(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std
    
    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return self.fc4(h3)
    
    def forward(self, x):
        mu, logvar = self.encode(x.view(-1,n_components))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar    

Trained with ADAM with learning rate 1e-2, training time batch size of 1024

It’s not too sensitive to parameters, but each run does result in the clusters being arranged in a different order on the 1D embedding.

1 Like

@apunjani Thanks a ton, I’ll let you know how it goes for me with tweaks based on what you posted. The reason I’m still interested in this approach is that my cases still have a lot of outlier particles that interfere with GMM-type clustering. I’ll also try a density-based approach (probably OPTICSXi, not DBSCAN because the cluster densities are not constant).

@apunjani Another couple of 3DVar questions.

  1. Can I make a model for a hypothetical particle particle simply by multiplying the component values times the component volumes, summing them, and adding them to the map from the 3DVar job?
  2. Do the means of the component values give the relative weight of the components? Is there a way to calculate the %variance explained by each component?

Update

Answer to 1) is yes - the formula is inded V0 + Zi * Vi for each component i just as in the imaging model from the 3DVar preprint equation 4 (the part in parens).

I’m not quite sure about 2). The magnitude of the components seems to be the variance of the component values, but I’m not sure how to find the total variance.

Hi @apunjani, the intermediates mode of 3D-VA still doesn’t seem to output particle subsets in 3.2, would it be possible to add this?

Cheers
Oli

3 Likes

Hey @olibclarke,

This is now implemented in cryoSPARC v3.3! The 3DVA Display job includes an advanced parameter called Intermediates: output particle subsets which will turn on output of particles in intermediate states along one 3DVA component. We decided to keep it to a single user-specified component to keep the number of outputs manageable. Thanks for the suggestion!

We’ve also added a quick note to 3DVA tutorial (part two) to reflect this.

Valentin

2 Likes

Hi @vperetroukhin,

In my 3DVA Display job (cryoSPARC v3.3.1) with Output mode of intermediates, I enabled “Intermediates: output particle subsets” and set 1 for “Intermediates: output particle component”. However, I could not see “output particle subsets”. All other parameters are in default. Did I miss something? Thanks!

Here is the partial log:

Compiling job outputs…
[CPU: 7.27 GB] Passing through outputs for output group particles_series_1_frame_0 from input group particles
[CPU: 7.27 GB] This job outputted results [‘blob’]
[CPU: 7.27 GB] Loaded output dset with 0 items
[CPU: 7.27 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.93 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_1 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 10.49 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.93 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_2 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 10.49 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.93 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_3 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 10.49 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.94 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_4 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 10.49 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.94 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_5 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 10.49 GB] Passthrough results [‘ctf’, ‘alignments3D’, ‘components_mode_0’, ‘components_mode_1’, ‘components_mode_2’, ‘components_mode_3’, ‘components_mode_4’, ‘components_mode_5’, ‘components_mode_6’]
[CPU: 11.93 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items
[CPU: 10.49 GB] Passing through outputs for output group particles_series_1_frame_6 from input group particles
[CPU: 10.49 GB] This job outputted results [‘blob’]
[CPU: 10.49 GB] Loaded output dset with 0 items
[CPU: 11.94 GB] Loaded passthrough dset with 742501 items
[CPU: 10.49 GB] Intersection of output and passthrough has 0 items

[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_0
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_1
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_2
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_3
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_4
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_5
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_6
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_7
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_8
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_9
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_10
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_11
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_12
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_13
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_14
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_15
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_16
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_17
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_18
[CPU: 10.49 GB] Checking outputs for output group particles_series_1_frame_19
[CPU: 10.49 GB] Updating job size…
[CPU: 10.49 GB] Exporting job and creating csg files…
[CPU: 10.50 GB] ***************************************************************
[CPU: 10.50 GB] Job complete.

Hi @donghuachen,

Apologies for the late response on this. It looks like those frames have 0 particles which is why you’re not seeing anything outputted. Can you share the ‘Distribution of reaction coordinates mode 1’ figure?

Thanks!

@vperetroukhin
Here it is.
P1_J46_distribution_of_reaction_coordinates_mode_1

Interesting, that looks normal. Two more thoughts:

  1. Can you provide a screenshot of the parameters panel like the one below?

  2. What does the job output look at this stage?

[CPU: 7.00 GB] Reconstructing component 1 frame 0
[CPU: 7.03 GB] Reconstructing from 392135 images…
[CPU: 7.03 GB] batch 393 of 393
[CPU: 7.15 GB] Reconstructing component 1 frame 1
[CPU: 6.97 GB] Reconstructing from 554373 images…
[CPU: 6.97 GB] batch 555 of 555
[CPU: 7.10 GB] Reconstructing component 1 frame 2