Hi @DanielAsarnow,

We'll make better docs for using `.cs` and `.csg` files soon.

Regarding the VAE: good questions. I mostly did this as a proof of principle, as an easy visualization showing that the 3DVA embedding captures many discrete clusters - even more than the number of 3DVA dimensions in this case. I wouldn't recommend it generally for actually separating clusters - if they really are separable, a GMM operating directly in reaction-coordinate space works better (that's what the `cluster` mode of the 3D Var Display job does). But for more details:
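For illustration, clustering directly in reaction-coordinate space with a GMM can be sketched as below. This is not the CryoSPARC implementation - just a minimal stand-in using scikit-learn, with synthetic data in place of the per-particle 3DVA component values you would read from the job outputs:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for per-particle 3DVA coordinates, shape (N, n_components);
# in practice these come from the 3DVA job's particle outputs.
coords = np.concatenate([
    rng.normal(-2.0, 0.3, size=(500, 2)),
    rng.normal(+2.0, 0.3, size=(500, 2)),
])

# Fit a Gaussian mixture directly in reaction-coordinate space and
# assign each particle a hard cluster label.
gmm = GaussianMixture(n_components=2, random_state=0).fit(coords)
labels = gmm.predict(coords)
```

The particle sets for each label can then be exported and refined separately.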

- I also found that tweaking the relative weighting between the RMS error and KL divergence of the VAE latent space prior made a big difference in the results. In this case it’s a low dim latent space (1 or 2 dims) and we don’t really want the VAE to learn to spread out clusters so that the latent space is “full”/approaching a Gaussian distribution. We actually want the distribution of particles in the latent space to be as multimodal as necessary rather than unimodal. So I set the relative weight of the KL term to 0.01, and it mostly just serves to keep the VAE latent embeddings bounded
- Given the amount of data and very low dim latent space, omitting the KL term entirely also gave good clustering of the conformations but runaway values of the latent embeddings
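The loss described above - reconstruction error plus a heavily down-weighted KL term - can be sketched as follows. The `beta` name and the exact reductions are my choices for illustration; the post only specifies the relative weight of 0.01:

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=0.01):
    # Reconstruction term: squared error between input and decoder output.
    recon = F.mse_loss(recon_x, x, reduction='sum')
    # Closed-form KL divergence of N(mu, sigma^2) from N(0, 1).
    # Down-weighted by beta so the latent distribution can stay as
    # multimodal as necessary instead of being pushed toward a unit
    # Gaussian; it mainly keeps the embeddings bounded.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl
```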

Here’s the network architecture I used (super simple):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# n_components = number of 3DVA components (defined elsewhere)

class VAE(nn.Module):
    def __init__(self):
        super(VAE, self).__init__()
        dim = n_components
        D = 1024
        # Encoder: input -> hidden -> (mu, logvar) of a 1D latent
        self.fc1 = nn.Linear(dim, D)
        self.fc2_mu = nn.Linear(D, 1)
        self.fc2_logvar = nn.Linear(D, 1)
        # Decoder: 1D latent -> hidden -> reconstruction
        self.fc3 = nn.Linear(1, D)
        self.fc4 = nn.Linear(D, dim)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc2_mu(h1), self.fc2_logvar(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return self.fc4(h3)

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, n_components))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar
```

Trained with Adam, with a learning rate of `1e-2` and a training batch size of 1024.
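A minimal training loop under those settings might look like the following. The model here is a condensed stand-in so the snippet runs self-contained; the data tensor is a placeholder for the per-particle 3DVA coordinates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_components = 4
# Placeholder for per-particle 3DVA coordinates.
data = torch.randn(4096, n_components)

class MiniVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(n_components, 8)
        self.mu = nn.Linear(8, 1)
        self.logvar = nn.Linear(8, 1)
        self.dec = nn.Sequential(nn.Linear(1, 8), nn.ReLU(),
                                 nn.Linear(8, n_components))

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

model = MiniVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # Adam, lr 1e-2

for epoch in range(3):
    for i in range(0, len(data), 1024):  # batch size 1024
        batch = data[i:i + 1024]
        recon, mu, logvar = model(batch)
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        # Reconstruction error plus KL term down-weighted by 0.01.
        loss = F.mse_loss(recon, batch, reduction='sum') + 0.01 * kl
        opt.zero_grad()
        loss.backward()
        opt.step()
```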

It’s not too sensitive to parameters, but each run does result in the clusters being arranged in a different order on the 1D embedding.