I have processed a dataset (Krios, 300 kV, pixel size 0.728 Å/pixel) of a small asymmetric protein (expected ~50 to ~80 Å), blob picked and extracted with a box size of 288 px, binned to 72 px. After some initial junk cleaning and ab initio generation, I used Heterogeneous Refinement at a refinement box size of 64 voxels and generated good templates for template picking, retrieving ~5 mil particles.
After that I extracted the newly picked particles at a slightly smaller box size, 216 px binned to 64 px, because 288 px seemed a bit too large in the refinements. I’m trying to take these extracted particles directly into a new Heterogeneous Refinement job, also at 64 voxels, using the Het Refine volume that generated my templates (generated from the original 288 px extraction) as a “good” volume and five ab initio classes (generated from the new 216 px extraction) as “junk” volumes.
The problem I am running into is that the good volume coming out of this second box-size Heterogeneous Refinement is now smaller than the initial input volume. I opened them both in ChimeraX, and where the original job output measured ~76 × 58 × 38, the new output is now ~64 × 49 × 36. I think it is smaller by a factor of about 1.2–1.3×, and I am concerned this will affect downstream processing because the sizes no longer match. I’ve tried resampling the original volume to a variety of sizes using Volume Tools, but nothing replicates this shrinking effect.
Can anyone advise how to fix this? These Het. Refine jobs are taking ~14 hours each to complete so I really would like to figure out the problem ASAP.
3. Template pick using templates generated from a good volume out of step 2
4. Extract new particles at 216 px → 64 px (final pixel size 2.45 Å)
5. Create a Heterogeneous Refinement job with the particles from step 4, a good volume from step 2, and several junk volumes from an ab initio of particles from step 4

And the good volume coming out of step 5 is a different physical size than the good volume going into step 5.
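As a side note, the binned pixel sizes implied by your two extractions can be sanity-checked in a couple of lines (numbers taken from your posts; this assumes the usual Fourier-crop binning, where the pixel size scales by the ratio of the box sizes):

```python
# Sanity check of the binned pixel sizes (numbers from the post above).
raw_apix = 0.728  # Å/px at collection


def binned_apix(extract_box, binned_box, apix=raw_apix):
    # Fourier-crop binning scales the pixel size by extract_box / binned_box
    return apix * extract_box / binned_box


first = binned_apix(288, 72)   # first extraction: 288 px -> 72 px
second = binned_apix(216, 64)  # second extraction: 216 px -> 64 px
print(f"{first:.2f} Å/px, {second:.2f} Å/px")  # ~2.91 and ~2.46 Å/px
```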
If I have all that correct, I wonder if you could share a few things with me so I can help figure out what’s going on.
Could you please share both of the messages ChimeraX puts in the log when you load the small and large volumes? They will look something like this:
Opened cryosparc_P345_J186_007_volume_map.mrc as #1, grid size 256,256,256, pixel 1.02, shown at level 0.0728, step 1, values float32
Could you please share images of the large and small volume side by side, with the camera set to ortho mode? You could do this by opening the small and large volumes, then running:
vol sdlevel 6; camera ortho; tile; save ~/Desktop/side-by-side.png
You may have to adjust vol sdlevel 6 to a different value, but the idea is to show both maps at more-or-less the same threshold level.
Hi @rt18, one of my colleagues pointed something out to me that I was not aware of before. Please ignore my requests above.
Heterogeneous Refinement scales input volumes to the refinement box size without taking into account their physical extent. In your case, this means the good input volume (with a physical extent of 72 px × 2.91 Å = 209.52 Å) was scaled to 64 voxels, meaning the final pixel size is 209.52 Å / 64 = 3.27 Å, but it is treated as if it has a pixel size which matches your particles (2.45 Å). This means the volume is displayed smaller than it really is.
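That mismatch falls straight out of the numbers; here is a quick sketch of the arithmetic (just my back-of-the-envelope version, not CryoSPARC’s internal code):

```python
# Sketch of the scaling mismatch (numbers from the thread, not CryoSPARC internals).
input_box = 72        # px, good volume from the first extraction
input_apix = 2.91     # Å/px, first extraction
refine_box = 64       # voxels, Heterogeneous Refinement box size
particle_apix = 2.45  # Å/px, second-extraction particles

physical_extent = input_box * input_apix    # 209.52 Å
scaled_apix = physical_extent / refine_box  # ~3.27 Å/px after rescaling

# The volume is labelled with particle_apix instead, so it displays
# smaller by roughly this factor:
shrink = scaled_apix / particle_apix
print(f"{scaled_apix:.2f} Å/px, apparent shrink ~{shrink:.2f}x")
```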
You can resolve this problem by ensuring all of the inputs to Heterogeneous Refinement have the same physical extent. In your case, you can crop the volume from your first extraction using Volume Tools. You should:
1. Crop the volume from the first extraction by setting the Crop to box size (pix) parameter (not Resample) to 64 × 2.45 / 2.91 ≈ 54 pixels. This creates a volume which has (approximately) the correct physical extent given its larger pixel size.
2. Take the output from this first job and attach it to a second Volume Tools job, this time leaving the crop parameter empty and setting Resample to box size (pix) to 64. This creates a volume with the same physical extent and pixel size as your volumes from the second extraction.
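The arithmetic behind those two steps, as a quick sanity check (a sketch with the numbers above; the real work is done by the Volume Tools jobs themselves):

```python
# Arithmetic behind the crop-then-resample recipe (sketch only).
first_apix = 2.91   # Å/px, volume from the first extraction
target_box = 64     # voxels, refinement box size
target_apix = 2.45  # Å/px, second-extraction particles

# Step 1: crop to match the target physical extent at the larger pixel size.
crop_box = round(target_box * target_apix / first_apix)  # -> 54 px
crop_extent = crop_box * first_apix                      # ~157.1 Å

# Step 2: resample the cropped volume to 64 px; the extent is unchanged,
# so the pixel size now matches the particles to within a rounding error.
resampled_apix = crop_extent / target_box
print(f"crop to {crop_box} px, resampled pixel size {resampled_apix:.3f} Å")
```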
I also wonder if you could share with us a few things about your Heterogeneous Refinement jobs? 14 hours is slower than we’d expect, and there might be some parameter changes we can recommend to speed things up.
Are you refining all 5 million particles?
You’re providing 6 classes total (1 good, 5 junk), correct?
What custom parameter values are you setting?
What GPUs are you using?
Does the job run for 14 hours, or are you including time the job spends waiting for resources?
Thank you for the rapid response! I will run those volume tools jobs and report back once it has run (it will probably take a few hours for my worker node to be freed up from maintenance).
For the Heterogeneous Refinement jobs, I am providing all ~5 million particles to try to separate the initial template-picked junk from good particles, since I don’t want to use 2D classification and potentially throw away heterogeneity / rare views with this small and apparently flexible protein. I provide one good class and five random-noise junk classes, 6 classes total.
Refinement box size set to 64 voxels, Force hard classification True, and Batch size per class set to 5000. Those last two may be the culprits behind how slowly these jobs run, but they were recommended for the small protein. I will definitely take any suggestions you have. I also have a spherical mask diameter set at 150 Å.
I do not know what GPUs our cluster uses, I’ll have to email the server admin. I am limited to using two GPUs at most.
The jobs do run for upwards of 24 hours with these parameters set (as I saw in the initial Het. Refinement of only ~3 mil particles to generate the good volume), and that is 24 hours of active runtime between Checkpoint 1 and the final checkpoint.
Hi @rt18, do let us know if that solves the problem of the volumes being different sizes!
As for your workflow, we have a few recommendations that may help speed things up.
Perform some particle curation before 3D
You’re right that, in general, we recommend doing as much curation in 3D as is feasible; even so, for datasets this large we suggest at least some filtering before the 3D phase. Some suggestions:
Pick on denoised micrographs and use the Micrograph Junk Detector. In general, picks from denoised micrographs are much cleaner than from raw micrographs, and the junk detector will remove particles before you have to extract them.
Use Inspect Particle Picks to remove obvious junk. If you are using the denoiser, you can try the Auto Cluster mode.
Perform one round of 2D classification, throwing out only the worst of the classes. Essentially, I recommend keeping any class that clearly has something in the class average, but throwing away classes which are just empty ice or carbon edge, etc.
Heterogeneous Refinement
Are you using an SSD cache and, if not, is one available on your cluster? In general we expect that SSD caches significantly improve the performance of jobs like Heterogeneous Refinement.
We also recommend reducing the batch size to 2,000. We have not seen much improvement with batch sizes larger than this.
While this will not improve your speed, you may want to consider turning off hard classification. This allows classes to look more like your target, which in turn can help pull more junk particles out of the “good” class. The particle outputs are always hard classified, so downstream jobs will still only have particles that best match the class(es) you select.
Hi @rwaldo! I passed the good volume through two Volume Tools jobs as recommended (crop to 54 px, then re-box to 64 px) and, as a test, began the second Heterogeneous Refinement round. The output volume from Iteration 0 that corresponds to the corrected volume is still slightly smaller, but I think that could be due to the lowpass filter or the threshold I am viewing at, rather than the volume actually being an incorrect size.
In terms of micrographs I unfortunately do not have access to the raw movie files; they were motion-corrected in CryoSPARC Live during collection and my collaborator only sent the motion-corrected .mrc files. I have been under the impression that I will not be able to perform denoising without the original movies.
I will try some 2D classification and take your recommendations for refinement into account! Thanks again!
Hi @rt18, glad the new volume looks about right! You’re right that you need at least a few of your movies to train a denoiser model. In our hands, the denoiser significantly improves particle picking and curation performance; we recommend it for every project if at all possible.
You may want to consider asking your collaborator to send a hundred or so of the original movies for you to re-motion correct and use as training data. You could then use the trained model to denoise all of your micrographs, even the ones for which you do not have movies. We have more information in the denoiser’s guide page if you’re interested!