Picking / Classification of particles (too) close to each other

DarioSB · April 12, 2024, 10:43am

Hey all,

What would be your workflow if the first 2D classification of a data set yields not only good classes of single particles, but also classes that seem to contain two particles very close to each other? Is there a way to split these classes into images with recentered particles? Should I use template picking with the good averages as templates and decrease the minimum particle separation? (The min. separation distance for the blob picker was set to 0.3, with a particle diameter of 120-200 Å, which I thought would be small enough.)
Or should I ditch these images completely, as the signals of both particles will interfere with each other during alignment (in Fourier space)?
P.S.: These are phase plate data recorded at <0.1 µm defocus, hence no CTF correction.
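
As a rough way to see how many picks are affected, the separation criterion above can be sketched in a few lines of Python. This is a minimal sketch, not the blob picker's implementation: it assumes the 0.3 setting is a fraction of the particle diameter and that pick coordinates are already in Å. `min_separation_angstrom` and `crowded_picks` are hypothetical helper names.

```python
import numpy as np

def min_separation_angstrom(fraction, diameter_A):
    """Convert a separation given in particle diameters to Angstrom (assumed convention)."""
    return fraction * diameter_A

def crowded_picks(coords_A, threshold_A):
    """Return a boolean mask: True where a pick has a neighbour closer than threshold_A."""
    coords = np.asarray(coords_A, dtype=float)
    if len(coords) < 2:
        return np.zeros(len(coords), dtype=bool)
    # Pairwise distances between all picks on one micrograph.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each pick's distance to itself
    return dist.min(axis=1) < threshold_A

# Hypothetical picks (x, y in Angstrom): two close together, one isolated.
picks = [[0.0, 0.0], [50.0, 0.0], [500.0, 500.0]]
print(crowded_picks(picks, threshold_A=120.0))
```

With a 120-200 Å diameter range, a setting of 0.3 corresponds to roughly 36-60 Å under this assumption, so two picks can end up well within one particle diameter of each other.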

rbs_sci · April 12, 2024, 12:03pm

I’d abandon them.

olibclarke · April 12, 2024, 12:51pm

Use the single, well-centered particles to train a Topaz or crYOLO model to improve your picking. You likely have some miscentered picks; improving the picking in this instance could improve downstream results (and is often beneficial for crowded/aggregated samples and small particles, in my experience).

DarioSB · April 12, 2024, 1:03pm

Great, thanks. How will this work if my sample probably has orientation bias? Could I try to make an ab initio model and use that for picking? If I use only the averages, I guess I will miss all the other orientations.

olibclarke · April 12, 2024, 1:55pm

I would try and see how you go. Quite often even if you have some orientation bias, there will be particles from other orientations in those good averages.

DarioSB · April 12, 2024, 2:14pm

With crYOLO/Topaz on these averages, or from an ab initio model?

olibclarke · April 12, 2024, 2:15pm

crYOLO/Topaz

DarioSB · April 12, 2024, 2:17pm

Any hint which one to use?

olibclarke · April 12, 2024, 2:26pm

Both are good. I use Topaz more often, but both will give good results with good training data.

CryoEM2 · April 13, 2024, 5:10pm

In addition to better picking (or abandoning them, if you have a large dataset and they are a small fraction), I would select the doubles and re-run 2D with a small window (65 or so). Iterate. They will become centered, and when they do, extract with recentering. If you redo 2D after recentering, the problem could arise again (good that 2D provides diversity!), so you can just skip 2D for these. A well-centered particle will align well to a nice 3D volume, even if there is a particle right next to it. And dense micrographs make for great CTFs and more data.

DarioSB · April 15, 2024, 11:12am

I tried to use topaz train but somehow I cannot get it to work. The job is running but is stuck at the following state (nothing changed over the weekend):

olibclarke · April 15, 2024, 11:58am

Which GPU are you using? That may be the culprit.

DarioSB · April 15, 2024, 1:17pm

I have two GTX 1070s installed.
I used all standard settings. Is there something that would speed things up? And is there anything I have to specify if the data are phase plate data?

P.S.: I tried the job on 10 micrographs and it worked, though it took 2 h to finish, and the extract/inspect jobs showed quite a large number of unpicked particles that I would have picked manually. I set the number of particles per micrograph to 300. But applying it to all 2500 micrographs would take 500 h…

olibclarke · April 15, 2024, 1:33pm

Which bit took 2h? Training or extraction? You only need to do training once, and I would often train on only 50-100 mics with several thousand particles.

Also, what settings did you use? The default settings for computation often spawn a lot of subprocesses and lead to system lockup in my hands. I would recommend 2 threads & 2 processes as a starting point.

Regarding results, this will very much depend on the threshold used in Topaz Extract - may require some tweaking. Estimated number of particles per mic in training is also an important parameter to tune.
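
The suggestion to train on a subset can be sketched as follows. This is a minimal illustration with hypothetical names: `pick_training_subset`, the `mic_*.mrc` file names, and the per-micrograph pick counts are all made up. It does not call Topaz or cryoSPARC; it only draws a reproducible random subset and reports how many curated picks it contains, so you can check that you land in the "several thousand particles" range.

```python
import random

def pick_training_subset(picks_per_mic, n_train=100, seed=0):
    """Choose a reproducible random subset of micrographs for training.

    picks_per_mic: dict mapping micrograph name -> number of curated picks.
    Returns (sorted list of chosen names, total picks in the subset).
    """
    rng = random.Random(seed)
    names = sorted(picks_per_mic)
    n = min(n_train, len(names))  # never ask for more than exist
    chosen = sorted(rng.sample(names, n))
    total = sum(picks_per_mic[name] for name in chosen)
    return chosen, total

# Hypothetical dataset: 2500 micrographs with 60 curated picks each.
picks = {f"mic_{i:04d}.mrc": 60 for i in range(2500)}
subset, n_particles = pick_training_subset(picks, n_train=100)
print(len(subset), n_particles)  # 100 6000
```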

DarioSB · April 15, 2024, 1:40pm

Training took 2h.
I used the default settings.
I will try 100 mics
expected nr of particles=800
nr of parallel processes=2
nr of CPUs=2

olibclarke · April 15, 2024, 1:52pm

OK, if it is the training that took 2 h, that is fine: you only need to do this once, and you never need to do it on the whole dataset. Extraction should be faster. I would not extrapolate from training to extraction based on the number of mics.
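
To make the point about extrapolation concrete, here is a small worked sketch. The 2 h for 10 micrographs and the 2500 micrographs come from the posts above; the per-micrograph extraction time is a placeholder assumption, not a measurement, and both function names are hypothetical.

```python
def naive_extrapolation_hours(train_hours, n_train_mics, n_total_mics):
    """Scale the training time linearly to the whole dataset (the misleading estimate)."""
    return train_hours * n_total_mics / n_train_mics

def one_time_training_hours(train_hours, n_total_mics, extract_seconds_per_mic):
    """Training is paid once; only extraction scales with the number of micrographs."""
    return train_hours + n_total_mics * extract_seconds_per_mic / 3600.0

# Numbers from the thread: 2 h of training on 10 mics, 2500 mics in total.
print(naive_extrapolation_hours(2.0, 10, 2500))  # 500.0

# ASSUMPTION: 10 s per micrograph for extraction is a placeholder value.
print(one_time_training_hours(2.0, 2500, extract_seconds_per_mic=10.0))
```

The second estimate is only as good as the extraction time you plug in, so it is worth timing extraction on a handful of micrographs first.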

DarioSB · April 15, 2024, 1:55pm

extraction was pretty fast afterwards, true