Hi all! I tested the Topaz method to pick particles. After 2D classification, I found too many false-positive particles in the "Select 2D classes" job (good classes show clear secondary structure in 2D; "false-positive" classes show some 2D shape but are obscure). The particles in good 2D classes (7,000) are far fewer than those picked by the "Template picking" job (18,000). The reconstruction from the 18,000 particles reached a resolution of 4.2 Å, while the one from the 7,000 particles only reached 6–7 Å. On the one hand, I suspect my training data contained some bad particles: it was selected via "Template picking - 2D classification - select", so I am not sure every particle in it is of good quality. On the other hand, the quality of my micrographs is only so-so, and manually picking thousands of fairly good particles would be a big job. If I do need to manually pick a high-quality training dataset, how many particles are needed? How many particles per micrograph? Or does anyone have other advice?
Based on recommendations from @alexjamesnoble, we have been manually picking 1000-2000 particles for the training set, and while laborious, this does seem to give very good results with Topaz.
To make it easier, we usually do an exploratory run with the template or blob picker first, followed by extensive 2D classification, and then sort the resulting mics in Curate Exposures by the number of remaining particles. This allows us to easily select the most efficient group of micrographs to use for manual picking & training (the ones with the most good particles remaining after classification).
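The sorting step above amounts to counting, per micrograph, how many particles survived 2D classification and then keeping the top micrographs. A minimal sketch of that bookkeeping (the micrograph names and counts here are made up for illustration; in practice Curate Exposures does this for you):

```python
from collections import Counter

# Hypothetical input: the micrograph each surviving particle came from,
# one entry per particle that remained after 2D classification.
surviving_particles = [
    "mic_001", "mic_001", "mic_001",
    "mic_002",
    "mic_003", "mic_003",
]

# Count how many good particles remain on each micrograph.
counts = Counter(surviving_particles)

# Sort micrographs by surviving-particle count, best first, and keep
# the top few as the most efficient set for manual picking & training.
top_mics = [mic for mic, n in counts.most_common(2)]
print(top_mics)  # -> ['mic_001', 'mic_003']
```

The point is simply that picking on the densest good micrographs gets you to 1000-2000 clean training picks with the least manual effort.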
Thanks for your patient and very fast reply! I will try it!
Hi, @olibclarke :
Another question: should I manually pick all (or almost all) of the particles in each micrograph? I am afraid that leaving some true particles unpicked might affect the AI model. In other words, would missing particles in the training micrographs degrade the quality of the deep-learning model?
No, incomplete picking is fine.
The important thing is to make sure the manual picking is clean (high confidence).
And the “estimated number of particles” parameter should correspond to the estimated average number of true particles per micrograph in the training set (not the number picked for training, and not the average number in the entire dataset).
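To make that distinction concrete, here is a trivial sketch of what the parameter should represent. The per-micrograph counts below are invented for illustration; the key is that you average the estimated *true* particle counts over the training micrographs only:

```python
# Hypothetical estimates of the number of TRUE particles present on
# each micrograph in the TRAINING set (picked or not) -- not the
# number you manually picked, and not an average over the whole dataset.
true_particles_per_training_mic = [180, 210, 150, 200]

# This average is what 'estimated number of particles' should be set to.
estimated = sum(true_particles_per_training_mic) / len(true_particles_per_training_mic)
print(round(estimated))  # -> 185
```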
To add to Oli’s great advice, you can use Topaz Cross-Validation where you vary the ‘estimated number of particles’ parameter to help determine the correct value. Then if you are getting good results, I suggest training for longer than the default number of epochs (10). Try 30 epochs or so.
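For the cross-validation sweep, one simple way to choose the values to try is to bracket your best guess with a few multiples. A sketch, assuming a starting estimate of 185 (both the estimate and the multipliers are arbitrary choices for illustration):

```python
# Best guess of true particles per training micrograph (hypothetical).
estimate = 185

# Candidate values for the 'estimated number of particles' parameter,
# bracketing the guess from half to double for the cross-validation runs.
candidates = [int(estimate * f) for f in (0.5, 0.75, 1.0, 1.5, 2.0)]
print(candidates)
```

You would then pick the value whose cross-validation run gives the best result, and train your final model with that value and the longer epoch count.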