Questions about appropriate batchsize

Meiling · April 2, 2021, 2:20pm

Hi all,

I read on the forum that increasing the batchsize to 200-400 can give better results for small particles. My particles are relatively small and I would like to increase the batchsize. But I’m not sure if it is appropriate to set batchsize to 400 if the total number of particles is small. For example:

If I only have 20k particles and want to use 200 classes, the average number of particles per 2D class would be 100. In this case, is it appropriate to use a batchsize of 200 or even 400?

I’d appreciate any suggestion. Thank you so much.

Best,
Meiling

zhenyu_tan · April 2, 2021, 5:29pm

The number of particles that goes into each iteration of 2D classification is the number of classes × the batchsize, so a larger batchsize will make the classes look better.
I would use 50 classes for 20k particles.
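As a rough illustration of the rule of thumb above (particles per iteration = classes × batchsize), the arithmetic for the numbers in this thread can be checked with a few lines of Python. This is a sketch of the forum explanation, not cryoSPARC's code; the function names are hypothetical.

```python
# Hypothetical helpers illustrating the rule of thumb from the reply above:
# particles fed into one 2D iteration = number of classes * batchsize.

def particles_per_iteration(num_classes: int, batchsize: int) -> int:
    """Number of particles used in one 2D classification iteration."""
    return num_classes * batchsize


def max_batchsize(total_particles: int, num_classes: int) -> int:
    """Largest batchsize for which one iteration does not exceed the whole stack."""
    return total_particles // num_classes


total = 20_000
for classes, batch in [(200, 200), (200, 400), (50, 200), (50, 400)]:
    n = particles_per_iteration(classes, batch)
    print(f"{classes} classes x batchsize {batch} = {n} particles/iteration "
          f"(max batchsize for {total} particles: {max_batchsize(total, classes)})")
```

With 200 classes, a batchsize of 200 already asks for 40,000 particles per iteration, twice the 20k available; with 50 classes, batchsizes up to 400 still fit within the stack.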

Meiling · April 2, 2021, 5:53pm

Thank you @zhenyu_tan for the reply. That makes sense. So the batchsize should be less than (total number of particles)/(number of classes), is that correct?

Also, does the program use the same set of particles for each 2D iteration, or does it use different sets of particles for different iterations?

Thank you so much again!

Best,
Meiling

zhenyu_tan · April 2, 2021, 7:08pm

The way it works, based on my understanding, is:
it first randomly shuffles the particle stack. Then at each iteration it feeds a non-overlapping subset of particles (number of classes × batchsize) into the classification. At the last iteration, it goes through the whole particle stack once to classify all of them.
If the number of classes × the batchsize is greater than the number of particles, I think it will use the full dataset at each iteration.
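The scheme described above can be sketched as a batch schedule. This follows the reply's description, not cryoSPARC's actual implementation, and `minibatch_schedule` is a hypothetical name. Two details are assumptions not stated in the thread: the shuffle uses a fixed seed, and when the unused particles run out before the last iteration, the stack is reshuffled and sampling starts over.

```python
import random
from typing import Iterator, List, Sequence


def minibatch_schedule(particle_ids: Sequence[int], num_classes: int,
                       batchsize: int, num_iterations: int,
                       seed: int = 0) -> Iterator[List[int]]:
    """Yield the particle subset used at each iteration (hypothetical sketch).

    Follows the forum description, not cryoSPARC's source:
    - the stack is shuffled once up front;
    - each iteration takes the next non-overlapping chunk of
      num_classes * batchsize particles;
    - if that chunk is at least as large as the stack, every iteration
      uses the full stack;
    - the final iteration always passes over the whole stack.
    """
    order = list(particle_ids)
    random.Random(seed).shuffle(order)
    chunk = num_classes * batchsize
    start = 0
    for it in range(num_iterations):
        if it == num_iterations - 1 or chunk >= len(order):
            yield list(order)  # full pass over every particle
            continue
        if start + chunk > len(order):
            # ASSUMPTION: not enough unused particles left, so reshuffle and
            # start over (the thread does not say what happens here).
            random.Random(seed + it + 1).shuffle(order)
            start = 0
        yield order[start:start + chunk]
        start += chunk


# 20k particles, 50 classes, batchsize 100 -> 5000 particles per iteration.
batches = list(minibatch_schedule(range(20_000), num_classes=50,
                                  batchsize=100, num_iterations=5))
print([len(b) for b in batches])  # [5000, 5000, 5000, 5000, 20000]
```

In this sketch the first four batches are disjoint and together cover all 20k particles. With 200 classes and a batchsize of 200 (40,000 > 20,000), every iteration would simply receive the full stack.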

Meiling · April 5, 2021, 5:40pm

Thank you so much @zhenyu_tan for explaining this to me.

mmclean · April 7, 2021, 1:38pm

Hi all,

Just to add onto @zhenyu_tan’s reply, I believe similar topics were discussed in this thread.

Meiling · April 16, 2021, 5:08pm

That’s really helpful as well. Thank you @mmclean !

DanielAsarnow · April 17, 2021, 8:49pm

With only 20k particles, 20 classes will probably give a much nicer result than 200. For single-particle cryo-EM, ~1000 particles per class is usually appropriate.
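The ~1000 particles per class guideline translates directly into a class count. The helper below is a hypothetical one-liner applying that rule of thumb; the target of 1000 is the figure from the reply above, not a fixed software default.

```python
def suggested_num_classes(total_particles: int, particles_per_class: int = 1000,
                          min_classes: int = 1) -> int:
    """Rule of thumb from the reply above: roughly 1000 particles per 2D class."""
    return max(min_classes, round(total_particles / particles_per_class))


print(suggested_num_classes(20_000))  # 20
print(20_000 / 200)  # 100.0 particles per class if 200 classes are used
```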