Questions about appropriate batchsize

Hi all,

I read on the forum that increasing the batchsize to 200-400 can give better results for small particles. My particles are relatively small and I would like to increase the batchsize. But I’m not sure if it is appropriate to set batchsize to 400 if the total number of particles is small. For example:

If I only have 20k particles and I want to use 200 classes, the average number of particles in each 2D class would be 100 particles/class. In this case, is it appropriate to use a batchsize of 200 or even 400?

I’d appreciate any suggestion. Thank you so much.

Best,
Meiling

The number of particles that goes into each iteration of 2D classification is the number of classes * the batchsize, so a larger batchsize will make the classes look better.
I would use 50 classes for 20k particles.
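The relationship described above can be checked with a few lines of Python. This is a minimal sketch that only encodes the rule of thumb stated in this thread (particles per iteration = number of classes * batchsize); the names `BatchPlan` and `plan` are hypothetical and not part of cryoSPARC.

```python
from typing import NamedTuple


class BatchPlan(NamedTuple):
    per_iteration: int     # particles used in one iteration (classes * batchsize)
    avg_per_class: float   # total particles / number of classes
    exceeds_dataset: bool  # True if one iteration would need more than the whole stack


def plan(total_particles: int, num_classes: int, batchsize: int) -> BatchPlan:
    """Hypothetical helper applying the thread's rule of thumb (not cryoSPARC code)."""
    per_iteration = num_classes * batchsize
    return BatchPlan(
        per_iteration=per_iteration,
        avg_per_class=total_particles / num_classes,
        exceeds_dataset=per_iteration > total_particles,
    )


# The scenarios discussed so far, for a 20k-particle stack.
for k, b in [(200, 200), (200, 400), (50, 200)]:
    print(k, b, plan(20_000, k, b))
```

With 20k particles, 200 classes at batchsize 200 would already call for 40,000 particles per iteration (twice the stack), whereas 50 classes at batchsize 200 uses 10,000.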

Thank you @zhenyu_tan for the reply. That makes sense. So the batchsize should be less than (total number of particles)/(number of classes), is that correct?

Also, does the program use the same set of particles for each 2D iteration, or does it use different sets of particles for different iterations?

Thank you so much again!

Best,
Meiling

The way it works, based on my understanding, is:
It first randomly shuffles the particle stack. Then, at each iteration, it feeds a non-overlapping subset of particles (number of classes * batchsize) into the classification. At the last iteration, it goes through the whole particle stack once to classify all of them.
If the number of classes * the batchsize is greater than the number of particles, I think it will use the full dataset at each iteration.
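The sampling scheme described above can be sketched as a Python generator. This is only an illustration of the poster's understanding, not cryoSPARC's actual implementation; `iteration_subsets` is a hypothetical name, and the wrap-around when the shuffled stack runs out is an assumption, since the thread does not say what happens then.

```python
import random


def iteration_subsets(num_particles, num_classes, batchsize, num_iterations, seed=0):
    """Yield the particle indices used at each iteration, following the scheme
    described in this thread (hypothetical sketch, not cryoSPARC code)."""
    rng = random.Random(seed)
    order = list(range(num_particles))
    rng.shuffle(order)  # randomly shuffle the particle stack once
    per_iter = num_classes * batchsize
    if per_iter > num_particles:
        # Batch larger than the stack: use the full dataset every iteration.
        for _ in range(num_iterations - 1):
            yield order
    else:
        start = 0
        for _ in range(num_iterations - 1):
            if start + per_iter > num_particles:
                start = 0  # ASSUMPTION: wrap to the start when the stack runs out
            yield order[start:start + per_iter]  # non-overlapping subset
            start += per_iter
    # Final iteration: one pass over the whole stack to classify every particle.
    yield order


for i, subset in enumerate(iteration_subsets(20_000, 50, 200, num_iterations=3)):
    print(f"iteration {i}: {len(subset)} particles")
```

Under these assumptions, 50 classes at batchsize 200 gives two disjoint 10,000-particle subsets followed by a full 20,000-particle pass.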

Thank you so much @zhenyu_tan for explaining this to me.

Hi all,

Just to add onto @zhenyu_tan’s reply, I believe similar topics were discussed in this thread.

That’s really helpful as well. Thank you @mmclean !

With only 20k particles, 20 classes will probably give a much nicer result than 200. For single-particle cryo-EM, ~1000 particles per class is usually appropriate.
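The ~1000 particles per class guideline can be turned into a quick calculation. A minimal sketch; both helper names are hypothetical, and the batchsize bound assumes the classes * batchsize relationship described earlier in the thread.

```python
def suggested_num_classes(total_particles: int, particles_per_class: int = 1000) -> int:
    """Hypothetical helper: number of 2D classes giving roughly
    `particles_per_class` particles each (rule of thumb from this thread)."""
    return max(1, total_particles // particles_per_class)


def max_batchsize_within_stack(total_particles: int, num_classes: int) -> int:
    """Largest batchsize for which classes * batchsize does not exceed the stack."""
    return total_particles // num_classes


k = suggested_num_classes(20_000)
print(k, max_batchsize_within_stack(20_000, k))
```

With 20 classes, a batchsize of 200 or 400 would use 4,000 or 8,000 particles per iteration, well within a 20k stack.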