Yes, the box size is way too small, and I would also use many more 2D classes than you seem to be using here.
I would suggest initially binning to ~4Å per pixel, and then re-extracting after 2D/3D cleanup. I would also suggest the use of Patch CTF, which in my experience does a better job than Gctf.
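To make the binning suggestion concrete, here is a rough sketch of the arithmetic for picking a Fourier-cropped box size. The function name and the example numbers (1.0 Å per pixel, 256-pixel box) are made up for illustration and are not taken from your job, so plug in your own values. It only does the arithmetic and does not call any processing software.

```python
def binned_extraction(raw_apix, box_px, target_apix=4.0):
    """Return (cropped_box_px, effective_apix, nyquist_angstrom).

    raw_apix    : unbinned pixel size in Angstrom per pixel
    box_px      : extraction box size in unbinned pixels
    target_apix : pixel size you would like after binning
    """
    # Ideal cropped box, rounded to the nearest even number of pixels
    cropped = int(round(box_px * raw_apix / target_apix / 2.0)) * 2
    cropped = max(cropped, 2)
    # The pixel size you actually get depends on the rounded box size
    effective_apix = raw_apix * box_px / cropped
    # Nyquist limit: the best resolution reachable is twice the pixel size
    nyquist = 2.0 * effective_apix
    return cropped, effective_apix, nyquist


# Hypothetical example values, not from the original job
cropped, apix, nyquist = binned_extraction(raw_apix=1.0, box_px=256)
print(cropped, apix, nyquist)
```

At ~4 Å per pixel the Nyquist limit is ~8 Å, which is plenty for 2D/3D cleanup but is also why you need to re-extract afterwards.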
Also, why are you using so few micrographs? The fact that you are using fewer than half the micrographs but ending up with ~3/4 of the particles suggests you haven’t cleaned up enough yet.
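The reasoning behind that is just a ratio. As a quick sketch, assuming the fractions are measured against the same reference run and using 0.5 and 0.75 as round illustrative numbers (the function name is hypothetical):

```python
def relative_particle_density(mic_fraction, particle_fraction):
    """Particles per micrograph relative to the reference run."""
    return particle_fraction / mic_fraction


# Half the micrographs, three quarters of the particles (illustrative)
ratio = relative_particle_density(mic_fraction=0.5, particle_fraction=0.75)
print(f"{ratio:.2f}x")  # prints 1.50x
```

With fewer than half the micrographs the ratio is even higher than 1.5x, so you are keeping noticeably more particles per micrograph than the reference, and a good share of those are likely junk.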
From what I can see, you are also importing the aligned sums rather than the movies, so per-particle motion correction is not available to you. That makes it difficult to compare your result to the published structure, which was obtained after polishing.
Good luck! It looks like a good dataset to practice on. Hope this helps!