Assertion check for particle stack size fails on multi-import 2D job

Hitting this bug in streaming 2D classification:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run_streaming.py", line 133, in cryosparc_compute.jobs.class2D.run_streaming.run_class_2D_streaming
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in dump_particles
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in <listcomp>
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 267, in import_particles_database_to_dataset
    assert len(dset) == group_data['count'], "Dataset was the wrong length! Expected %d got %d, Exposure UID: %s" % (len(dset), group_data['count'], str(exposure_doc['uid']))
AssertionError: Dataset was the wrong length! Expected 315 got 305, Exposure UID: 1

I suspect the particle stack count doesn't match what this assertion expects, not because of file corruption, but because the original filename is used as a hash key, and merging movies from different directories produced duplicate files with the same name.
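As a minimal sketch of what I mean (hypothetical names and paths, not the actual CryoSPARC importer code): if particles are grouped by the exposure's base filename alone, two movies with the same basename under different directories collapse onto one key, so the per-exposure particle count can disagree with the count stored in the database.

```python
import os
from collections import defaultdict

# Two movies from different import directories sharing the same basename.
paths = [
    "/data/grid1/movie_0001.tif",
    "/data/grid2/movie_0001.tif",
]

particles_by_key = defaultdict(list)
for p in paths:
    key = os.path.basename(p)        # keyed by basename only -> collision
    particles_by_key[key].append(p)

for key, sources in particles_by_key.items():
    if len(sources) > 1:
        # A dataset assembled from this merged key would no longer match the
        # per-exposure count recorded at import time, tripping the assertion.
        print(f"collision on {key!r}: {sources}")
```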

Hey @yoshiokc,

Can you explain this part a bit more? How did you merge movies from different directories?

I included more than one exposure group in the same CS Live session, but the groups contain many identical filenames (though not identical paths).