Hitting this bug in streaming 2D classification:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run_streaming.py", line 133, in cryosparc_compute.jobs.class2D.run_streaming.run_class_2D_streaming
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in dump_particles
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in <listcomp>
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 267, in import_particles_database_to_dataset
    assert len(dset) == group_data['count'], "Dataset was the wrong length! Expected %d got %d, Exposure UID: %s" % (len(dset), group_data['count'], str(exposure_doc['uid']))
AssertionError: Dataset was the wrong length! Expected 315 got 305, Exposure UID: 1
I don't think this is file corruption. The particle stack doesn't match the count this assertion expects, and I think that's because the original filename is being used as a hash: merging movies from different directories left me with duplicate files that share the same name.
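
For anyone who wants to check their own data for the same situation, here is a minimal sketch of how to spot movie files that share a name across directories. It is not CryoSPARC code: the function name `find_basename_collisions` and the example paths are made up for illustration, and it assumes the collision is on the bare file name, which is only my guess at what is happening.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_collisions(paths):
    """Group paths by file name and return only the names shared by more than one path."""
    by_name = defaultdict(list)
    for p in paths:
        # .name drops the directory part, so files in different
        # directories with the same name land in the same group.
        by_name[PurePosixPath(p).name].append(p)
    return {name: group for name, group in by_name.items() if len(group) > 1}


# Hypothetical movie paths merged from two directories (not from my session).
movie_paths = [
    "/data/session_a/movie_0001.tif",
    "/data/session_a/movie_0002.tif",
    "/data/session_b/movie_0001.tif",
]

collisions = find_basename_collisions(movie_paths)
for name, group in collisions.items():
    print(name, "->", group)
```

Any name this reports is a candidate for the mismatch. Also note that in the assertion shown in the traceback, the first format argument is len(dset) and the second is group_data['count'], so the "Expected 315 got 305" message seems to have its labels swapped: the dataset holds 315 particles while the recorded count is 305. Extra particles would fit with two same-named files being lumped together.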