Assertion check for particle stack size fails on multi import 2D job

yoshiokc · January 25, 2023, 4:49am

Hitting this bug in streaming 2D classification:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/class2D/run_streaming.py", line 133, in cryosparc_compute.jobs.class2D.run_streaming.run_class_2D_streaming
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in dump_particles
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 424, in <listcomp>
    psets = [ import_particles_database_to_dataset(session_doc, exposure_doc, exposure_doc['picker_type'], proj_dir_abs) for exposure_doc in exposures ]
  File "/pncc/sw/cryosparc/cryosparc_worker/cryosparc_compute/jobs/rtp_workers/rtp_common.py", line 267, in import_particles_database_to_dataset
    assert len(dset) == group_data['count'], "Dataset was the wrong length! Expected %d got %d, Exposure UID: %s" % (len(dset), group_data['count'], str(exposure_doc['uid']))
AssertionError: Dataset was the wrong length! Expected 315 got 305, Exposure UID: 1

I think the particle stack doesn’t match what this assertion expects, not because of file corruption, but because the original filename is being used as a hash key, and merging movies from different directories has produced duplicate files with the same name.
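
To make the suspected failure mode concrete, here is a minimal sketch. It is not CryoSPARC's actual code: the paths, the counts, and the helpers `index_by_basename` and `index_by_path` are all hypothetical. It only shows how keying per-movie particle counts on the bare filename silently drops entries when two directories contain a file with the same name, so the total no longer matches what was expected.

```python
import os

# Hypothetical per-movie particle counts; paths and numbers are illustrative only.
picks = [
    ("/data/group1/movie_0001.tif", 120),
    ("/data/group2/movie_0001.tif", 80),  # same filename, different directory
]

def index_by_basename(entries):
    """Key each entry on the bare filename; a later duplicate overwrites the earlier one."""
    return {os.path.basename(path): count for path, count in entries}

def index_by_path(entries):
    """Key each entry on the full path, so identical filenames stay distinct."""
    return {path: count for path, count in entries}

expected_total = sum(count for _, count in picks)
by_name_total = sum(index_by_basename(picks).values())
by_path_total = sum(index_by_path(picks).values())

print(expected_total, by_name_total, by_path_total)
```

With the filename as the key, only one of the two entries survives, so a length check against the expected count would fail even though no file is corrupt.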

stephan · January 31, 2023, 5:25pm

Hey @yoshiokc,

Can you explain this part a bit more? How did you merge movies from different directories?

yoshiokc · March 7, 2023, 8:57pm

I included more than one exposure group in the same CS Live session, and the groups contain many identical filenames (though not identical paths).
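
For anyone wanting to check whether their own session is in this situation, a rough sketch follows. It assumes the movie paths from all exposure groups are available as a plain list; the helper name `find_duplicate_basenames` and the example paths are hypothetical, not part of CryoSPARC.

```python
import os
from collections import defaultdict

def find_duplicate_basenames(paths):
    """Return {filename: [full paths]} for every filename that occurs more than once."""
    groups = defaultdict(list)
    for path in paths:
        groups[os.path.basename(path)].append(path)
    return {name: found for name, found in groups.items() if len(found) > 1}

# Hypothetical movie paths drawn from two exposure groups.
example_paths = [
    "/data/group1/movie_0001.tif",
    "/data/group1/movie_0002.tif",
    "/data/group2/movie_0001.tif",
]

duplicates = find_duplicate_basenames(example_paths)
for name, found in duplicates.items():
    print(name, "appears in:", ", ".join(found))
```

Any filename reported here exists under more than one directory, which is the condition described above.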