Extensive Validation fails on EMPIAR 10025 in 5.0

bsobol · January 29, 2026, 12:55pm

I started evaluating CryoSPARC v5 and encountered the following issue with the Extensive Validation workflow:

Traceback (most recent call last):
  File "cli/run.py", line 105, in cli.run.run_job
  File "cli/run.py", line 210, in cli.run.run_job_function
  File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 111, in run_extensive_validation
    subjob = queue_extensive_workflow_subjob(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 473, in queue_extensive_workflow_subjob
    jobs.enqueue_job(subjob, lane_name=lane, hostname=hostname, gpus=gpus_to_schedule)
  File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/core/jobs.py", line 460, in enqueue_job
    validate_path_params(job)
  File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/core/jobs.py", line 541, in validate_path_params
    raise UnprocessableException(
models.error.UnprocessableException: Invalid path specified for Micrographs data path: <redacted>/empiar_10025_subset_v1/mrc/*.mrc; directory allowed: False; file allowed: True; glob allowed: True

The job was set to download the dataset by itself, which it did. After successfully extracting the archive, it complained about an invalid path. In fact, there is no mrc subdirectory in empiar_10025_subset_v1.
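For anyone who wants to confirm the same thing on their own system, here is a minimal sketch that checks whether the parent directory of a glob pattern exists and whether the pattern matches any files. The helper name `check_glob_path` is hypothetical, and this is not how CryoSPARC's `validate_path_params` is implemented; it only mirrors the manual check described above.

```python
import glob
import os


def check_glob_path(pattern):
    """Return (parent_exists, matches) for a glob pattern such as '.../mrc/*.mrc'.

    Hypothetical helper for illustration only; not part of CryoSPARC.
    """
    parent = os.path.dirname(pattern)
    # glob.glob returns an empty list when nothing matches,
    # including when the parent directory does not exist.
    matches = sorted(glob.glob(pattern))
    return os.path.isdir(parent), matches
```

Calling it with the full (unredacted) path from the error message, for example `check_glob_path("/path/to/empiar_10025_subset_v1/mrc/*.mrc")`, should return `(False, [])` when the mrc subdirectory is missing.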

This problem does not occur with EMPIAR 10305.

wtempel · January 29, 2026, 3:02pm

Thanks @bsobol for reporting this observation. Could you please post the outputs of these commands (redacting any confidential components of data paths):

cspid=P99
csjid=J199
cryosparcm cli "api.jobs.find_one('$cspid', '$csjid').params"
cryosparcm job events $cspid $csjid | tail -n 60

bsobol · January 29, 2026, 3:31pm

Another observation: it happens only with “Run Advanced Jobs” enabled:

$ cryosparcm cli "api.jobs.find_one('$cspid', '$csjid').params"
{"dataset_selected": "10025", "dataset_data_dir": "<...>/empiar_10025_subset_v1", "scheduling_mode": "testing", "run_advanced_jobs": true, "workflow_extract_box_size": 448, "workflow_extract_bin_size_small": 256, "workflow_refine_symmetry": "D7", "workflow_refine_N": 256, "workflow_abinit_num_init_iters": 200, "workflow_abinit_num_final_iters": 300, "workflow_multirefine_batch_size_per_class": 1000, "workflow_refmotion_hyperopt_rmax": 10.0, "workflow_refmotion_hyperopt_target_particles": 12500, "workflow_refmotion_dose_target_particles": 20000, "compute_use_ssd": true, "compute_num_gpus": 1, "resource_selection": "athena-plgrid-bigmem-6h::", "send_data": false, "random_seed": 12345, "random_seed_default": 12345}

wtempel · January 29, 2026, 3:42pm

Thanks @bsobol for the additional data points.
~~Did you edit the Path to Dataset Data parameter before queuing the Extensive Validation job?~~
[update] we were able to replicate models.error.UnprocessableException after enabling Run Advanced Jobs and will investigate.

wtempel · February 14, 2026, 12:11am

@bsobol The empiar-10025-subset data package downloaded within the earlier extensive validation run was outdated. We have now updated the package. Please delete the empiar_10025_subset_v1/ subdirectory from your project directory. When you re-run extensive validation, the updated data package should be automatically downloaded.
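The cleanup step above can be sketched as follows. The function name `remove_cached_dataset` and its arguments are hypothetical and only for illustration; the project directory path has to be supplied by the user, and a plain `rm -r` on the subdirectory achieves the same result.

```python
import shutil
from pathlib import Path


def remove_cached_dataset(project_dir, name="empiar_10025_subset_v1"):
    """Delete the cached dataset subdirectory if it exists.

    Hypothetical helper, not part of CryoSPARC. Returns True if a
    directory was removed and False if there was nothing to remove.
    """
    target = Path(project_dir) / name
    if target.is_dir():
        shutil.rmtree(target)  # removes the directory and everything in it
        return True
    return False
```

Because `shutil.rmtree` deletes recursively without confirmation, it is worth double-checking the project directory path before running it.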

bsobol · February 14, 2026, 11:27am

When the Extensive Validation job auto-downloads the dataset, I get the following error:

Traceback (most recent call last):
  File "cli/run.py", line 106, in cli.run.run_job
  File "cli/run.py", line 211, in cli.run.run_job_function
  File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 44, in run_extensive_validation
    dataset_data_dir = get_benchmark_dir(rc, dataset_selected, params.dataset_data_dir, params.run_advanced_jobs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 254, in get_benchmark_dir
    benchmark_data_dir = benchmarks.download_benchmark_test_data(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/core/benchmarks.py", line 111, in download_benchmark_test_data
    tarball_path = download_and_verify_url(
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/core/benchmarks.py", line 184, in download_and_verify_url
    assert crypt.verify_sha256(dest, checksum_sha256), f"Could not verify checksum for `{dest}` (from `{url}`)"
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Could not verify checksum for `<redacted>/empiar_10025_subset_v1.tar` (from `https://s3.wasabisys.com/cryosparc-test-data-dist/empiar_10025_subset_v1.tar`)

However, after extracting the archive manually, everything seems to work fine.
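To check a downloaded tarball independently of the job, a SHA-256 digest can be computed with the Python standard library, as in the rough sketch below. The helper names are hypothetical, and the expected checksum is not given in this thread, so it has to come from a trusted source. This is not the code behind `crypt.verify_sha256`; it only illustrates the same kind of comparison.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large tarballs need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_sha256):
    """Compare a file's digest with an expected hex string (case-insensitive).

    The expected value is an input supplied by the caller; none is
    published in this thread.
    """
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A mismatch on a freshly downloaded file would point to a truncated download or a stale expected checksum rather than a problem with the extracted data itself.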

wtempel · March 16, 2026, 8:24pm

CryoSPARC v5.0.3, released today, includes a change related to this issue.