I started evaluating CS 5 and encountered an issue with the validation workflow:
Traceback (most recent call last):
File "cli/run.py", line 105, in cli.run.run_job
File "cli/run.py", line 210, in cli.run.run_job_function
File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 111, in run_extensive_validation
subjob = queue_extensive_workflow_subjob(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 473, in queue_extensive_workflow_subjob
jobs.enqueue_job(subjob, lane_name=lane, hostname=hostname, gpus=gpus_to_schedule)
File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/core/jobs.py", line 460, in enqueue_job
validate_path_params(job)
File "/net/software/v1/software/cryoSPARC/5.0.0/cryosparc_worker/core/jobs.py", line 541, in validate_path_params
raise UnprocessableException(
models.error.UnprocessableException: Invalid path specified for Micrographs data path: <redacted>/empiar_10025_subset_v1/mrc/*.mrc; directory allowed: False; file allowed: True; glob allowed: True
The job was set to download the dataset by itself, which it did. After successfully extracting the data archive, it complained about an invalid path. In fact, there is no mrc subdirectory in empiar_10025_subset_v1.
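To confirm that the path in the error message really matches nothing, a small check like the one below may help. This is only a sketch: `describe_glob` is a hypothetical helper written for this thread, not part of CryoSPARC, and the pattern you pass in should be the (redacted) path from your own error message.

```python
import glob
import os


def describe_glob(pattern: str) -> dict:
    """Report whether a glob pattern's parent directory exists and how many files match.

    Hypothetical helper for illustration; it does not call any CryoSPARC code.
    """
    parent = os.path.dirname(pattern)
    matches = sorted(glob.glob(pattern))
    return {
        "parent_exists": os.path.isdir(parent),
        "match_count": len(matches),
        "first_matches": matches[:5],  # a short sample, to keep the output readable
    }
```

Calling `describe_glob("<your project dir>/empiar_10025_subset_v1/mrc/*.mrc")` with your real project directory would show `parent_exists` as `False` if the mrc subdirectory is missing, which is consistent with the observation above.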
Thanks @bsobol for reporting this observation. Could you please post the outputs of these commands (please redact confidential components of data paths):
Thanks @bsobol for the additional data points. Did you edit the Path to Dataset Data parameter before queuing the Extensive Validation job?
[Update] We were able to replicate the models.error.UnprocessableException after enabling Run Advanced Jobs and will investigate.
@bsobol The empiar-10025-subset data package downloaded within the earlier extensive validation run was outdated. We have now updated the package. Please delete the empiar_10025_subset_v1/ subdirectory from your project directory. When you re-run extensive validation, the updated data package should be automatically downloaded.
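If you prefer to script the cleanup, the deletion step could be sketched as follows. This is a minimal example under assumptions: `remove_stale_dataset` is a hypothetical helper, and `project_dir` must be the project directory on your own system. Only the subdirectory name `empiar_10025_subset_v1` comes from this thread.

```python
import shutil
from pathlib import Path


def remove_stale_dataset(project_dir: str, name: str = "empiar_10025_subset_v1") -> bool:
    """Delete the outdated dataset subdirectory inside a project directory.

    Returns True if the directory existed and was removed, False if it was absent.
    Hypothetical helper for illustration; it does not call any CryoSPARC code.
    """
    target = Path(project_dir) / name
    if not target.is_dir():
        return False
    shutil.rmtree(target)  # removes the directory and everything beneath it
    return True
```

Because `shutil.rmtree` is irreversible, it is worth double-checking the project directory before running this.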
When the Extensive Validation job auto-downloads the dataset, I get the following error:
Traceback (most recent call last):
File "cli/run.py", line 106, in cli.run.run_job
File "cli/run.py", line 211, in cli.run.run_job_function
File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 44, in run_extensive_validation
dataset_data_dir = get_benchmark_dir(rc, dataset_selected, params.dataset_data_dir, params.run_advanced_jobs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/compute/jobs/workflows/run_extensive_validation.py", line 254, in get_benchmark_dir
benchmark_data_dir = benchmarks.download_benchmark_test_data(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/core/benchmarks.py", line 111, in download_benchmark_test_data
tarball_path = download_and_verify_url(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/net/software/v1/software/cryoSPARC/5.0.2/cryosparc_worker/core/benchmarks.py", line 184, in download_and_verify_url
assert crypt.verify_sha256(dest, checksum_sha256), f"Could not verify checksum for `{dest}` (from `{url}`)"
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Could not verify checksum for `<redacted>/empiar_10025_subset_v1.tar` (from `https://s3.wasabisys.com/cryosparc-test-data-dist/empiar_10025_subset_v1.tar`)
However, after extracting the archive manually, everything seems to work fine.
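To check a downloaded tarball independently of the job, its SHA-256 digest can be computed with Python's standard `hashlib` module. This is a sketch only: `sha256_of_file` and `matches_checksum` are hypothetical helpers, not CryoSPARC's own `crypt.verify_sha256`, and the expected checksum is not stated in this thread, so it would have to come from the maintainers.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read 1 MiB at a time so a large tarball is never loaded fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex string (assumed to be supplied by you)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Comparing the digest of the downloaded empiar_10025_subset_v1.tar on two occasions, or against a value provided by the developers, may help distinguish a truncated download from an outdated expected checksum.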