Patch motion correction fails in 2.14.2

Hi,

We have just upgraded to the latest cryoSPARC version (2.14.2) and ran a new Patch Motion Correction job on previously imported data. On the previous version this all worked fine. We get the following error in the web interface:

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 82, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 422, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 325, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.make_outputs
  File "cryosparc2_compute/jobs/runcommon.py", line 580, in output
    result = com.query(job['output_results'], lambda r: r['group_name'] == group_name and r['name'] == name, error="No output result named %s.%s in job." % (group_name, name))
  File "cryosparc2_compute/jobs/common.py", line 357, in query
    assert res != default, error
AssertionError: No output result named micrographs.micrograph_thumbnail_blob_1x in job.

I tried re-importing the data and running the job again, but got the same error. I also tried 1 GPU vs 4 GPUs, but that made no difference.

Here is the log file for the job:

================= CRYOSPARCW =======  2020-03-04 11:25:34.973147  =========
Project P3 Job J26
Master XXX
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 223106
========= monitor process now waiting for main process
MAIN PID 223106
motioncorrection.run_patch cryosparc2_compute.jobs.jobregister
***************************************************************
Running job on hostname %s slurmcluster
Allocated Resources :  {u'lane': u'slurmcluster', u'target': {u'lane': u'slurmcluster', u'qdel_cmd_tpl': u'scancel {{ cluster_job_id }}', u'name': u'slurmcluster', u'title': u'slurmcluster', u'hostname': u'slurmcluster', u'qstat_cmd_tpl': u'squeue -j {{ cluster_job_id }}', u'worker_bin_path': u'/cm/shared/apps/cryosparc/cryosparc2_worker/bin/cryosparcw', u'qinfo_cmd_tpl': u'sinfo', u'qsub_cmd_tpl': u'sbatch {{ script_path_abs }}', u'cache_path': u'/tmp', u'cache_quota_mb': None, u'script_tpl': u'#!/usr/bin/env bash\n#### cryoSPARC cluster submission script template for SLURM\n## Available variables:\n## {{ run_cmd }}            - the complete command string to run the job\n## {{ num_cpu }}            - the number of CPUs needed\n## {{ num_gpu }}            - the number of GPUs needed. \n##                            Note: the code will use this many GPUs starting from dev id 0\n##                                  the cluster scheduler or this script have the responsibility\n##                                  of setting CUDA_VISIBLE_DEVICES so that the job code ends up\n##                                  using the correct cluster-allocated GPUs.\n## {{ ram_gb }}             - the amount of RAM needed in GB\n## {{ job_dir_abs }}        - absolute path to the job directory\n## {{ project_dir_abs }}    - absolute path to the project dir\n## {{ job_log_path_abs }}   - absolute path to the log file for the job\n## {{ worker_bin_path }}    - absolute path to the cryosparc worker command\n## {{ run_args }}           - arguments to be passed to cryosparcw run\n## {{ project_uid }}        - uid of the project\n## {{ job_uid }}            - uid of the job\n## {{ job_creator }}        - name of the user that created the job (may contain spaces)\n## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)\n##\n## What follows is a simple SLURM script:\n\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH -p gpu\n##SBATCH --mem={{ (ram_gb*1000)|int }}MB\n#SBATCH --mem=128000MB             \n#SBATCH -o {{ job_log_path_abs }}\n#SBATCH -e {{ job_log_path_abs }}\n\n\navailable_devs=""\nfor devidx in $(seq 0 15);\ndo\n    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]] ; then\n        if [[ -z "$available_devs" ]] ; then\n            available_devs=$devidx\n        else\n            available_devs=$available_devs,$devidx\n        fi\n    fi\ndone\nexport CUDA_VISIBLE_DEVICES=$available_devs\n\n{{ run_cmd }}\n\n\n', u'cache_reserve_mb': 10000, u'type': u'cluster', u'send_cmd_tpl': u'ssh plpkvada001 {{ command }}', u'desc': None}, u'license': True, u'hostname': u'slurmcluster', u'slots': {u'GPU': [0], u'RAM': [0, 1], u'CPU': [0, 1, 2, 3, 4, 5]}, u'fixed': {u'SSD': False}, u'lane_type': u'slurmcluster', u'licenses_acquired': 1}
/cm/shared/apps/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py:284: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
/cm/shared/apps/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py:516: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  max_open_warning, RuntimeWarning)
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 82, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 422, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 325, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.make_outputs
  File "cryosparc2_compute/jobs/runcommon.py", line 580, in output
    result = com.query(job['output_results'], lambda r: r['group_name'] == group_name and r['name'] == name, error="No output result named %s.%s in job." % (group_name, name))
  File "cryosparc2_compute/jobs/common.py", line 357, in query
    assert res != default, error
AssertionError: No output result named micrographs.micrograph_thumbnail_blob_1x in job.
========= main process now complete.
========= monitor process now complete.

Please let me know if you require further information.

Cheers,
Eugene

Hi @Eugene,

Could you run cryosparcm cli "refresh_job_types()", then start a fresh job and report whether this still happens?
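
For reference, a minimal sketch of running this from a shell on the cryoSPARC master node (assuming cryosparcm is on your PATH; the status check afterwards is optional):

    # refresh the registered job types in the database
    cryosparcm cli "refresh_job_types()"
    # confirm the instance is up before building a new job
    cryosparcm status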

Hi,
I ran the command and tried again, but the job failed as before. I then created a new project and workspace and re-imported the 385 movies. The import worked, but Patch Motion again failed with the same error, consistently at 23/24 movies.

Cheers
Eugene

Hi Stephan,

I followed your instructions below and the job now runs successfully.

Usually, when you run cryosparcm cli "refresh_job_types()", create a new job, and it still doesn't work, it can mean that a zombie process running in the background is causing the inconsistency. To fix this (a consolidated shell sketch of these steps follows the list):

  1. Stop cryoSPARC: cryosparcm stop

  2. Check if there are any orphaned cryoSPARC processes via ps:

     ps -ax | grep "supervisord" (kill only the processes related to your cryosparc2 instance)
     ps -ax | grep "cryosparc2_command" (kill all the matching processes related to your cryosparc2 instance)
     ps -ax | grep "mongod" (kill only the process running your cryosparc2 database)
     e.g. kill 82681
    
  3. Delete any .sock files whose path contains "cryosparc-supervisor"; these will be found in the /tmp directory

  4. Start cryoSPARC: cryosparcm start

  5. Build a new Patch Motion job, and run it.
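
As a convenience, here is a minimal shell sketch of steps 1-4 above (assuming a single-node install where cryosparcm is on your PATH and the socket files follow the usual /tmp/cryosparc-supervisor-*.sock naming; the PID shown is a placeholder, so inspect the ps output before killing anything):

    # 1. stop cryoSPARC
    cryosparcm stop

    # 2. look for orphaned cryoSPARC processes; kill only the ones
    #    belonging to this instance (check the output first)
    ps -ax | grep supervisord
    ps -ax | grep cryosparc2_command
    ps -ax | grep mongod
    # kill <PID>        e.g. kill 82681

    # 3. remove any stale supervisor socket file(s) in /tmp
    rm -f /tmp/cryosparc-supervisor-*.sock

    # 4. start cryoSPARC again, then rebuild and run the Patch Motion Correction job
    cryosparcm start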
