cryoSPARC worker node is not responsive on AWS

Hi community,

I have been trying to deploy cryoSPARC on AWS following the instructions. The installation and deployment appeared to succeed. However, when I ran a test with the tutorial data, none of the 3 worker nodes seemed to work. The Motion Correction job showed as launched, but there was no further response (I waited 15 minutes).

I checked the EC2 dashboard, and I can only see one Master instance. Is this normal? Any insights will be greatly appreciated!

Just to quickly add to my own post: do I need to launch new EC2 instances for the worker nodes? I thought everything was automatic, but I would like to clarify. Thanks!

Welcome to the forum @kennywang.
Please can you provide some additional information?
  1. Was the movie import performed on that same AWS-hosted cryoSPARC instance (P1, J1, presumably)?
  2. Did you use the scripts/templates hosted at https://github.com/cryoem-uoft/aws-deployment-guide/archive/refs/tags/v1.0.zip?
  3. Did you confirm that all referenced instance types are available in your chosen availability zone?
  4. Did you modify any of the templates?
  5. Does the file /<path_to_project_dirs>/P1/J2/job.log exist and contain any information?
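
The check on the job directory and job.log can be done from a shell on the master node, or with a short script. Below is a minimal sketch in Python, assuming the caller passes in the job directory (the real project path is elided in this thread and is not filled in here). `inspect_job_dir` is a hypothetical helper name, not part of cryoSPARC, and the demonstration runs on a throwaway directory that stands in for the real one.

```python
import tempfile
from pathlib import Path


def inspect_job_dir(job_dir):
    """List a job directory and report the size of job.log, if present."""
    job_dir = Path(job_dir)
    if not job_dir.is_dir():
        return {"exists": False, "files": [], "job_log_bytes": None}
    files = sorted(p.name for p in job_dir.iterdir())
    log = job_dir / "job.log"
    log_bytes = log.stat().st_size if log.is_file() else None
    return {"exists": True, "files": files, "job_log_bytes": log_bytes}


# Demonstration on a temporary directory standing in for the real job directory.
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "J2"
    job.mkdir()
    before = inspect_job_dir(job)            # directory exists, no job.log yet
    (job / "job.log").write_text("started\n")
    after = inspect_job_dir(job)             # job.log is now present

# The temporary directory has been removed, so this path no longer exists.
missing = inspect_job_dir(job)
```

A result with `job_log_bytes` of `None` means the log file was never created, which is the situation this question is trying to detect.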

Thank you for your reply, wtempel! Here are my answers:

  1. Yes, the movie import is P1, J1.
  2. Yes, I used the scripts and followed the instructions here.
  3. Yes, I chose us-east-1 as the region and us-east-1a as the AZ.
  4. No, I did not modify the template. I installed pcluster 3.x at first, which gave me a lot of errors when I deployed. Once I switched to pcluster 2.x, it deployed successfully.
  5. No, the log file doesn’t exist. I tried again (now J3), and a similar issue happened. Below is the job metadata, in case it is useful:
{"_id":{"_str":"xxxxxx"},"children":[],"cloned_from":null,"created_at":"2022-08-27T15:33:28.931Z","deleted":false,"parents":["J1"],"project_uid":"P1","queue_message":null,"status":"launched","title":"New Job J3","type":"patch_motion_correction_multi","ui_tile_height":1,"ui_tile_images":[],"ui_tile_width":2,"uid":"J3","workspace_uids":["W1"],"version":"v3.3.2","job_type":"patch_motion_correction_multi","run_as_user":null,"description":"Enter a description.","params_secs":{"general_settings":{"title":"General settings","desc":"","order":0,"name":"general_settings"},"motion_settings":{"title":"Motion correction","desc":"","order":1,"name":"motion_settings"},"compute_settings":{"title":"Compute settings","desc":"","order":2,"name":"compute_settings"}},"params_base":{"do_plots":{"type":"boolean","value":true,"title":"Make motion diagnostic plots","desc":"Whether or not to make plots of motion trajectories. Motion trajectories can also be inspected using the \"Curate Exposures\" job type after this job completes.","order":0,"section":"general_settings","advanced":false,"hidden":false,"name":"do_plots","is_default":true},"num_plots":{"type":"number","value":10,"title":"Number of movies to plot","desc":"Only make plots for the first this many movies.","order":1,"section":"general_settings","advanced":false,"hidden":false,"name":"num_plots","is_default":true},"random_num":{"type":"number","value":null,"title":"Only process this many movies","desc":"Randomly select this many movies to process. Helpful for tweaking params.","order":2,"section":"general_settings","advanced":false,"hidden":false,"name":"random_num","is_default":true},"memoryfix":{"type":"boolean","value":true,"title":"Reduce GPU memory usage","desc":"Whether or not to use the reduced-memory-footprint version of patch motion correction (BETA). 
This feature does not change the algorithm used, or the results.","order":3,"section":"general_settings","advanced":false,"hidden":true,"name":"memoryfix"},"memoryfix2":{"type":"boolean","value":false,"title":"Low-memory mode","desc":"If running out of GPU memory, this option can be used to prioritize memory use at the expense of speed (BETA). The results are unchanged.","order":4,"section":"general_settings","advanced":false,"hidden":false,"name":"memoryfix2","is_default":true},"res_max_align":{"type":"number","value":5,"title":"Maximum alignment resolution (A)","desc":"Maximum resolution (in A) to consider when aligning frames. Generally, betwen 5A and 3A is best.","order":5,"section":"motion_settings","advanced":false,"hidden":false,"name":"res_max_align","is_default":true},"bfactor":{"type":"number","value":500,"title":"B-factor during alignment","desc":"B-factor that blurs frames before aligning. Generally 500 to 100 is best.","order":6,"section":"motion_settings","advanced":false,"hidden":false,"name":"bfactor","is_default":true},"frame_start":{"type":"number","value":0,"title":"Start frame (included, 0-based)","desc":"Which frame number, starting at zero, to begin motion correction from. This value controls how many early frames are dropped from the motion corrected result. This value will also be used in local motion correction.","order":7,"section":"motion_settings","advanced":false,"hidden":false,"name":"frame_start","is_default":true},"frame_end":{"type":"number","value":null,"title":"End frame (excluded, 0-based) ","desc":"Which frame number, starting at zero, to not include in motion correction, also excluding all frames after this one. 
Generally this does not improve results, as later frames are downweighted during dose weighting in local motion correction.","order":8,"section":"motion_settings","advanced":false,"hidden":false,"name":"frame_end","is_default":true},"output_fcrop_factor":{"type":"enum","value":"1","title":"Output F-crop factor","desc":"Output Fourier cropping factor. 1.0 means no cropping, 1/2 means crop to 1/2 the resolution, etc.","order":9,"section":"motion_settings","advanced":false,"hidden":false,"enum_keys":["1","3/4","1/2","1/4"],"enum_dict":{"1":1,"3/4":0.75,"1/2":0.5,"1/4":0.25},"name":"output_fcrop_factor","is_default":true},"override_total_exp":{"type":"number","value":null,"title":"Override e/A^2","desc":"Override the dose (in total e/A^2 over the exposure) that was given at import time but can be overridden here.","order":10,"section":"motion_settings","advanced":false,"hidden":false,"name":"override_total_exp","is_default":true},"variable_dose":{"type":"boolean","value":false,"title":"Allow Variable Dose","desc":"Enable correct processing when frames have variable dose fractionation","order":11,"section":"motion_settings","advanced":true,"hidden":false,"name":"variable_dose","is_default":true},"smooth_lambda_cal":{"type":"number","value":0.5,"title":"Calibrated smoothing","desc":"Calibrated smoothing constant applied to trajectories","order":12,"section":"motion_settings","advanced":true,"hidden":false,"name":"smooth_lambda_cal","is_default":true},"override_K_Z":{"type":"number","value":null,"title":"Override knots Z","desc":"Override automatically selected spline order for Z dimension (time)","order":13,"section":"motion_settings","advanced":true,"hidden":false,"name":"override_K_Z","is_default":true},"override_K_Y":{"type":"number","value":null,"title":"Override knots Y","desc":"Override automatically selected spline order for Y dimension 
(vertical)","order":14,"section":"motion_settings","advanced":true,"hidden":false,"name":"override_K_Y","is_default":true},"override_K_X":{"type":"number","value":null,"title":"Override knots X","desc":"Override automatically selected spline order for X dimension (horizontal)","order":15,"section":"motion_settings","advanced":true,"hidden":false,"name":"override_K_X","is_default":true},"compute_num_gpus":{"type":"number","value":2,"title":"Number of GPUs to parallelize","desc":"Number of GPUs over which to parallelize computation.","order":16,"section":"compute_settings","advanced":false,"hidden":false,"name":"compute_num_gpus","is_default":false}},"params_spec":{"compute_num_gpus":{"value":2}},"input_slot_groups":[{"type":"exposure","name":"movies","title":"Movies","description":"Movies for motion correction","count_min":1,"count_max":null,"repeat_allowed":false,"slots":[{"type":"exposure.movie_blob","name":"movie_blob","title":"Raw movie data","description":"","optional":false},{"type":"exposure.gain_ref_blob","name":"gain_ref_blob","title":"Raw movie data","description":"","optional":true},{"type":"exposure.mscope_params","name":"mscope_params","title":"Exposure parameters","description":"","optional":false}],"connections":[{"job_uid":"J1","group_name":"imported_movies","slots":[{"slot_name":"movie_blob","job_uid":"J1","group_name":"imported_movies","result_name":"movie_blob","result_type":"exposure.movie_blob","version":"F"},{"slot_name":"gain_ref_blob","job_uid":"J1","group_name":"imported_movies","result_name":"gain_ref_blob","result_type":"exposure.gain_ref_blob","version":"F"},{"slot_name":"mscope_params","job_uid":"J1","group_name":"imported_movies","result_name":"mscope_params","result_type":"exposure.mscope_params","version":"F"}]}]}],"output_result_groups":[{"uid":"J3-G0","type":"exposure","name":"micrographs","title":"Micrographs full-frame 
aligned","description":"","contains":[{"uid":"J3-R0","type":"exposure.micrograph_blob","group_name":"micrographs","name":"micrograph_blob_non_dw","passthrough":false},{"uid":"J3-R1","type":"exposure.thumbnail_blob","group_name":"micrographs","name":"micrograph_thumbnail_blob_1x","passthrough":false},{"uid":"J3-R2","type":"exposure.thumbnail_blob","group_name":"micrographs","name":"micrograph_thumbnail_blob_2x","passthrough":false},{"uid":"J3-R3","type":"exposure.micrograph_blob","group_name":"micrographs","name":"micrograph_blob","passthrough":false},{"uid":"J3-R4","type":"exposure.stat_blob","group_name":"micrographs","name":"background_blob","passthrough":false},{"uid":"J3-R5","type":"exposure.motion","group_name":"micrographs","name":"rigid_motion","passthrough":false},{"uid":"J3-R6","type":"exposure.motion","group_name":"micrographs","name":"spline_motion","passthrough":false},{"uid":"J3-R7","type":"exposure.movie_blob","group_name":"micrographs","name":"movie_blob","passthrough":true},{"uid":"J3-R8","type":"exposure.gain_ref_blob","group_name":"micrographs","name":"gain_ref_blob","passthrough":true},{"uid":"J3-R9","type":"exposure.mscope_params","group_name":"micrographs","name":"mscope_params","passthrough":true}],"passthrough":"movies","num_items":0,"summary":{}},{"uid":"J3-G1","type":"exposure","name":"micrographs_incomplete","title":"Incomplete Micrographs full-frame 
aligned","description":"","contains":[{"uid":"J3-R10","type":"exposure.micrograph_blob","group_name":"micrographs_incomplete","name":"micrograph_blob","passthrough":false},{"uid":"J3-R11","type":"exposure.movie_blob","group_name":"micrographs_incomplete","name":"movie_blob","passthrough":true},{"uid":"J3-R12","type":"exposure.gain_ref_blob","group_name":"micrographs_incomplete","name":"gain_ref_blob","passthrough":true},{"uid":"J3-R13","type":"exposure.mscope_params","group_name":"micrographs_incomplete","name":"mscope_params","passthrough":true}],"passthrough":"movies","num_items":0,"summary":{}}],"output_results":[{"uid":"J3-R0","type":"exposure.micrograph_blob","group_name":"micrographs","name":"micrograph_blob_non_dw","title":"Motion-corrected micrographs","description":"","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["psize_A","f4"],["format","O"],["is_background_subtracted","u4"],["vmin","f4"],["vmax","f4"],["import_sig","u8"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R1","type":"exposure.thumbnail_blob","group_name":"micrographs","name":"micrograph_thumbnail_blob_1x","title":"Motion-corrected micrographs thumbnails","description":"@1x, 120x120","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["format","O"],["binfactor","u4"],["micrograph_path","O"],["vmin","f4"],["vmax","f4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R2","type":"exposure.thumbnail_blob","group_name":"micrographs","name":"micrograph_thumbnail_blob_2x","title":"Motion-corrected micrographs thumbnails","description":"@2x, 240x240","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["format","O"],["binfactor","u4"],["micrograph_path","O"],["vmin","f4"],["vmax","f4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R3","type":"exposure.micrograph_blob","group_name":"micrographs","name":"micrograph_blob","title":"Motion-corrected dose-weighted 
micrographs","description":"","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["psize_A","f4"],["format","O"],["is_background_subtracted","u4"],["vmin","f4"],["vmax","f4"],["import_sig","u8"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R4","type":"exposure.stat_blob","group_name":"micrographs","name":"background_blob","title":"Background estimates","description":"","min_fields":[["path","O"],["idx","u4"],["binfactor","u4"],["shape","2u4"],["psize_A","f4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R5","type":"exposure.motion","group_name":"micrographs","name":"rigid_motion","title":"Full-frame motion estimates","description":"","min_fields":[["type","O"],["path","O"],["idx","u4"],["frame_start","u4"],["frame_end","u4"],["zero_shift_frame","u4"],["psize_A","f4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R6","type":"exposure.motion","group_name":"micrographs","name":"spline_motion","title":"Patch-based motion estimates","description":"","min_fields":[["type","O"],["path","O"],["idx","u4"],["frame_start","u4"],["frame_end","u4"],["zero_shift_frame","u4"],["psize_A","f4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R7","type":"exposure.movie_blob","group_name":"micrographs","name":"movie_blob","title":"Passthrough movie_blob","description":"Passthrough from input movies.movie_blob (slot_name)","min_fields":[["path","O"],["shape","3u4"],["psize_A","f4"],["is_gain_corrected","u4"],["format","O"],["has_defect_file","u4"],["import_sig","u8"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true},{"uid":"J3-R8","type":"exposure.gain_ref_blob","group_name":"micrographs","name":"gain_ref_blob","title":"Passthrough gain_ref_blob","description":"Passthrough from input movies.gain_ref_blob 
(slot_name)","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["flip_x","u4"],["flip_y","u4"],["rotate_num","u4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true},{"uid":"J3-R9","type":"exposure.mscope_params","group_name":"micrographs","name":"mscope_params","title":"Passthrough mscope_params","description":"Passthrough from input movies.mscope_params (slot_name)","min_fields":[["accel_kv","f4"],["cs_mm","f4"],["total_dose_e_per_A2","f4"],["phase_plate","u4"],["neg_stain","u4"],["exp_group_id","u4"],["defect_path","O"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true},{"uid":"J3-R10","type":"exposure.micrograph_blob","group_name":"micrographs_incomplete","name":"micrograph_blob","title":"Motion-corrected dose-weighted micrographs","description":"","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["psize_A","f4"],["format","O"],["is_background_subtracted","u4"],["vmin","f4"],["vmax","f4"],["import_sig","u8"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":false},{"uid":"J3-R11","type":"exposure.movie_blob","group_name":"micrographs_incomplete","name":"movie_blob","title":"Passthrough movie_blob","description":"Passthrough from input movies.movie_blob (slot_name)","min_fields":[["path","O"],["shape","3u4"],["psize_A","f4"],["is_gain_corrected","u4"],["format","O"],["has_defect_file","u4"],["import_sig","u8"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true},{"uid":"J3-R12","type":"exposure.gain_ref_blob","group_name":"micrographs_incomplete","name":"gain_ref_blob","title":"Passthrough gain_ref_blob","description":"Passthrough from input movies.gain_ref_blob (slot_name)","min_fields":[["path","O"],["idx","u4"],["shape","2u4"],["flip_x","u4"],["flip_y","u4"],["rotate_num","u4"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true},{"uid":"J3-R13","type":"exposure.mscope_params","group_name":"micrographs_incomplete","name":"mscope_params","title":"Passthrough 
mscope_params","description":"Passthrough from input movies.mscope_params (slot_name)","min_fields":[["accel_kv","f4"],["cs_mm","f4"],["total_dose_e_per_A2","f4"],["phase_plate","u4"],["neg_stain","u4"],["exp_group_id","u4"],["defect_path","O"]],"versions":[],"metafiles":[],"num_items":[],"passthrough":true}],"output_group_images":{},"errors_build_params":{},"errors_build_inputs":{},"errors_run":[],"queued_at":"2022-08-27T15:33:57.993Z","launched_at":"2022-08-27T15:33:58.257Z","started_at":null,"running_at":null,"waiting_at":null,"completed_at":null,"killed_at":null,"failed_at":null,"token_acquired_at":null,"tokens_requested_at":null,"last_scheduled_at":null,"last_accessed":{"name":"kwang-aws","accessed_at":"2022-08-27T15:33:58.366Z"},"priority":0,"resources_needed":{"slots":{"CPU":12,"GPU":2,"RAM":4},"fixed":{"SSD":false}},"resources_allocated":{"lane":"gpu-med","lane_type":"gpu-med","hostname":"gpu-med","target":{"type":"cluster","lane":"gpu-med","name":"gpu-med","title":"cryosparc-cluster","desc":null,"hostname":"gpu-med","worker_bin_path":"/shared/cryosparc/cryosparc_worker/bin/cryosparcw","script_tpl":"#!/bin/bash\n#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --output={{ job_log_path_abs }}\n#SBATCH --error={{ job_log_path_abs }}\n#SBATCH --nodes=1\n#SBATCH --ntasks-per-node=1\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=gpu-med\n{{ run_cmd }}\n","send_cmd_tpl":"{{ command }}","qsub_cmd_tpl":"sbatch {{ script_path_abs }}","qstat_cmd_tpl":"squeue -j {{ cluster_job_id }}","qdel_cmd_tpl":"scancel {{ cluster_job_id 
}}","qinfo_cmd_tpl":"sinfo","cache_path":"","cache_reserve_mb":10000,"cache_quota_mb":null},"slots":{"CPU":[0,1,2,3,4,5,6,7,8,9,10,11],"GPU":[0,1],"RAM":[0,1,2,3]},"fixed":{"SSD":false},"license":true,"licenses_acquired":2},"run_on_master_direct":false,"queued_to_lane":"gpu-med","queue_index":null,"queue_status":null,"queued_job_hash":null,"interactive":false,"interactive_hostname":"gpu-med","interactive_port":null,"PID_monitor":null,"PID_main":null,"PID_workers":[],"cluster_job_id":"6","created_by_user_id":"63063bb098f9e6ea79f5950f","created_by_job_uid":null,"is_experiment":false,"job_dir":"J3","job_dir_size":0,"experiment_worker_path":null,"enable_bench":false,"bench":{},"completed_count":0,"instance_information":{},"project_uid_num":1,"uid_num":3,"ui_layouts":{"P1":{"show":true,"floater":false,"top":232,"left":173,"width":298,"height":192,"groups":[]},"P1W1":{"show":true,"floater":false,"top":232,"left":173,"width":298,"height":192,"groups":[]}},"last_exported":"2022-08-27T15:33:48.986Z","queued_to_hostname":false,"queued_to_gpu":false,"no_check_inputs_ready":false,"num_tokens":2,"job_sig":"xxx"}
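
The metadata above is long, so a short script can pull out the fields that matter for scheduling. This is a minimal sketch in Python, assuming the job document has been saved or pasted as JSON. `summarize_job` and `submitted_but_not_started` are hypothetical helper names, and the excerpt below is trimmed by hand from the J3 document posted above.

```python
import json


def summarize_job(doc):
    """Pick out the scheduling-related fields from a job document."""
    return {
        "uid": doc.get("uid"),
        "status": doc.get("status"),
        "lane": doc.get("queued_to_lane"),
        "cluster_job_id": doc.get("cluster_job_id"),
        "launched_at": doc.get("launched_at"),
        "started_at": doc.get("started_at"),
        "requested": doc.get("resources_needed", {}).get("slots"),
    }


def submitted_but_not_started(summary):
    """True if the job was handed to the cluster but never began running."""
    return (
        summary["status"] == "launched"
        and summary["cluster_job_id"] is not None
        and summary["started_at"] is None
    )


# Hand-trimmed excerpt of the J3 metadata from this post.
excerpt = json.loads("""
{
  "uid": "J3",
  "status": "launched",
  "queued_to_lane": "gpu-med",
  "cluster_job_id": "6",
  "launched_at": "2022-08-27T15:33:58.257Z",
  "started_at": null,
  "resources_needed": {"slots": {"CPU": 12, "GPU": 2, "RAM": 4},
                       "fixed": {"SSD": false}}
}
""")

summary = summarize_job(excerpt)
stuck = submitted_but_not_started(summary)
```

For this excerpt the job has a cluster job ID (`"6"`) but no start time, so it was submitted to the `gpu-med` lane and was still waiting when the metadata was captured.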

What does the EC2 dashboard look like after a successful deployment? Mine shows only one master node, and I suspect that is the problem…

Thank you again and please let me know if you have any suggestions…

Please can you list the files inside the J2 directory?

Hi wtempel,

I just found that the medium worker node is responsive now. It takes about 2-3 minutes for the queue to respond. But the high and low worker nodes are still not responsive. It could be a bug on the AWS side, but thank you so much for following up!

Best,

Kenny
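
For anyone hitting the same delay: the cluster lane in the metadata above stores its status commands as templates (`squeue -j {{ cluster_job_id }}` and `sinfo`), and these can be rendered to check whether a submitted job is still waiting for a node. This is a minimal sketch in Python, assuming only those two templates from the posted job document. `render` and `queue_check_commands` are hypothetical helper names, and the placeholder substitution is a simple stand-in, not the template engine cryoSPARC itself uses.

```python
import re
import shlex

# Templates copied from the cluster target in the posted job document.
QSTAT_TPL = "squeue -j {{ cluster_job_id }}"
QINFO_TPL = "sinfo"


def render(template, **values):
    """Replace each {{ name }} placeholder with the matching keyword value."""
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def queue_check_commands(cluster_job_id):
    """Build argument lists for checking one job and the overall node state."""
    return [
        shlex.split(render(QSTAT_TPL, cluster_job_id=cluster_job_id)),
        shlex.split(render(QINFO_TPL)),
    ]


# Cluster job ID "6" is the one recorded for J3 in the metadata.
commands = queue_check_commands("6")
```

Each list in `commands` can be passed to `subprocess.run` on the master node, where the Slurm tools are installed. If the job is listed as pending while nodes are still coming up, that would be consistent with the 2-3 minute delay seen on the medium lane.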