cryosparcm eventlog P26 J40 | tail -n 40
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Particles will be zeropadded/truncated to size 650 during alignment
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Volume refinement will be done with effective box size 650
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Volume refinement will be done with pixel size 0.8200
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Particles will be zeropadded/truncated to size 650 during backprojection
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Particles will be backprojected with box size 650
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Volume will be internally cropped and stored with box size 650
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Volume will be interpolated with box size 650 (zeropadding factor 1.00)
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] DC components of images will be ignored and volume will be floated at each iteration.
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Spherical windowing of maps is enabled
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Refining with C1 symmetry enforced
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Resetting input per-particle scale factors to 1.0
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] Starting at initial resolution 30.000A (radwn 17.767).
[Wed, 11 Dec 2024 16:11:18 GMT] [CPU RAM used: 653 MB] ====== Masking ======
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] No mask input was connected, so dynamic masking will be enabled.
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] Dynamic mask threshold: 0.2000
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] Dynamic mask near (A): 6.00
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] Dynamic mask far (A): 14.00
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] ====== Initial Model ======
[Wed, 11 Dec 2024 16:11:29 GMT] [CPU RAM used: 5106 MB] Resampling initial model to specified volume representation size and pixel-size…
[Wed, 11 Dec 2024 16:11:40 GMT] [CPU RAM used: 8189 MB] Estimating scale of initial reference.
[Wed, 11 Dec 2024 16:11:51 GMT] [CPU RAM used: 8391 MB] Rescaling initial reference by a factor of 1.049
[Wed, 11 Dec 2024 16:11:58 GMT] [CPU RAM used: 8420 MB] Estimating scale of initial reference.
[Wed, 11 Dec 2024 16:12:06 GMT] [CPU RAM used: 8418 MB] Rescaling initial reference by a factor of 1.007
[Wed, 11 Dec 2024 16:12:14 GMT] [CPU RAM used: 8423 MB] Estimating scale of initial reference.
[Wed, 11 Dec 2024 16:12:23 GMT] [CPU RAM used: 8424 MB] Rescaling initial reference by a factor of 1.000
[Wed, 11 Dec 2024 16:12:31 GMT] Initial Real Space Slices
[Wed, 11 Dec 2024 16:12:33 GMT] Initial Fourier Space Slices
[Wed, 11 Dec 2024 16:12:33 GMT] [CPU RAM used: 8590 MB] ====== Starting Refinement Iterations ======
[Wed, 11 Dec 2024 16:12:33 GMT] [CPU RAM used: 8590 MB] ----------------------------- Start Iteration 0
[Wed, 11 Dec 2024 16:12:33 GMT] [CPU RAM used: 8590 MB] Using Max Alignment Radius 17.767 (30.000A)
[Wed, 11 Dec 2024 16:12:33 GMT] [CPU RAM used: 8590 MB] Auto batchsize: 12100 in each split
[Wed, 11 Dec 2024 16:12:50 GMT] [CPU RAM used: 12919 MB] -- THR 1 BATCH 500 NUM 6000 TOTAL 5.8994584 ELAPSED 117.29434 --
[Wed, 11 Dec 2024 16:14:51 GMT] [CPU RAM used: 16167 MB] Processed 24200.000 images in 121.875s.
[Wed, 11 Dec 2024 16:15:07 GMT] [CPU RAM used: 18394 MB] Computing FSCs…
[Wed, 11 Dec 2024 16:15:07 GMT] [CPU RAM used: 18394 MB] Using full box size 650, downsampled box size 336, with low memory mode disabled.
[Wed, 11 Dec 2024 16:15:07 GMT] [CPU RAM used: 18394 MB] Computing FFTs on GPU.
[Wed, 11 Dec 2024 16:15:15 GMT] [CPU RAM used: 20493 MB] Done in 7.502s
[Wed, 11 Dec 2024 16:15:15 GMT] [CPU RAM used: 20493 MB] Computing cFSCs…
[Wed, 11 Dec 2024 16:25:08 GMT] **** Kill signal sent by CryoSPARC (ID: ) ****
[Wed, 11 Dec 2024 16:25:08 GMT] Job is unresponsive - no heartbeat received in 600 seconds.
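For context on the failure: the CPU RAM figure in the event log climbs from ~650 MB at setup to ~20 GB right before the cFSC step at box size 650, so the next things we plan to check (standard diagnostics rather than anything the event log itself confirms) are the job's own log and whether the worker ran out of memory, e.g.

cryosparcm joblog P26 J40
ssh csparc@biomix43 'dmesg -T | grep -iE "out of memory|killed process"'
ssh csparc@biomix43 'free -h'

(substituting biomix10 for biomix43 if that is where the failed run was scheduled; dmesg may require root on some kernels).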
cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/csparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 15687286784, 'name': 'Tesla T4'}], 'hostname': 'biomix43', 'lane': 'biomix43', 'monitor_port': None, 'name': 'biomix43', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'csparc@biomix43', 'title': 'Worker node biomix43', 'type': 'node', 'worker_bin_path': '/usr/localMAIN/cryosparc/cryosparc_worker/bin/cryosparcw'},
{'cache_path': '/csparc', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 47810936832, 'name': 'NVIDIA L40S'}], 'hostname': 'biomix10', 'lane': 'biomix10', 'monitor_port': None, 'name': 'biomix10', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'csparc@biomix10', 'title': 'Worker node biomix10', 'type': 'node', 'worker_bin_path': '/usr/localMAIN/cryosparc/cryosparc_worker/bin/cryosparcw'},
{'cache_path': '/csparc', 'cache_quota_mb': 9000000, 'cache_reserve_mb': 10000, 'custom_var_names': ['ram_gb_multiplier'], 'custom_vars': {}, 'desc': None, 'hostname': 'biomix', 'lane': 'biomix', 'name': 'biomix', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n#### cryoSPARC cluster submission script template for SLURM\n## Available variables:\n## {{ run_cmd }} - the complete command string to run the job\n## {{ num_cpu }} - the number of CPUs needed\n## {{ num_gpu }} - the number of GPUs needed.\n## Note: The code will use this many GPUs starting from dev id 0.\n## The cluster scheduler has the responsibility\n## of setting CUDA_VISIBLE_DEVICES or otherwise enuring that the\n## job uses the correct cluster-allocated GPUs.\n## {{ ram_gb }} - the amount of RAM needed in GB\n## {{ job_dir_abs }} - absolute path to the job directory\n## {{ project_dir_abs }} - absolute path to the project dir\n## {{ job_log_path_abs }} - absolute path to the log file for the job\n## {{ worker_bin_path }} - absolute path to the cryosparc worker command\n## {{ run_args }} - arguments to be passed to cryosparcw run\n## {{ project_uid }} - uid of the project\n## {{ job_uid }} - uid of the job\n## {{ job_creator }} - name of the user that created the job (may contain spaces)\n## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)\n##\n## What follows is a simple SLURM script:\n\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH --gres=gpu:{{ num_gpu }}\n#SBATCH --partition=cryosparc\n#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int}}G\n#SBATCH --output={{ job_dir_abs }}/slurm.out\n#SBATCH --error={{ job_dir_abs }}/slurm.err\n\n{{ run_cmd }}\n\n', 'send_cmd_tpl': '{{ command }}', 'title': 'biomix', 'tpl_vars': ['worker_bin_path', 'num_gpu', 'cluster_job_id', 'job_creator', 'ram_gb', 'num_cpu', 'command', 'run_cmd', 'job_uid', 'cryosparc_username', 'ram_gb_multiplier', 'job_dir_abs', 'run_args', 'job_log_path_abs', 'project_uid', 'project_dir_abs'], 'type': 'cluster', 'worker_bin_path': '/usr/localMAIN/cryosparc/cryosparc_worker/bin/cryosparcw'}]
Since we have run the job on both workers, I've included the output for both of them above.
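One aside on the biomix cluster lane in that output: ram_gb_multiplier is defined as a custom variable but currently unset, so the --mem request in the SLURM template falls back to the job's own ram_gb. Purely as an illustration (the 24 GB figure is made up, not taken from this job), the template line

#SBATCH --mem={{ (ram_gb|float * (ram_gb_multiplier|default(1))|float)|int}}G

renders as #SBATCH --mem=24G with ram_gb=24 and the multiplier left at its default of 1, and as #SBATCH --mem=48G if we were to set ram_gb_multiplier to 2 for the lane.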
The 10 TB cache volume lives on the cluster head node, since that is where job submission happens, and it is exported to the two worker nodes over NFS. There is a direct 10 Gb link between the head node and the workers, and the cache path (/csparc) is the same on every node. This setup has worked previously, even though the cache is not actually local to each worker node.
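If it is useful, this is roughly how we would confirm the mount and get a crude write-throughput number on a worker (the /csparc path is the cache_path from the scheduler output above; the dd target is just a throwaway test file):

findmnt /csparc
dd if=/dev/zero of=/csparc/nfs_write_test bs=1M count=2048 conv=fsync status=progress
rm /csparc/nfs_write_test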