Greetings.
We have a shared system running SLURM, and we installed cryoSPARC with the cluster_info.json and cluster_script.sh files so that it submits jobs to the SLURM queue. The SLURM queue works fine for non-cryoSPARC jobs, so that is not the problem.
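For context, our cluster_script.sh follows the template format from the cryoSPARC cluster-integration docs. A minimal sketch of that kind of script is below; the partition name and resource directives are illustrative placeholders rather than our exact configuration, and the double-brace variables are filled in by cryoSPARC at submission time:

```bash
#!/usr/bin/env bash
# Minimal cryoSPARC cluster_script.sh sketch for SLURM.
# NOTE: partition name and resource values are placeholders, not our
# actual settings; {{ ... }} variables are rendered by cryoSPARC per job.
#SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#SBATCH --partition=gpu
#SBATCH --ntasks=1
#SBATCH --cpus-per-task={{ num_cpu }}
#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --mem={{ (ram_gb*1000)|int }}M
#SBATCH --output={{ job_log_path_abs }}

# cryoSPARC substitutes the full worker command here.
{{ run_cmd }}
```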
When we start a job in the cryoSPARC GUI, it quickly crashes:
-------- Submission command: sbatch /executor/cryoem/userlab/2022-08-18_UA-FapC-QF1_KrF4ecC250np96kxOA100-60eA2/P2/J15/queue_sub_script.sh
Failed to launch! 1
We tried with 0, 1, and 2 GPUs. We did capture this error message before it disappeared:
ServerError: Traceback (most recent call last):
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 150, in wrapper
    res = func(*args, **kwargs)
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 2309, in run_job
    res = subprocess.check_output(cmd, stderr=subprocess.STDOUT, shell=True).decode()
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'sbatch /executor/cryoem/conwaylab/2022-08-18_UA-FapC-QF1_KrF4ecC250np96kxOA100-60eA2/P2/J16/queue_sub_script.sh' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 150, in wrapper
    res = func(*args, **kwargs)
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 1861, in scheduler_run
    scheduler_run_core(do_run)
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 2079, in scheduler_run_core
    run_job(job['project_uid'], job['uid'])  # takes care of the cluster case and the node case
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 157, in wrapper
    raise ServerError(s.getvalue(), code=400) from e
flask_jsonrpc.exceptions.ServerError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 150, in wrapper
    res = func(*args, **kwargs)
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 5121, in enqueue_job
    scheduler_run()
  File "/executor/opt/cryoem/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 157, in wrapper
    raise ServerError(s.getvalue(), code=400) from e
flask_jsonrpc.exceptions.ServerError
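If it helps with diagnosis: since sbatch itself returns exit status 1, one thing we can try is re-running the generated submission script by hand to capture the scheduler's own error message. A sketch of that, using the job path from the log above:

```bash
# Re-run the script cryoSPARC generated, outside the GUI, so sbatch's
# error is printed to the terminal (path taken from the log above).
cd /executor/cryoem/userlab/2022-08-18_UA-FapC-QF1_KrF4ecC250np96kxOA100-60eA2/P2/J15
cat queue_sub_script.sh     # inspect the rendered #SBATCH directives
sbatch queue_sub_script.sh  # a non-zero exit should print the rejection reason
```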
Thanks for your assistance with this issue.