I have an issue with our cluster configuration. We updated to version 4.4.1 a while ago, and everything was running smoothly. We recently got some new GPU nodes that I wanted to add. I therefore adapted my template cluster_info.json and cluster_script.sh and executed cryosparcm cluster connect as usual, which resulted in the following error (full error via PM):
TypeError: add_scheduler_target_cluster() got an unexpected keyword argument 'transfer_cmd_tpl'
cryosparcm cluster connect ran successfully.
However, jobs are no longer submitted to the cluster when using the new lane.
Can somebody explain this behavior, or is there a workaround for the new version?
Thanks a lot!
Here are the template cluster files:
cluster_info.json
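For anyone hitting the same TypeError: it suggests that the `transfer_cmd_tpl` key in cluster_info.json is no longer accepted by the installed version. Below is a minimal sketch of stripping that key before running cryosparcm cluster connect again. It assumes that this is the only rejected key. The helper names `strip_unsupported` and `clean_cluster_info` are made up for illustration and are not part of CryoSPARC.

```python
import json
from pathlib import Path

# Assumption: the only key rejected by the installed version is the one
# named in the TypeError. Extend this set if other keys are rejected too.
UNSUPPORTED_KEYS = {"transfer_cmd_tpl"}


def strip_unsupported(config: dict, unsupported=UNSUPPORTED_KEYS) -> dict:
    """Return a copy of the cluster config without the rejected keys."""
    return {key: value for key, value in config.items() if key not in unsupported}


def clean_cluster_info(path: str) -> dict:
    """Rewrite a cluster_info.json in place and return the cleaned config."""
    file = Path(path)
    cleaned = strip_unsupported(json.loads(file.read_text()))
    file.write_text(json.dumps(cleaned, indent=4) + "\n")
    return cleaned
```

Calling `clean_cluster_info("cluster_info.json")` in the directory that holds the template overwrites the file, so keep a backup copy first.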
Thank you for your fast reply and the great questions, with which I was able to figure it out myself.
The issue was the variables, which was not that obvious but was indicated by the command_core log.
After updating, the defaults for the variables were gone, since I had not hardcoded them.
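To spot such variables before submitting a job, here is a rough sketch that lists the placeholders in a cluster_script.sh template that are not filled in automatically. The set of built-in names is my assumption and may differ between versions, and the regular expression only catches simple `{{ name }}` placeholders, not `{% if ... %}` blocks. The sample template and the names `partition` and `time_limit` are hypothetical.

```python
import re

# Assumption: variable names that the scheduler integration fills in on its
# own. Check this list against the documentation of your installed version.
BUILTIN_VARS = {
    "run_cmd", "num_cpu", "num_gpu", "ram_gb", "job_dir_abs",
    "project_dir_abs", "job_log_path_abs", "worker_bin_path", "run_args",
    "project_uid", "job_uid", "job_creator", "cryosparc_username",
}

# Matches the first identifier after "{{", e.g. "{{ ram_gb }}" or "{{ ram_gb|int }}".
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def custom_variables(template: str, builtin=BUILTIN_VARS) -> list:
    """Return the sorted placeholder names that are not built in."""
    names = PLACEHOLDER.findall(template)
    return sorted(set(names) - set(builtin))


# Hypothetical template fragment, for illustration only.
SAMPLE = """#SBATCH --gres=gpu:{{ num_gpu }}
#SBATCH --partition={{ partition }}
#SBATCH --time={{ time_limit }}
{{ run_cmd }}
"""

print(custom_variables(SAMPLE))  # ['partition', 'time_limit']
```

Every name this reports needs a value from somewhere, either hardcoded in the template or set as a custom variable for the lane.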
So I changed
I think they are not needed anymore; I will remove them now.
Thank you for the hint. So far, there have been no problems. However, Slurm properly assigns GPU resources, which probably makes the code block unnecessary in the future. I have removed it now.
Thanks a lot!
Edit: I noticed that it is not straightforward to set up a multi-user cluster integration for CryoSPARC. We figured out a system where each user can submit CryoSPARC jobs as their own Slurm user, using a single master instance for everybody. If this is of interest to others, I could write a short documentation on how we set everything up. Just let me know.