We are setting up a dedicated cloud instance to use as a CryoSPARC worker. Unfortunately, CryoSPARC keeps timing out when it tries to launch a job. The ssh config is set up correctly with pubkey auth, and there are no issues as far as the worker install goes.
The CryoSPARC master simply refuses to connect and produces no logs. Is there any way to troubleshoot this? I can ssh into the cloud server from the master server using the exact same public key as for the other workers.
Did you confirm that the cloud-based worker receives a response from the master when you run the following commands on the worker:
curl <master_hostname>:<port>
where <master_hostname> corresponds to $CRYOSPARC_MASTER_HOSTNAME and <port> to $((CRYOSPARC_BASE_PORT+1)), $((CRYOSPARC_BASE_PORT+2)), and $((CRYOSPARC_BASE_PORT+6))?
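A minimal sketch that checks all three ports in one go, assuming CRYOSPARC_MASTER_HOSTNAME and CRYOSPARC_BASE_PORT are exported in the worker's shell with the same values as in the master's config.sh:

# Run on the worker; a refusal or timeout on any port suggests a firewall or network problem.
for offset in 1 2 6; do
  port=$((CRYOSPARC_BASE_PORT + offset))
  echo "--- ${CRYOSPARC_MASTER_HOSTNAME}:${port} ---"
  curl --connect-timeout 5 "${CRYOSPARC_MASTER_HOSTNAME}:${port}"
done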
It’s a government cloud instance, so we’re limited to the user it allows us to log in as (in this case, exouser). However, the cryosparc user can ssh in with its public key, just as with the other workers. The ssh config file is set up to reflect the need to ssh in as “exouser” instead of “cryosparc”.
For the same reason as above, no, the UIDs do not match up.
Yes, exouser can write to the job directories.
When connecting the worker, cryosparcw says it will connect as exouser@, which is what we want, right?
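If the ssh string needs to be set explicitly when (re)connecting, a hedged sketch of the connect call (the hostnames are placeholders, 39000 is the default base port, and --sshstr overrides the user@hostname string the master uses for ssh):

# Run on the cloud worker from the cryosparc_worker directory; adjust values to your setup.
./bin/cryosparcw connect \
  --worker exoworker123.cloud \
  --master <master_hostname> \
  --port 39000 \
  --sshstr exouser@exoworker123.cloud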
Interesting. How did you ensure that, under the constraint of mismatched user IDs, automatically generated job directories are writable by the other user?
I am not sure.

Let’s assume for a moment that your master can connect to the cloud instance using the exoworker123.cloud hostname. Under this assumption:
Does the output of
cryosparcm cli "get_scheduler_targets()"
include an element with "hostname": "exoworker123.cloud" and "ssh_str": "exouser@exoworker123.cloud"?
Can user cryosparc connect from the master to the cloud worker with the following command?
ssh exouser@exoworker123.cloud
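A hedged way to check both from the master as the cryosparc user (the grep filter and BatchMode option are just one approach; BatchMode makes ssh fail instead of prompting for a password):

# On the master: confirm the target's ssh_str, then test non-interactive key auth.
cryosparcm cli "get_scheduler_targets()" | grep -o 'exouser@exoworker123.cloud'
ssh -o BatchMode=yes exouser@exoworker123.cloud 'echo connected as $(whoami)'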
I must have misunderstood your description of the infrastructure. Aren’t job directories named J<number> created by Linux user cryosparc on the on-premise master host and then shared with the cloud worker host?
CryoSPARC requires shared access to the project directories (see the CryoSPARC guide for details).
CryoSPARC job directories are created on the master node, and worker nodes must be able to write to the job directories when they are running the jobs.
In addition, particle stacks may be cached to fast scratch storage; caching does not eliminate the need for worker access to the job directory.
Do the paths to each given project directory match between the on-premise master and the cloud worker?
How did you ensure that exouser on the cloud worker can write to each relevant job directory that was created by cryosparc? Have you tested file creation by exouser inside such a job directory?
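For example, a minimal write test, assuming a hypothetical job directory path (substitute one of your actual project/job directories):

# Run on the cloud worker as exouser; the path below is a placeholder.
cd /shared/projects/P1/J42
touch write_test_exouser && rm write_test_exouser && echo "write OK"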
Have you inspected the command_core log for relevant messages? Relevant information may appear even in INFO messages, not just in WARNING or ERROR messages.
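On the master, the log can be followed with the standard cryosparcm subcommand (Ctrl-C exits the viewer):

cryosparcm log command_core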