We are setting up a dedicated cloud instance to use as a CryoSPARC worker. Unfortunately, CryoSPARC keeps timing out when it tries to launch a job. The ssh config is set up correctly with pubkey auth, and there are no issues as far as the worker install goes.
The CryoSPARC master just refuses to connect, and produces no logs. Is there any way to troubleshoot this? I can ssh into the cloud server from the master server using the exact same public key as for the other workers.
Did you confirm that the cloud-based worker receives a response from the master when you run the following commands on the worker:
curl <master_hostname>:<port>
where <master_hostname> corresponds to $CRYOSPARC_MASTER_HOSTNAME and <port> to $((CRYOSPARC_BASE_PORT+1)), $((CRYOSPARC_BASE_PORT+2)), and $((CRYOSPARC_BASE_PORT+6))?
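For example, a quick loop over the three ports might look like this; a minimal sketch, assuming you fill in the values from the master's cryosparc_master/config.sh (the hostname below is a placeholder; 39000 is CryoSPARC's default base port):

# On the cloud worker; set these to match the master's config.sh
CRYOSPARC_MASTER_HOSTNAME=master.onprem.example
CRYOSPARC_BASE_PORT=39000
for off in 1 2 6; do
  curl "${CRYOSPARC_MASTER_HOSTNAME}:$((CRYOSPARC_BASE_PORT + off))"
  echo
done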
Yes, I receive a response for each port.
- It’s a government cloud instance, so we’re limited to the user it allows us to log in as (in this case, exouser). However, the cryosparc user can ssh in with its public key, just like on the other workers. The ssh config file is set up to reflect the need to ssh in as “exouser” instead of “cryosparc” (a sketch of such an entry follows this list).
- For the same reason as above, no, the uids do not match up.
- Yes, exouser can write to the job directories.
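The ssh config entry in question is shaped roughly like this; a minimal sketch, with a hypothetical key path:

# ~/.ssh/config for the cryosparc user on the master
Host exoworker123.cloud
    User exouser
    IdentityFile ~/.ssh/id_rsa
    IdentitiesOnly yes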
When connecting the worker, cryosparcw says it will connect as exouser@, which is what we want, right?
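For reference, the connect call was along these lines; a sketch with placeholder hostnames, port, and SSD flag, since the exact flags can differ by version (check cryosparcw connect --help):

# Run from the cryosparc_worker install directory on the cloud instance
./bin/cryosparcw connect \
    --worker exoworker123.cloud \
    --master master.onprem.example \
    --port 39000 \
    --sshstr exouser@exoworker123.cloud \
    --nossd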
Interesting. How did you ensure that, under the constraint of mismatched user ids, automatically generated job directories are writable by the other user?
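For example, a quick check on the worker might look like this; the project/job path is a hypothetical placeholder:

# On the cloud worker, as exouser
ls -ldn /shared/projects/P1/J42    # -n shows numeric uid/gid, which is what matters across hosts
touch /shared/projects/P1/J42/.write_test && rm /shared/projects/P1/J42/.write_test && echo writable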
I am not sure.

Let’s assume for a moment that your master can connect to the cloud instance using the exoworker123.cloud hostname. Under this assumption:
- does the output of cryosparcm cli "get_scheduler_targets()" include an element with "hostname": "exoworker123.cloud" and "ssh_str": "exouser@exoworker123.cloud", and
- can user cryosparc connect from the master to the cloud worker with the command ssh exouser@exoworker123.cloud?

(A sketch of both checks follows this list.)
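One way to run both checks from the master; a sketch that tolerates the version-dependent quoting of cryosparcm cli output:

# As the cryosparc user on the master
cryosparcm cli "get_scheduler_targets()" | tr ',' '\n' | grep -i 'exoworker123'
# BatchMode makes key problems fail immediately instead of prompting
ssh -o BatchMode=yes exouser@exoworker123.cloud 'echo connected'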
There’s only a single user on the cloud instance, exouser. So the directories are being created by exouser each time.
- Yes, the hostname and ssh_str both show up, and the cryosparc user can connect using ssh exouser@<server.name>
I must have misunderstood your description of the infrastructure. Aren’t job directories named J<number> created by Linux user cryosparc on the on-premise master host, and shared with the cloud worker host?
It was my understanding that CryoSPARC recreated the directories on the worker, in the cryosparc_scratch directory.
CryoSPARC requires shared access to the project directories (see the CryoSPARC guide for details).
CryoSPARC job directories are created on the master node, and worker nodes must be able to write to the job directories when they are running the jobs.
In addition, particle stacks may optionally be cached to fast scratch storage. Caching does not eliminate the need for worker access to the job directory.
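For example, if the project tree lives on the on-premise master, one common arrangement is an NFS export mounted at the same absolute path on the worker; a sketch with hypothetical hostnames and paths:

# On the master (NFS server): add to /etc/exports, then re-export
#   /data/cryosparc_projects  exoworker123.cloud(rw,sync,no_subtree_check)
sudo exportfs -ra

# On the cloud worker: mount at the same absolute path so job directory
# paths recorded by the master resolve identically on the worker
sudo mkdir -p /data/cryosparc_projects
sudo mount -t nfs master.onprem.example:/data/cryosparc_projects /data/cryosparc_projects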
Even after sharing the project directories, CryoSPARC is still stuck on the “launched” screen, and there are still no logs.
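In case it helps the next round of debugging, two master-side logs that often show why a job stalls at “launched” (P1 and J42 are placeholder project and job IDs):

cryosparcm log command_core
cryosparcm joblog P1 J42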