New IP address causing job failure

Hi,
Unfortunately, our workstation was rebooted and its IP address changed. I cannot convince our IT department to assign a static IP to the computer. The change in IP now causes jobs to fail with "connect to host 134.74.27.139", which is the old IP address. Any advice would be appreciated.

There may be a workaround if

  • your workstation acts as a combined CryoSPARC master/worker
  • and you will not connect additional workers to this CryoSPARC instance

Do these conditions apply in your case?

Yes. These conditions do apply.

Please can you post the output of these commands:

cryosparcm status | grep -e HOSTNAME -e PORT
cryosparcm cli "get_scheduler_targets()"
grep ^127 /etc/hosts

cryosparcm status | grep -e HOSTNAME -e PORT
export CRYOSPARC_MASTER_HOSTNAME="Rostam"
export CRYOSPARC_BASE_PORT=39000

cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/mnt/RAID00/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 8512602112, 'name': 'GeForce GTX 1070'}, {'id': 1, 'mem': 8513978368, 'name': 'GeForce GTX 1070'}, {'id': 2, 'mem': 8513978368, 'name': 'GeForce GTX 1070'}, {'id': 3, 'mem': 8513978368, 'name': 'GeForce GTX 1070'}], 'hostname': '134.74.27.139', 'lane': 'default', 'monitor_port': None, 'name': '134.74.27.139', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc_user@134.74.27.139', 'title': 'Worker node 134.74.27.139', 'type': 'node', 'worker_bin_path': '/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/bin/cryosparcw'},
 {'cache_path': '/scr/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11721113600, 'name': 'GeForce GTX 1080 Ti'}, {'id': 1, 'mem': 11721506816, 'name': 'GeForce GTX 1080 Ti'}, {'id': 2, 'mem': 11721506816, 'name': 'GeForce GTX 1080 Ti'}, {'id': 3, 'mem': 11721506816, 'name': 'GeForce GTX 1080 Ti'}], 'hostname': '134.74.27.116', 'lane': 'default', 'monitor_port': None, 'name': '134.74.27.116', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7]}, 'ssh_str': 'cryosparc_user@134.74.27.116', 'title': 'Worker node 134.74.27.116', 'type': 'node', 'worker_bin_path': '/home/cryosparc_user/Applications/cryoSPARC_2.14.2b/cryosparc2_worker/bin/cryosparcw'}]

grep ^127 /etc/hosts
127.0.0.1 localhost
127.0.1.1 Rostam

I can get rid of the worker node, if possible.

@rkhayat Interesting. The target list includes two workers, 134.74.27.139 (4 * gtx1070) and 134.74.27.116 (4 * gtx1080ti).

  1. Which of the two workers is/are still in use?
  2. Which, if any, of the workers corresponds to the master host “Rostam”?
  3. Is “Rostam” a hostname you have chosen, and unlikely to be assigned to any other computer on the network?
  4. If IT support will not guarantee a persistent IP address, would they be able to guarantee a persistent “FQDN” (long-form hostname)? They could do this by updating the computer's DNS entry in accordance with any changes in the IP address assignment. It would be sufficient if they could guarantee the persistent hostname going forward, following a one-time change now, if needed for some reason.
  1. Rostam is still in use (IP before reboot: 134.74.27.139; IP after reboot: 134.74.27.159).
  2. Rostam is the master host.
  3. Rostam is a name I chose, and it is unlikely to be assigned to any other computer on the network.
  4. I do not know whether they will guarantee a persistent “FQDN.” I will ask, but it will likely take some time for them to respond. Is this a prerequisite for going forward? (A way to check the current name resolution is sketched below.)
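
For reference, a few standard Linux commands show what name the machine currently reports and how that name resolves; the hostname Rostam is taken from this thread:

hostname -f           # print the fully qualified domain name, if one is configured
getent hosts Rostam   # how the local resolver (including /etc/hosts) maps the name
nslookup Rostam       # whether the DNS server can resolve the name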

Thanks much for the help.

Given the existing /etc/hosts entry
127.0.1.1 Rostam
you can try (on Rostam as Linux user cryosparc_user)

/home/cryosparc_user/Applications/cryoSPARC_2.14.2/cryosparc2_worker/bin/cryosparcw connect --master Rostam --worker Rostam --ssdpath /mnt/RAID00/cryosparc_cache --port 39000
cryosparcm cli "remove_scheduler_target_node('134.74.27.139')"
cryosparcm cli "remove_scheduler_target_node('134.74.27.116')"

(add a new target definition, remove two outdated target definitions)
Do these commands restore the capability to run CryoSPARC jobs?
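
If the reconnection succeeds, the updated target list can be verified with the same query used earlier in this thread; after adding Rostam and removing the two IP-based entries, a single target should remain:

cryosparcm cli "get_scheduler_targets()"   # expect one 'node' entry with 'hostname': 'Rostam'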

Nice, it’s working! Thank you so much.

If anyone has the same issue and decides to follow this protocol: the last set of commands posted by wtempel should be run while CryoSPARC is active/running, not when it has been stopped.
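
For example, one could confirm the instance state before running them (a minimal sketch using standard cryosparcm commands):

cryosparcm status   # check whether the instance reports itself as running
cryosparcm start    # start it first if it is stopped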
