"Your account has expired" error when launching a job

Hello CS team,

I got an error message (see below) when launching a job in CS 4.6.2.

License is valid.

Launching job on lane default target localhost …

Running job on remote worker node hostname localhost

Failed to launch! 255

Your account has expired; please contact your system administrator
Connection closed by ::1 port 22

I contacted our IT department and was told that this has nothing to do with IT and is instead a CS issue. I would greatly appreciate it if you could help troubleshoot. Thank you!

Please can you post the outputs of these commands:

ls -l $(which cryosparcm)
cryosparcm cli "get_scheduler_targets()"
cryosparcm status | grep HOST
hostname -f
host $(hostname -f)
host localhost
cat /etc/hosts
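
When posting the results, it can help to capture every command together with its output and exit status in one file. The following is only a sketch: `run_and_log` and `collect_diagnostics` are hypothetical helper names, not part of CryoSPARC, and the commands are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a command, its combined stdout/stderr,
# and its exit status, without aborting if the command fails.
run_and_log() {
    local cmd="$1" status=0
    printf '$ %s\n' "$cmd"
    bash -c "$cmd" 2>&1 || status=$?
    printf '[exit status: %s]\n\n' "$status"
}

# Runs the commands requested in this thread, in the same order.
# Single quotes are intentional: expansion happens in the child bash.
collect_diagnostics() {
    run_and_log 'ls -l $(which cryosparcm)'
    run_and_log 'cryosparcm cli "get_scheduler_targets()"'
    run_and_log 'cryosparcm status | grep HOST'
    run_and_log 'hostname -f'
    run_and_log 'host $(hostname -f)'
    run_and_log 'host localhost'
    run_and_log 'cat /etc/hosts'
}
```

After sourcing these functions as the user that owns the CryoSPARC installation, `collect_diagnostics > cryosparc_diagnostics.txt` (the file name is arbitrary) would write a single report that can be pasted into a reply.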

Thanks for your response. The outputs are as follows:

[cryosparc@RDLR0027 ~]$ ls -l $(which cryosparcm)
-rwxr-xr-x. 1 cryosparc cryosparc 76852 Nov 18 10:19 /app/apps/rhel8/cryosparc/cryosparc_master/bin/cryosparcm

[cryosparc@RDLR0027 ~]$ cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/data2/cryosparc_cache', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 50943623168, 'name': 'Quadro RTX 8000'}, {'id': 1, 'mem': 50946506752, 'name': 'Quadro RTX 8000'}, {'id': 2, 'mem': 50946506752, 'name': 'Quadro RTX 8000'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'cryosparc@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/app/apps/rhel8/cryosparc/cryosparc_worker/bin/cryosparcw'}, {'cache_path': None, 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 50943623168, 'name': 'Quadro RTX 8000'}, {'id': 1, 'mem': 50946768896, 'name': 'Quadro RTX 8000'}, {'id': 2, 'mem': 50946768896, 'name': 'Quadro RTX 8000'}], 'hostname': 'RDLR0027.ddns.med.umich.edu', 'lane': 'default', 'monitor_port': None, 'name': 'RDLR0027.ddns.med.umich.edu', 'resource_fixed': {'SSD': False}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 
75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95], 'GPU': [0, 1, 2], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'root@RDLR0027.ddns.med.umich.edu', 'title': 'Worker node RDLR0027.ddns.med.umich.edu', 'type': 'node', 'worker_bin_path': '/home/cryosparc/cryosparc_worker/bin/cryosparcw'}]

[cryosparc@RDLR0027 ~]$ cryosparcm status | grep HOST
export CRYOSPARC_MASTER_HOSTNAME="RDLR0027.ddns.med.umich.edu"
[cryosparc@RDLR0027 ~]$ hostname -f
RDLR0027.ddns.med.umich.edu
[cryosparc@RDLR0027 ~]$ host $(hostname -f)
RDLR0027.ddns.med.umich.edu has address 172.17.176.141
[cryosparc@RDLR0027 ~]$ host localhost
localhost.ddns.med.umich.edu has address 10.60.122.197
[cryosparc@RDLR0027 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
[cryosparc@RDLR0027 ~]$
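
The long get_scheduler_targets() output above is easier to compare once it is reduced to the few fields that differ between the two targets. A minimal sketch, assuming the output format shown in this thread (single-quoted keys and values); `summarize_targets` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read get_scheduler_targets() output on stdin and
# print only the hostname, ssh_str and worker_bin_path entries,
# one per line, in the order in which they appear.
summarize_targets() {
    grep -oE "'(hostname|ssh_str|worker_bin_path)': '[^']*'"
}

# Example (requires a running CryoSPARC instance):
#   cryosparcm cli "get_scheduler_targets()" | summarize_targets
```

Applied to the output posted here, this should leave the two targets side by side: localhost with ssh_str cryosparc@localhost, and RDLR0027.ddns.med.umich.edu with ssh_str root@RDLR0027.ddns.med.umich.edu and a different worker_bin_path.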

Hi Wtempel, I wonder if you have had a chance to look at the outputs I sent yesterday. We can’t run any jobs right now and would greatly appreciate your help troubleshooting. Thanks.

@haomingz May I ask:

  1. Two workers, RDLR0027.ddns.med.umich.edu and localhost, are registered on this CryoSPARC instance. Do these refer to the same physical computer?
  2. What is the output of the command ip a?
  3. Is your network configured such that
    • the server will retain the RDLR0027.ddns.med.umich.edu host name after each reboot
    • any attempt from this or another computer to connect to RDLR0027.ddns.med.umich.edu will point to this computer, now and in the future?

  1. Yes, they refer to the same physical computer, but I don’t know why there are two workers.
  2. The output of ip a is:
    [haom@RDLR0027 ~]$ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host 
           valid_lft forever preferred_lft forever
    2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether ac:1f:6b:a1:25:56 brd ff:ff:ff:ff:ff:ff
        altname enp1s0f0
        inet 172.17.176.141/26 brd 172.17.176.191 scope global dynamic noprefixroute eno1
           valid_lft 41089sec preferred_lft 41089sec
        inet6 fe80::ae1f:6bff:fea1:2556/64 scope link noprefixroute 
           valid_lft forever preferred_lft forever
    3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
        link/ether ac:1f:6b:a1:25:57 brd ff:ff:ff:ff:ff:ff
        altname enp1s0f1
    4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
        link/ether 52:54:00:29:c5:b2 brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
           valid_lft forever preferred_lft forever
    [haom@RDLR0027 ~]$ 
    
  3. Yes. The server will retain the RDLR0027.ddns.med.umich.edu host name after each reboot, and any attempt from any remote computer to connect to that name will point to this computer, now and in the future.

Thanks for your response.

@haomingz It is possible that the server was originally set up using a localhost worker, but this arrangement is no longer compatible with your network configuration. Someone may have later run the cryosparcw connect command under the root account. Running cryosparcm or cryosparcw commands as root should be avoided for important reasons (details).
In this case, you may want to (all commands as user cryosparc):

  1. ensure master and worker versions match:
    cat /app/apps/rhel8/cryosparc/cryosparc_master/version
    cat /app/apps/rhel8/cryosparc/cryosparc_worker/version
    
  2. determine the base port number of your CryoSPARC instance:
    grep CRYOSPARC_BASE_PORT /app/apps/rhel8/cryosparc/cryosparc_master/config.sh
    
  3. remove the current default scheduler lane (guide):
    cryosparcm cli "remove_scheduler_lane('default')"
    
  4. reconnect the worker, replacing 99999 with the base port determined earlier (guide):
    /app/apps/rhel8/cryosparc/cryosparc_worker/bin/cryosparcw connect --worker RDLR0027.ddns.med.umich.edu --master RDLR0027.ddns.med.umich.edu --ssdpath /data2/cryosparc_cache --port 99999
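
The four steps above can be tied together so that the port placeholder is filled in automatically and nothing is changed until the checks pass. This is only a sketch under assumptions: the helper names are hypothetical, the version files are assumed to be directly comparable as text, and config.sh is assumed to set CRYOSPARC_BASE_PORT on a single uncommented line. The last helper only prints the two commands from steps 3 and 4 for review; it does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if both version files exist and are identical.
versions_match() {
    local master_dir="$1" worker_dir="$2"
    [ -f "$master_dir/version" ] && [ -f "$worker_dir/version" ] || return 2
    [ "$(cat "$master_dir/version")" = "$(cat "$worker_dir/version")" ]
}

# Hypothetical helper: print the numeric CRYOSPARC_BASE_PORT from a config.sh file.
# Commented-out lines are ignored; the last matching line wins.
base_port() {
    local config="$1"
    sed -n 's/^[[:space:]]*\(export[[:space:]][[:space:]]*\)\{0,1\}CRYOSPARC_BASE_PORT=[^0-9]*\([0-9][0-9]*\).*/\2/p' "$config" | tail -n 1
}

# Hypothetical helper: print (not run) the lane removal and reconnect commands.
print_reconnect_commands() {
    local worker_dir="$1" host="$2" ssd="$3" port="$4"
    printf '%s\n' "cryosparcm cli \"remove_scheduler_lane('default')\""
    printf '%s\n' "$worker_dir/bin/cryosparcw connect --worker $host --master $host --ssdpath $ssd --port $port"
}

# Example, run as user cryosparc (paths and host name taken from this thread):
#   base=/app/apps/rhel8/cryosparc
#   versions_match "$base/cryosparc_master" "$base/cryosparc_worker" || echo "version mismatch"
#   port=$(base_port "$base/cryosparc_master/config.sh")
#   print_reconnect_commands "$base/cryosparc_worker" RDLR0027.ddns.med.umich.edu /data2/cryosparc_cache "$port"
```

Printing the commands instead of executing them keeps the lane removal a deliberate manual step, since it affects every target in the default lane.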
    

Does this help?

It worked! Thank you very much. I greatly appreciate your help.