CryoSPARC crashing - sock file

@wtempel sure:

bell@ub22-04:~$ uptime -s
2024-04-09 09:29:54

@carterwheat Unfortunately, I was not able to confirm the (only) hypothesis I had, based on your problem description and the commands’ outputs that you so patiently provided. The hypothesis went like this:

  1. CryoSPARC was started as normal.
  2. CryoSPARC processes were abruptly killed due to some event (RAM pressure or other system load? A mere TERM signal would have allowed for the cleanup of the sock file).

The kernel “OOM killer” seemed to me a good candidate for part 2, but there appear to be no supporting log records. Please let us know if you have any additional information that would point to an alternative cause, such as whether the CryoSPARC processes are running inside a container or are subject to a cluster workload manager.
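
For reference, checks along these lines (assuming a systemd-based system, so journalctl is available; the kernel ring buffer shown by dmesg may have rotated since the event) would surface typical OOM-killer traces:

sudo journalctl -k | grep -i -e oom -e "out of memory"
sudo dmesg | grep -i -e oom -e "killed process"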

@wtempel Thanks for all of your help. I will keep you updated if anything else comes up that may point us in the right direction.

-Carter

Hi all,
I’m facing the same problem. I tried some of these suggestions, but none worked. Attached is the error I’m getting.

I’m running CryoSPARC on a local EXXACT workstation on Rocky Linux 9, if it helps. Any suggestions?

Welcome to the forum @marygh. Please can you post additional information:

  1. Please describe what you have tried, and the respective outcomes.
  2. Please post the outputs of these commands:
    grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
    hostname -f
    host sn4622120602
    host $(hostname -f)
    ls -l /tmp/cryosparc*sock
    ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
    last reboot
    sudo journalctl | grep -i oom
    tail -n 60 /home/cryosparc_user/software/cryosparc/cryosparc_master/run/supervisord.log
    

This is the output of the commands:

@marygh Please can you work with your lab IT support to

  • register your network adapter for a stable DHCP reservation
  • create a DNS record for the computer’s permanent hostname
  • configure the computer’s hostname to be consistent with the DNS entry from the previous step

After these steps, you may want to

  • define CRYOSPARC_MASTER_HOSTNAME (inside cryosparc_master/config.sh) with the newly assigned, permanent full hostname
  • perform a complete shutdown and restart of CryoSPARC
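
As a quick sanity check after these steps (the path below assumes the default install layout and is only illustrative), the hostname, its DNS record, and the configured master hostname should all agree:

hostname -f
host $(hostname -f)
grep CRYOSPARC_MASTER_HOSTNAME /path/to/cryosparc_master/config.sh

The first command should print the permanent full hostname, the second should resolve that name to the stable, reserved address, and the third should show the same name in CRYOSPARC_MASTER_HOSTNAME.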

Does CryoSPARC start up properly after these steps? If it does not, please post the outputs of these commands as text (instead of a screenshot):

grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host $(hostname -f)
cat /etc/hosts
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo

If CryoSPARC starts normally, you may have to reconfigure the worker component with the cryosparcw connect command and appropriate parameters. To help us suggest appropriate parameters, please post the outputs of the following commands:

cryosparcm cli "get_scheduler_targets()"
hostname -f
cat /etc/hosts
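
For illustration only, a single-workstation reconnection might look like the sketch below; the SSD cache path is a placeholder, and the exact options supported by your version are listed by cryosparcw connect --help:

cryosparc_worker/bin/cryosparcw connect \
    --worker $(hostname -f) \
    --master $(hostname -f) \
    --port 39000 \
    --ssdpath /path/to/ssd/cache

(When a target for the worker already exists, the same command with --update modifies it in place.)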

Thank you! I just need to mention that this workstation is not connected to Wi-Fi.

Hi all,
I am observing the same thing as mentioned previously by other users. Briefly, I have a new Dell workstation (4x NVIDIA A4500 GPUs, 48-core Intel Xeon W7, 256 GB DDR5 RAM, 5 TB SSD, 118 TB HDD). I currently have two CryoSPARC users, and I have opted for the ‘single workstation’ installation for each (which had previously worked well in my postdoc lab). However, I am routinely getting the ‘socket refused connection’ error when both users are running jobs. User 1 port = 39000 and user 2 port = 39020. We have intentionally only pushed the system to 50% capacity (GPU memory use is typically under 10 GB per card; RAM usually has >100 GB free at any given moment). User 1 has sudo access and User 2 does not.
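
For reference, each instance claims a small contiguous range above its base port (in the output below, mongod sits on base+1 and the command services on base+2 through base+5), so 39000 and 39020 should not collide. A quick way to confirm what is actually bound, assuming iproute2’s ss is available (run it as each instance’s owner, or with sudo, to see process names):

ss -tlnp | grep -E ':390[0-9][0-9]'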

I ran these commands from the comment thread and got these results:

grep -e HOST -e PORT /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/config.sh
hostname -f
host COSE-EGREENE-LX.clients.ad.sfsu.edu
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
tail -n 60 /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/run/supervisord.log

export CRYOSPARC_MASTER_HOSTNAME="COSE-EGREENE-LX.clients.ad.sfsu.edu"
export CRYOSPARC_BASE_PORT=39020

COSE-EGREENE-LX.clients.ad.sfsu.edu
COSE-EGREENE-LX.clients.ad.sfsu.edu has address 130.212.214.209
COSE-EGREENE-LX.clients.ad.sfsu.edu has IPv6 address fe80::6c41:9d2b:e7a9:d5d6
COSE-EGREENE-LX.clients.ad.sfsu.edu has address 130.212.214.209
COSE-EGREENE-LX.clients.ad.sfsu.edu has IPv6 address fe80::6c41:9d2b:e7a9:d5d6
srwx------ 1 921270295@ad.sfsu.edu domain users@ad.sfsu.edu 0 Jul 15 11:42 /tmp/cryosparc-supervisor-714ae7c340d4df77be474f8627fd6c9c.sock

1120743 85729 11:42:12 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/supervisord.conf
1120879 1120743 11:42:16 mongod --auth --dbpath /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_database --port 39021 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1120994 1120743 11:42:20 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39022 cryosparc_command.command_core:start() -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1120995 1120994 11:42:20 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39022 cryosparc_command.command_core:start() -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121024 1120743 11:42:26 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39023 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121039 1121024 11:42:26 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39023 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121048 1120743 11:42:27 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39025 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121060 1121048 11:42:27 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39025 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121094 1120743 11:42:31 /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
1122908 1120995 11:51:01 bash /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_worker/bin/cryosparcw run --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122918 1122908 11:51:01 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122921 1122918 11:51:01 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122945 1120995 11:51:04 bash /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_worker/bin/cryosparcw run --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122955 1122945 11:51:04 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122957 1122955 11:51:04 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1125517 1124882 12:08:52 grep --color=auto -e cryosparc_ -e mongo

reboot system boot 6.5.0-41-generic Tue Jul 2 12:04 still running
reboot system boot 6.5.0-41-generic Mon Jul 1 13:38 still running
reboot system boot 6.5.0-41-generic Thu Jun 27 11:18 still running
reboot system boot 6.5.0-35-generic Mon Jun 10 10:32 still running
reboot system boot 6.5.0-35-generic Wed Jun 5 14:51 still running
reboot system boot 6.5.0-35-generic Wed Jun 5 14:22 - 14:31 (00:08)
reboot system boot 6.5.0-35-generic Wed Jun 5 09:52 - 14:31 (04:38)
reboot system boot 6.5.0-35-generic Mon Jun 3 18:00 - 14:31 (1+20:30)
reboot system boot 6.5.0-35-generic Mon Jun 3 14:06 - 17:54 (03:47)
reboot system boot 6.5.0-28-generic Wed May 1 13:29 - 17:54 (33+04:24)
reboot system boot 6.5.0-28-generic Thu Apr 25 08:53 - 13:27 (6+04:33)
reboot system boot 6.5.0-28-generic Wed Apr 24 09:22 - 08:49 (23:27)
reboot system boot 6.5.0-27-generic Tue Apr 16 11:27 - 09:09 (7+21:42)
reboot system boot 6.5.0-26-generic Wed Apr 3 12:53 - 08:47 (7+19:54)
reboot system boot 6.5.0-26-generic Wed Mar 27 17:40 - 12:40 (6+18:59)
reboot system boot 6.5.0-26-generic Thu Mar 21 10:21 - 14:46 (6+04:24)
reboot system boot 6.5.0-26-generic Wed Mar 20 14:58 - 15:05 (00:06)
reboot system boot 6.5.0-26-generic Wed Mar 20 13:52 - 13:54 (00:01)
reboot system boot 6.5.0-26-generic Wed Mar 20 13:05 - 13:50 (00:45)
reboot system boot 6.5.0-26-generic Wed Mar 20 12:59 - 13:04 (00:04)
reboot system boot 6.5.0-26-generic Wed Mar 20 12:36 - 12:58 (00:22)
reboot system boot 6.5.0-26-generic Tue Mar 19 15:19 - 12:35 (21:15)

wtmp begins Tue Mar 19 15:19:43 2024

2024-07-10 17:30:36,938 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-10 17:30:36,941 INFO daemonizing the supervisord process
2024-07-10 17:30:36,942 INFO supervisord started with pid 710428
2024-07-10 17:30:41,479 INFO spawned: 'database' with pid 710536
2024-07-10 17:30:42,674 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:44,904 INFO spawned: 'command_core' with pid 710644
2024-07-10 17:30:50,557 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-10 17:30:51,078 INFO spawned: 'command_vis' with pid 710677
2024-07-10 17:30:52,799 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:52,952 INFO spawned: 'command_rtp' with pid 710706
2024-07-10 17:30:54,609 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:55,969 INFO spawned: 'app' with pid 710720
2024-07-10 17:30:57,436 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:57,553 INFO spawned: 'app_api' with pid 710740
2024-07-10 17:30:59,356 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 13:56:34,780 INFO waiting for app to stop
2024-07-12 13:56:34,780 INFO waiting for app_api to stop
2024-07-12 13:56:34,780 INFO waiting for command_core to stop
2024-07-12 13:56:34,780 INFO waiting for command_rtp to stop
2024-07-12 13:56:34,780 INFO waiting for command_vis to stop
2024-07-12 13:56:34,780 INFO waiting for database to stop
2024-07-12 13:56:34,799 WARN stopped: app (terminated by SIGTERM)
2024-07-12 13:56:34,802 WARN stopped: app_api (terminated by SIGTERM)
2024-07-12 13:56:35,132 INFO stopped: database (exit status 0)
2024-07-12 13:56:35,486 INFO stopped: command_vis (exit status 0)
2024-07-12 13:56:35,487 INFO stopped: command_rtp (exit status 0)
2024-07-12 13:56:37,086 INFO waiting for command_core to stop
2024-07-12 13:56:37,186 INFO stopped: command_core (exit status 0)
2024-07-12 14:29:56,282 INFO RPC interface 'supervisor' initialized
2024-07-12 14:29:56,282 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-12 14:29:56,284 INFO daemonizing the supervisord process
2024-07-12 14:29:56,285 INFO supervisord started with pid 875815
2024-07-12 14:30:00,554 INFO spawned: 'database' with pid 875922
2024-07-12 14:30:02,090 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:03,728 INFO spawned: 'command_core' with pid 876030
2024-07-12 14:30:09,326 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-12 14:30:09,784 INFO spawned: 'command_vis' with pid 876100
2024-07-12 14:30:11,531 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:11,653 INFO spawned: 'command_rtp' with pid 876131
2024-07-12 14:30:13,274 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:14,728 INFO spawned: 'app' with pid 876183
2024-07-12 14:30:16,233 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:16,365 INFO spawned: 'app_api' with pid 876204
2024-07-12 14:30:17,367 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:12,511 INFO RPC interface 'supervisor' initialized
2024-07-15 11:42:12,511 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-15 11:42:12,513 INFO daemonizing the supervisord process
2024-07-15 11:42:12,514 INFO supervisord started with pid 1120743
2024-07-15 11:42:16,712 INFO spawned: 'database' with pid 1120879
2024-07-15 11:42:18,319 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:20,410 INFO spawned: 'command_core' with pid 1120994
2024-07-15 11:42:26,033 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-15 11:42:26,446 INFO spawned: 'command_vis' with pid 1121024
2024-07-15 11:42:27,446 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:27,571 INFO spawned: 'command_rtp' with pid 1121048
2024-07-15 11:42:29,116 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:30,185 INFO spawned: 'app' with pid 1121073
2024-07-15 11:42:31,660 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:31,767 INFO spawned: 'app_api' with pid 1121094
2024-07-15 11:42:33,727 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

We get the socket refusal after running homogeneous refinement jobs which fail at different times. We are now trying to just get one dataset analyzed at a time but, of course, would like to increase throughput. Any advice is greatly appreciated! Thank you!!
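
In case it helps to correlate the failures with load, a crude logging loop along these lines (nvidia-smi and standard coreutils assumed; usage.log is an arbitrary name) would record memory headroom once a minute:

while true; do date; free -h | grep -i mem; nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader; sleep 60; done >> usage.log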

Is it connected to a wired network?

@egreene May I ask

  1. What are the CryoSPARC versions of the two instances?
  2. What is the exact text of the socket refusal message(s)?
  3. Where (in which log file or location in the UI) do you observe the message(s)?

No, it doesn’t have a network connection.

CryoSPARC requires a network connection (guide).

Hi @wtempel, thanks for your response!

  1. Both instances are running version 4.5.3
  2. (base) 918852355@ad.sfsu.edu@COSE-EGREENE-LX:~/.local/share/cryosparc/cryosparc_master/bin$ ./cryosparcm status
    ----------------------------------------------------------------------------
    CryoSPARC System master node installed at
    /home/918852355@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master
    Current cryoSPARC version: v4.5.3
    ----------------------------------------------------------------------------
    
    CryoSPARC process status:
    
    unix:///tmp/cryosparc-supervisor-2335236ec86699a1e337c668a9c3542b.sock refused connection
    
    ----------------------------------------------------------------------------
    An error ocurred while checking license status
    Could not get license verification status. Are all CryoSPARC processes RUNNING?
    
  3. The user interface is unreachable when the socket refuses connections. The error appears in the terminal when I check the status (cryosparcm status). When I rm the .sock file and restart CryoSPARC (successfully each time; ~6 times in total over the last two weeks), I see that my homogeneous refinement jobs had failed long before the restart.

Thanks again so much for your help!
Eric

The workstation is now connected to Wi-Fi and I ran the commands. Below is the output I got.
Please advise. Thanks.

(base) [cryosparc_user@sn4622120602 ~]$ grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host sn4622120602
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
sudo journalctl | grep i oom
tail -n 60 /home/cryosparc_user/software/cryosparc/cryosparc_master/run/supervisord.log
export CRYOSPARC_MASTER_HOSTNAME="sn4622120602"
export CRYOSPARC_BASE_PORT=39000
sn4622120602
Host sn4622120602 not found: 3(NXDOMAIN)
Host sn4622120602 not found: 3(NXDOMAIN)
srwx------. 1 cryosparc_user cryosparc_user 0 Jul  9 12:08 /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock
   4648    3834 15:54:50 grep --color=auto -e cryosparc_ -e mongo
reboot   system boot  5.14.0-427.16.1. Fri Jul 19 15:42   still running
reboot   system boot  5.14.0-427.16.1. Thu Jul 18 14:41 - 14:30  (23:49)
reboot   system boot  5.14.0-427.16.1. Thu Jul 18 10:34 - 14:16  (03:41)
reboot   system boot  5.14.0-427.16.1. Wed Jul 17 14:29 - 16:55  (02:25)
reboot   system boot  5.14.0-427.16.1. Wed Jul 17 14:12 - 14:20  (00:07)
reboot   system boot  5.14.0-427.16.1. Mon Jul 15 08:47 - 16:42 (1+07:55)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 14:14 - 15:36  (01:21)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 12:23 - 13:26  (01:03)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 10:57 - 12:20  (01:23)
reboot   system boot  5.14.0-427.16.1. Wed Jul 10 15:19 - 16:35  (01:15)
reboot   system boot  5.14.0-427.16.1. Tue Jul  9 11:58 - 16:15  (04:17)
reboot   system boot  5.14.0-427.16.1. Tue Jul  9 11:40 - 11:49  (00:09)
reboot   system boot  5.14.0-427.16.1. Mon Jul  8 14:48 - 11:49  (21:01)
reboot   system boot  5.14.0-427.16.1. Thu May 23 17:50 - 18:06  (00:16)
reboot   system boot  5.14.0-427.16.1. Tue May 21 17:02 - 17:35 (2+00:33)
reboot   system boot  5.14.0-427.16.1. Mon May 20 11:06 - 15:04  (03:58)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:23 - 11:04 (2+20:41)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:14 - 14:16  (00:01)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:05 - 14:16  (00:10)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:01 - 14:04  (00:02)
reboot   system boot  5.14.0-427.16.1. Fri May 17 20:26 - 13:59  (-6:27)
reboot   system boot  5.14.0-362.8.1.e Fri May 17 12:05 - 13:24  (01:18)

wtmp begins Fri May 17 12:05:57 2024
grep: oom: No such file or directory
[sudo] password for cryosparc_user: 
cryosparc_user is not in the sudoers file.  This incident will be reported.
2024-05-22 13:06:25,814 INFO spawned: 'command_rtp' with pid 8442
2024-05-22 13:06:27,369 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:28,673 INFO spawned: 'app' with pid 8463
2024-05-22 13:06:30,117 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:30,221 INFO spawned: 'app_api' with pid 8481
2024-05-22 13:06:32,063 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:12,701 INFO waiting for app to stop
2024-05-22 13:10:12,701 INFO waiting for app_api to stop
2024-05-22 13:10:12,701 INFO waiting for command_core to stop
2024-05-22 13:10:12,701 INFO waiting for command_rtp to stop
2024-05-22 13:10:12,701 INFO waiting for command_vis to stop
2024-05-22 13:10:12,701 INFO waiting for database to stop
2024-05-22 13:10:12,708 WARN stopped: app (terminated by SIGTERM)
2024-05-22 13:10:12,708 WARN stopped: app_api (terminated by SIGTERM)
2024-05-22 13:10:13,275 INFO stopped: command_vis (exit status 0)
2024-05-22 13:10:13,310 INFO stopped: command_rtp (exit status 0)
2024-05-22 13:10:13,423 INFO stopped: database (exit status 0)
2024-05-22 13:10:14,910 INFO waiting for command_core to stop
2024-05-22 13:10:14,985 INFO stopped: command_core (exit status 0)
2024-05-22 13:10:15,916 INFO RPC interface 'supervisor' initialized
2024-05-22 13:10:15,916 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-05-22 13:10:15,917 INFO daemonizing the supervisord process
2024-05-22 13:10:15,918 INFO supervisord started with pid 9082
2024-05-22 13:10:19,984 INFO spawned: 'database' with pid 9189
2024-05-22 13:10:21,289 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:23,378 INFO spawned: 'command_core' with pid 9293
2024-05-22 13:10:28,940 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-05-22 13:10:29,359 INFO spawned: 'command_vis' with pid 9321
2024-05-22 13:10:30,957 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:31,060 INFO spawned: 'command_rtp' with pid 9361
2024-05-22 13:10:32,580 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:33,843 INFO spawned: 'app' with pid 9375
2024-05-22 13:10:35,279 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:35,383 INFO spawned: 'app_api' with pid 9392
2024-05-22 13:10:37,194 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-23 17:35:53,792 WARN received SIGTERM indicating exit request
2024-05-23 17:35:53,793 INFO waiting for app, app_api, command_core, command_rtp, command_vis, database to die
2024-05-23 17:35:53,793 WARN ignored SIGHUP indicating restart request (shutdown in progress)
2024-05-23 17:35:53,801 WARN exited: app (terminated by SIGTERM; not expected)
2024-05-23 17:35:53,802 WARN exited: app_api (terminated by SIGTERM; not expected)
2024-05-23 17:35:54,010 INFO exited: command_rtp (exit status 0; expected)
2024-05-23 17:35:54,025 INFO exited: command_core (exit status 0; expected)
2024-05-23 17:35:54,459 INFO exited: command_vis (exit status 0; expected)
2024-05-23 17:35:54,624 INFO stopped: database (exit status 0)
2024-07-09 12:08:37,338 INFO RPC interface 'supervisor' initialized
2024-07-09 12:08:37,339 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-09 12:08:37,340 INFO daemonizing the supervisord process
2024-07-09 12:08:37,340 INFO supervisord started with pid 4738
2024-07-09 12:08:41,480 INFO spawned: 'database' with pid 4845
2024-07-09 12:08:42,825 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:44,895 INFO spawned: 'command_core' with pid 4949
2024-07-09 12:08:50,593 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-09 12:08:51,006 INFO spawned: 'command_vis' with pid 4977
2024-07-09 12:08:52,811 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:52,918 INFO spawned: 'command_rtp' with pid 5018
2024-07-09 12:08:54,448 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:55,682 INFO spawned: 'app' with pid 5032
2024-07-09 12:08:57,187 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:57,293 INFO spawned: 'app_api' with pid 5049
2024-07-09 12:08:59,119 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

@egreene Please can you nevertheless check

sudo journalctl | grep -i oom

When you encounter this error, please can you, before deleting or otherwise manipulating the sock file(s), collect and post the full file name of the *.sock file that refused the connection and the outputs of the commands

ls -l /tmp/cryosparc-supervisor-*.sock
ps -eo user,pid,ppid,command | grep -e cryosparc_ -e mongo

Please can you post the outputs of these commands for some of the failed refinement jobs

cryosparcm cli "get_job('P99', 'J199', 'job_type', 'status', 'heartbeat_at', 'instance_information')"
cryosparcm eventlog P99 J199 | tail -n 10
cryosparcm joblog P99 J199 | tail -n10

where you replace P99 and J199 with the project and job IDs of a few failed refinement jobs.

@marygh

It seems that the sock file in your output is older than the most recent reboots of the computer, suggesting “unclean” shutdowns. You may want to ensure that CryoSPARC is stopped before any reboot of the computer. If the computer reboots unexpectedly, you may need to remove

/tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock

manually after the reboot, but before restarting CryoSPARC.
Always confirm that no CryoSPARC processes are running before deleting the
cryosparc-supervisor-*.sock file (suggestions).
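
For example, the check-then-remove sequence might look like this (the hash in the sock filename is specific to your instance):

ps -eo pid,command | grep -e cryosparc_ -e mongo
rm /tmp/cryosparc-supervisor-*.sock
cryosparcm start

The ps output should show nothing beyond the grep command itself before you remove the sock file.
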
Does cryosparcm start work after that?

It is working! Thanks very much.

Hi there! I am having the same issue with the sock file. My computer has been crashing randomly, but CryoSPARC is left running. Here is what I’ve been doing and the errors I have received. CryoSPARC won’t let me shut it down after the computer reboots, due to the refused connection.

cryosparcm start
Starting CryoSPARC System master process…
CryoSPARC is already running.
If you would like to restart, use cryosparcm restart
cryosparc@Hera:~$ cryosparcm status

CryoSPARC System master node installed at
/home/cryosparc/cryosparc/cryosparc_master
Current cryoSPARC version: v4.5.3

CryoSPARC process status:

unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection


An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?
cryosparc@Hera:~$ cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection

Welcome to the forum @anelise.

You may want to ask your IT support to determine the cause of the random crashes.
If you encounter the “refused connection” error when the computer has not crashed, you may want to look at the topic “XX.sock refused connection error”.
For recovery from an incomplete or unclean CryoSPARC shutdown, please refer to the CryoSPARC guide.