CryoSPARC crashing - sock file

CryoSPARC requires a network connection (guide).

hi @wtempel thanks for your response!

  1. Both instances are running version 4.5.3
  2. (base) 918852355@ad.sfsu.edu@COSE-EGREENE-LX:~/.local/share/cryosparc/cryosparc_master/bin$ ./cryosparcm status
    ----------------------------------------------------------------------------
    CryoSPARC System master node installed at
    /home/918852355@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master
    Current cryoSPARC version: v4.5.3
    ----------------------------------------------------------------------------
    
    CryoSPARC process status:
    
    unix:///tmp/cryosparc-supervisor-2335236ec86699a1e337c668a9c3542b.sock refused connection
    
    ----------------------------------------------------------------------------
    An error ocurred while checking license status
    Could not get license verification status. Are all CryoSPARC processes RUNNING?
    
  3. The user interface is unreachable whenever the socket refuses the connection. The error appears in the terminal when I check the status (cryosparcm status). When I rm the .sock file and restart CryoSPARC (successfully each time; about six times in total over the last two weeks), I find that my homogeneous refinement jobs had failed long before the restart.
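For reference, the recovery routine I have been repeating looks roughly like this (the sock name is the one from my status output above; shown as a dry run with echo so nothing is deleted by accident):

```shell
# Dry-run sketch of my recovery routine; drop the `echo`s to run it for real.
# The sock file name is the one reported by my `cryosparcm status` output.
SOCK=/tmp/cryosparc-supervisor-2335236ec86699a1e337c668a9c3542b.sock
echo cryosparcm status     # reports ".sock refused connection"
echo rm "$SOCK"            # remove the stale supervisor socket
echo cryosparcm restart    # CryoSPARC then starts cleanly
```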

Thanks again so much for your help!
Eric

The workstation is now connected to WiFi and I ran the commands. Below is the output I got.
Please advise. Thanks.

(base) [cryosparc_user@sn4622120602 ~]$ grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host sn4622120602
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
sudo journalctl | grep i oom
tail -n 60 /home/cryosparc_user/software/cryosparc/cryosparc_master/run/supervisord.log
export CRYOSPARC_MASTER_HOSTNAME="sn4622120602"
export CRYOSPARC_BASE_PORT=39000
sn4622120602
Host sn4622120602 not found: 3(NXDOMAIN)
Host sn4622120602 not found: 3(NXDOMAIN)
srwx------. 1 cryosparc_user cryosparc_user 0 Jul  9 12:08 /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock
   4648    3834 15:54:50 grep --color=auto -e cryosparc_ -e mongo
reboot   system boot  5.14.0-427.16.1. Fri Jul 19 15:42   still running
reboot   system boot  5.14.0-427.16.1. Thu Jul 18 14:41 - 14:30  (23:49)
reboot   system boot  5.14.0-427.16.1. Thu Jul 18 10:34 - 14:16  (03:41)
reboot   system boot  5.14.0-427.16.1. Wed Jul 17 14:29 - 16:55  (02:25)
reboot   system boot  5.14.0-427.16.1. Wed Jul 17 14:12 - 14:20  (00:07)
reboot   system boot  5.14.0-427.16.1. Mon Jul 15 08:47 - 16:42 (1+07:55)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 14:14 - 15:36  (01:21)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 12:23 - 13:26  (01:03)
reboot   system boot  5.14.0-427.16.1. Thu Jul 11 10:57 - 12:20  (01:23)
reboot   system boot  5.14.0-427.16.1. Wed Jul 10 15:19 - 16:35  (01:15)
reboot   system boot  5.14.0-427.16.1. Tue Jul  9 11:58 - 16:15  (04:17)
reboot   system boot  5.14.0-427.16.1. Tue Jul  9 11:40 - 11:49  (00:09)
reboot   system boot  5.14.0-427.16.1. Mon Jul  8 14:48 - 11:49  (21:01)
reboot   system boot  5.14.0-427.16.1. Thu May 23 17:50 - 18:06  (00:16)
reboot   system boot  5.14.0-427.16.1. Tue May 21 17:02 - 17:35 (2+00:33)
reboot   system boot  5.14.0-427.16.1. Mon May 20 11:06 - 15:04  (03:58)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:23 - 11:04 (2+20:41)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:14 - 14:16  (00:01)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:05 - 14:16  (00:10)
reboot   system boot  5.14.0-427.16.1. Fri May 17 14:01 - 14:04  (00:02)
reboot   system boot  5.14.0-427.16.1. Fri May 17 20:26 - 13:59  (-6:27)
reboot   system boot  5.14.0-362.8.1.e Fri May 17 12:05 - 13:24  (01:18)

wtmp begins Fri May 17 12:05:57 2024
grep: oom: No such file or directory
[sudo] password for cryosparc_user: 
cryosparc_user is not in the sudoers file.  This incident will be reported.
2024-05-22 13:06:25,814 INFO spawned: 'command_rtp' with pid 8442
2024-05-22 13:06:27,369 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:28,673 INFO spawned: 'app' with pid 8463
2024-05-22 13:06:30,117 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:30,221 INFO spawned: 'app_api' with pid 8481
2024-05-22 13:06:32,063 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:12,701 INFO waiting for app to stop
2024-05-22 13:10:12,701 INFO waiting for app_api to stop
2024-05-22 13:10:12,701 INFO waiting for command_core to stop
2024-05-22 13:10:12,701 INFO waiting for command_rtp to stop
2024-05-22 13:10:12,701 INFO waiting for command_vis to stop
2024-05-22 13:10:12,701 INFO waiting for database to stop
2024-05-22 13:10:12,708 WARN stopped: app (terminated by SIGTERM)
2024-05-22 13:10:12,708 WARN stopped: app_api (terminated by SIGTERM)
2024-05-22 13:10:13,275 INFO stopped: command_vis (exit status 0)
2024-05-22 13:10:13,310 INFO stopped: command_rtp (exit status 0)
2024-05-22 13:10:13,423 INFO stopped: database (exit status 0)
2024-05-22 13:10:14,910 INFO waiting for command_core to stop
2024-05-22 13:10:14,985 INFO stopped: command_core (exit status 0)
2024-05-22 13:10:15,916 INFO RPC interface 'supervisor' initialized
2024-05-22 13:10:15,916 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-05-22 13:10:15,917 INFO daemonizing the supervisord process
2024-05-22 13:10:15,918 INFO supervisord started with pid 9082
2024-05-22 13:10:19,984 INFO spawned: 'database' with pid 9189
2024-05-22 13:10:21,289 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:23,378 INFO spawned: 'command_core' with pid 9293
2024-05-22 13:10:28,940 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-05-22 13:10:29,359 INFO spawned: 'command_vis' with pid 9321
2024-05-22 13:10:30,957 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:31,060 INFO spawned: 'command_rtp' with pid 9361
2024-05-22 13:10:32,580 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:33,843 INFO spawned: 'app' with pid 9375
2024-05-22 13:10:35,279 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:35,383 INFO spawned: 'app_api' with pid 9392
2024-05-22 13:10:37,194 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-23 17:35:53,792 WARN received SIGTERM indicating exit request
2024-05-23 17:35:53,793 INFO waiting for app, app_api, command_core, command_rtp, command_vis, database to die
2024-05-23 17:35:53,793 WARN ignored SIGHUP indicating restart request (shutdown in progress)
2024-05-23 17:35:53,801 WARN exited: app (terminated by SIGTERM; not expected)
2024-05-23 17:35:53,802 WARN exited: app_api (terminated by SIGTERM; not expected)
2024-05-23 17:35:54,010 INFO exited: command_rtp (exit status 0; expected)
2024-05-23 17:35:54,025 INFO exited: command_core (exit status 0; expected)
2024-05-23 17:35:54,459 INFO exited: command_vis (exit status 0; expected)
2024-05-23 17:35:54,624 INFO stopped: database (exit status 0)
2024-07-09 12:08:37,338 INFO RPC interface 'supervisor' initialized
2024-07-09 12:08:37,339 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-09 12:08:37,340 INFO daemonizing the supervisord process
2024-07-09 12:08:37,340 INFO supervisord started with pid 4738
2024-07-09 12:08:41,480 INFO spawned: 'database' with pid 4845
2024-07-09 12:08:42,825 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:44,895 INFO spawned: 'command_core' with pid 4949
2024-07-09 12:08:50,593 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-09 12:08:51,006 INFO spawned: 'command_vis' with pid 4977
2024-07-09 12:08:52,811 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:52,918 INFO spawned: 'command_rtp' with pid 5018
2024-07-09 12:08:54,448 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:55,682 INFO spawned: 'app' with pid 5032
2024-07-09 12:08:57,187 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:57,293 INFO spawned: 'app_api' with pid 5049
2024-07-09 12:08:59,119 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)

@egreene Please can you nevertheless check

sudo journalctl | grep -i oom

When you encounter this error, before deleting or otherwise manipulating the sock file(s), please collect and post the full file name of the *.sock file that refused the connection, along with the output of the commands

ls -l /tmp/cryosparc-supervisor-*.sock
ps -eo user,pid,ppid,command | grep -e cryosparc_ -e mongo

Please can you post the outputs of these commands for some of the failed refinement jobs

cryosparcm cli "get_job('P99', 'J199', 'job_type', 'status', 'heartbeat_at', 'instance_information')"
cryosparcm eventlog P99 J199 | tail -n 10
cryosparcm joblog P99 J199 | tail -n 10

where you replace P99 and J199 with the project and job IDs of a few failed refinement jobs.
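If several jobs are affected, the commands above can be generated in a loop; a minimal sketch (the project ID P99 and the J-numbers below are placeholders, and the echo makes this a dry run that only prints the commands):

```shell
# Dry run: print the diagnostic commands for a few placeholder job IDs.
# Replace P99 and the J-numbers with your own failed refinement jobs,
# and drop the `echo`s to actually execute the commands.
for jid in J101 J102 J103; do
  echo cryosparcm cli "get_job('P99', '${jid}', 'job_type', 'status', 'heartbeat_at', 'instance_information')"
  echo cryosparcm eventlog P99 "${jid}"
  echo cryosparcm joblog P99 "${jid}"
done
```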

@marygh

It seems that this file is older than the most recent reboots of the computer, suggesting “unclean” shutdowns of the computer. You may want to:

  1. Ensure that CryoSPARC is stopped before any reboot of the computer.
  2. If the computer reboots unexpectedly, remove

     /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock

     manually after the reboot, but before restarting CryoSPARC.
  3. Always confirm that no CryoSPARC processes are running before deleting the cryosparc-supervisor-*.sock file (suggestions).
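That pre-delete check could be scripted; a minimal sketch (the check_sock_safe helper name is made up here, and the grep patterns are the ones from the ps command suggested earlier in this thread):

```shell
# Hypothetical helper: given a `ps -eo pid,command` listing, report
# whether leftover CryoSPARC/mongod processes make sock deletion unsafe.
check_sock_safe() {
  if printf '%s\n' "$1" | grep -q -e cryosparc_ -e mongo; then
    echo "processes still running; do NOT remove the sock file"
  else
    echo "no leftover processes; sock file is likely safe to remove"
  fi
}

# Live check (`grep -v grep` drops the grep process itself from the list):
check_sock_safe "$(ps -eo pid,command | grep -v grep)"
```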
Does cryosparcm start work after that?

It is working! Thanks very much.

Hi there! I am having the same issue with the sock file. My computer has been crashing randomly, but CryoSPARC is left running. Here is what I’ve been doing and the errors I have received. CryoSPARC won’t let me shut it down after the computer reboots because of the refused connection.

cryosparcm start
Starting CryoSPARC System master process…
CryoSPARC is already running.
If you would like to restart, use cryosparcm restart
cryosparc@Hera:~$ cryosparcm status

CryoSPARC System master node installed at
/home/cryosparc/cryosparc/cryosparc_master
Current cryoSPARC version: v4.5.3

CryoSPARC process status:

unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection


An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?
cryosparc@Hera:~$ cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection

Welcome to the forum @anelise .

You may want to ask your IT support to determine the cause of the random crashes.
If you encounter this error when the computer has not crashed, you may want to look at the topic XX.sock refused connection error.
For recovery from an incomplete or unclean CryoSPARC shutdown, please refer to the CryoSPARC guide.