XX.sock refused connection error

Hi,

I was running CryoSPARC and it appears that it crashed, as the web interface was no longer refreshing.

Typing the command cryosparcm status gives the following message:

CryoSPARC process status:

unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection

Trying to stop the program with cryosparcm stop produced the same message:

cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection

Any idea how to get out of this problem and restart CryoSPARC?

This is CryoSPARC v4.4.1 running on a single workstation (master and worker on the same machine).

Thanks

PS: I apologize for posting about this in a separate thread, as I have seen posts about similar occurrences of this error but so far no clear solution. What I would like to know is whether there is a simple way to restart CryoSPARC once it gets stuck in this state.

Thanks

Perhaps not the best or cleanest solution, but I just rebooted the machine and restarted CryoSPARC (cryosparcm start). So far it seems to be running OK.

Welcome to the forum @hsosa .

You may want to try this multistep shutdown procedure.
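In outline, that procedure looks something like the following (a sketch only; follow the linked guide, and evaluate every listed process individually before terminating it):

cryosparcm stop
ps -weo pid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep   # list candidate processes
kill <pid>                            # only for each confirmed CryoSPARC process
rm /tmp/cryosparc-supervisor-*.sock   # remove a stale socket file, if present
cryosparcm start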


It happened again. This time I was running a 3D refinement job overnight, and when I came in this morning the graphical user interface was stuck in a waiting state. These are the messages from the terminal:

$ cryosparcm status

CryoSPARC System master node installed at
/home/cryosparc_user/software/cryosparc/cryosparc_master
Current cryoSPARC version: v4.4.1

CryoSPARC process status:

unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection


An error occurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?


Doing the suggested orderly shutdown does not appear to solve the issue.
$ ps -weo pid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
  90003    3380   Jun 27 eog /home/cryosparc_user/Downloads/J48_averaged_power_spectra.png

$ kill 90003

$ ps -weo pid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
$
$ cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC 
unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection

I don’t understand what the issue with the license status is; it is probably a separate problem, as the program was running (license verified) up to this point. Also, the job that caused the crash, or froze the program, was a helical refinement job, not the J48_averaged_power_spectra.png viewer listed by the grep command.

As shown above, cryosparcm stop is unable to stop the program. Killing the processes listed by the grep command works, but cryosparcm status or cryosparcm stop still returns the sock refused connection error, so the program cannot be reset. Trying to open the user interface webpage (http://localhost:39000) returns an Unable to Connect error.

After rebooting the system I was able to start CryoSPARC and get to the user interface, where I found that the refinement job had ended in failure after 5 iterations. This was the last message recorded:

**** Kill signal sent by CryoSPARC (ID: ) ****

Job is unresponsive - no heartbeat received in 180 seconds.

What I would like to know is why the job failed, and why, after the failure, CryoSPARC got stuck with the
unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection error, such that the only way to restart CryoSPARC was by rebooting the system.

Thanks

It may not be necessary, and could even be harmful to other uses of the computer, to terminate all processes listed by the ps/grep combination. The ps/grep combination merely narrows down the list of candidate processes; the need for and consequences of termination should be evaluated separately for each individual process.

  90003    3380   Jun 27 eog /home/cryosparc_user/Downloads/J48_averaged_power_spectra.png

should not have been a target for termination in the context of a complete CryoSPARC shutdown.
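For instance, a matched PID can be inspected before any decision to terminate it (an illustrative check; 90003 is the PID from the listing above):

ps -fp 90003

Here the full-format listing would show an eog image viewer that matched the grep pattern only because the file it displays sits under the cryosparc_user home directory.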
The simultaneous presence of the

/tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock

file and the absence of any supervisord process suggest the abrupt termination of the supervisord process, for example by a SIGKILL signal or a “cold” system restart.
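One way to confirm that no supervisord process remains (a sketch; the brackets keep grep from matching its own command line):

ps -eo pid,ppid,cmd | grep "[s]upervisord"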
What are the outputs of the commands

uptime
free -h
last reboot
sudo journalctl | grep -i oom

?

I encountered the same problem when running Heterogeneous Refinement in CryoSPARC v4.5.3. The heterogeneous refinement job succeeded after running for one day, but when I type ‘cryosparcm status’ or ‘cryosparcm stop’, it displays ‘unix:///tmp/cryosparc-supervisor-fe9e080b183c8e3351061c0eabd13aa7.sock refused connection’. The only solution I’ve found is to kill the related processes, remove the sock file, and restart CryoSPARC. This happens frequently, and I’d like to know how to resolve it permanently.
Here’s the output when I type the command:

root@spgpu:~# free -g && sync && echo 3 > /proc/sys/vm/drop_caches && echo "" && free -g
               total        used        free      shared  buff/cache   available
Mem:             503         261          33           0         208         238
Swap:             30           0          30
      
               total        used        free      shared  buff/cache   available
Mem:             503         259         243           0           0         241
Swap:             30           0          30
root@spgpu:~# journalctl | grep -i oom
Jul 23 00:13:39 spgpu systemd[3554]: vte-spawn-016f63ae-ba85-402d-9fb5-82673bc6ad58.scope: systemd-oomd killed 32 process(es) in this unit.
Jul 23 20:50:09 spgpu systemd[1]: Stopping Userspace Out-Of-Memory (OOM) Killer...
Jul 23 20:50:09 spgpu systemd[1]: systemd-oomd.service: Deactivated successfully.
Jul 23 20:50:09 spgpu systemd[1]: Stopped Userspace Out-Of-Memory (OOM) Killer.
Jul 23 20:50:09 spgpu systemd[1]: systemd-oomd.service: Consumed 16min 50.629s CPU time.
Jul 23 20:51:58 spgpu systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
Jul 23 20:51:59 spgpu systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Jul 24 19:35:23 spgpu systemd-oomd[2491]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-caffa588-d5e5-4115-a24b-72fd61969e19.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 61.17% > 50.00% for > 20s with reclaim activity
Jul 24 19:35:23 spgpu systemd[3765]: vte-spawn-caffa588-d5e5-4115-a24b-72fd61969e19.scope: systemd-oomd killed 118 process(es) in this unit.
Jul 25 13:49:24 spgpu systemd-oomd[2491]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-798bf1f7-4567-4d41-b495-70fe6eb57d1e.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 63.95% > 50.00% for > 20s with reclaim activity
Jul 25 13:49:24 spgpu systemd[3765]: vte-spawn-798bf1f7-4567-4d41-b495-70fe6eb57d1e.scope: systemd-oomd killed 4 process(es) in this unit.
Jul 25 13:49:39 spgpu systemd-oomd[2491]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-ae165842-d0d9-4ef3-a09b-5a5456a4769e.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 82.53% > 50.00% for > 20s with reclaim activity
Jul 25 13:49:39 spgpu systemd[3765]: vte-spawn-ae165842-d0d9-4ef3-a09b-5a5456a4769e.scope: systemd-oomd killed 74 process(es) in this unit.
Jul 26 14:49:52 spgpu systemd-oomd[2491]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-84deb7cd-2ef5-4f6d-91b8-0f3d48710333.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 53.13% > 50.00% for > 20s with reclaim activity
Jul 26 14:49:52 spgpu systemd[3765]: vte-spawn-84deb7cd-2ef5-4f6d-91b8-0f3d48710333.scope: systemd-oomd killed 177 process(es) in this unit.

@tosyl If your CryoSPARC instance runs under the Linux user id 1000 and the supervisord process is among the processes killed, you might end up with a stale

/tmp/cryosparc-supervisor-fe9e080b183c8e3351061c0eabd13aa7.sock

file.
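If that is what happened here, then once no CryoSPARC-related processes remain (see the ps check above), the stale file can be removed before restarting (a sketch using the path from your error message):

rm /tmp/cryosparc-supervisor-fe9e080b183c8e3351061c0eabd13aa7.sock
cryosparcm start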
I am not sure whether your system logs the process IDs of killed processes. If it does, you may compare them to process IDs stored in the supervisord log:

cryosparcm log supervisord | grep "supervisord started with pid"

to confirm whether supervisord processes are in fact being targeted by OOM management. If you find this to be the case, you may want to consider how, and whether, you want systemd-oomd to manage RAM on your computer (a web search for systemd-oomd configuration may help).
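If supervisord does turn out to be a casualty, one blunt option, with tradeoffs that should be weighed for your system, is to turn off the userspace OOM killer (a sketch; systemd-oomd can also be tuned rather than disabled outright):

sudo systemctl disable --now systemd-oomd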

Many thanks for your reply! I will try with systemd-oomd turned off. By the way, I’ve noticed that CryoSPARC jobs don’t release used RAM after completion. Is there a solution for this?

I suspect this question pertains to how the OS manages RAM usage. This management should happen automatically. It is possible that certain OS settings can be changed to improve performance, but such changes may or may not be indicated in your specific case.
If, on the other hand, you find that, after job completion,

  • cryosparc_worker processes continue running
  • cryosparc_master processes use more RAM than expected

you may have encountered a bug. If you encounter or suspect a bug in CryoSPARC, please post details here in the forum.
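A quick check for either condition after a job completes might reuse the ps/grep combination from earlier in this thread, with RSS added to show per-process memory (a sketch; RSS is reported in kilobytes):

ps -weo pid,rss,etime,cmd | grep -e cryosparc -e mongo | grep -v grep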

Thank you for your suggestions! I’ll try the steps you mentioned and provide an update.