Unix socket refused to connect

Hi All,

I apologize for posting a thread similar to some existing ones, but unfortunately, none of the solutions in those posts worked on my machine.

Currently, when I run cryosparcm status, I see the following:

----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master
Current cryoSPARC version: v4.3.1
----------------------------------------------------------------------------

CryoSPARC process status:

unix:///tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock refused connection

----------------------------------------------------------------------------
*** CommandClient: (http://svlpcryosparc01.stjude.org:61102/api) URL Error [Errno 111] Connection refused
An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?

Running cryosparcm stop and cryosparcm restart does not work either. Whenever I try to stop, I get the following:

CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock refused connection

I also tried moving /tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock from /tmp to another location, and tried restarting, but that results in:

CryoSPARC is not already running.
If you would like to restart, use cryosparcm restart
Starting cryoSPARC System master process..
CryoSPARC is not already running.
configuring database
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)

We recently had exceptionally high memory utilization on our CryoSPARC server, although I’m not sure if that is related.

Any and all help would be greatly appreciated!

Thank you!
Walid

Welcome to the forum @walidabualafia.
Just a word of caution: A cryosparc-supervisor-*.sock file should not be manually moved, deleted, or otherwise manipulated unless the termination of the related CryoSPARC processes has been confirmed with a suitable ps command.
Please can you provide additional information:

  1. Are additional CryoSPARC instances running on svlpcryosparc01?
  2. Did you try a complete CryoSPARC shutdown followed by a CryoSPARC startup?
  3. What steps/commands did you run when you ?

Does each CryoSPARC instance have its unique absolute path to cryosparc_master/?

If this supervisord process belongs to the CryoSPARC instance in question, it should be terminated according to a later step of the instructions.

What is the output of the command

ps -eo user,pid,ppid,start,command | grep '/kellogrp_sparc/software/hpc_cryosparc/'

?
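The caution about the supervisor socket can be turned into a small check before anyone touches a cryosparc-supervisor-*.sock file. The following is a minimal sketch, not an official CryoSPARC tool: count_instance_processes is a hypothetical helper name. It reads a ps listing from standard input and counts the lines that mention the installation path, ignoring the grep process itself. A count of 0 suggests that no process of that instance is still running.

```shell
#!/bin/sh
# Hypothetical helper (not part of CryoSPARC): count processes whose command
# line mentions the given path fragment. The ps listing is read from stdin.
count_instance_processes() {
    fragment="$1"
    # -F: treat the fragment as a fixed string, not a regular expression.
    # The second grep drops lines that belong to a grep process and counts the rest.
    # "|| true" keeps the exit status at 0 even when the count is 0.
    grep -F -- "$fragment" | grep -v -c -w 'grep' || true
}

# Example use against the live process table (path taken from this thread):
# ps -eo user,pid,ppid,start,command | count_instance_processes '/kellogrp_sparc/software/hpc_cryosparc/'
```

Reading the listing from standard input keeps the helper independent of the ps options in use, so the same ps command quoted above can feed it.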

Please can you run these commands and post their outputs

whoami
grep -v LICENSE /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master/config.sh
date
df -h /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/
stat /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
kill 3096807
sleep 10
ps -eo user,pid,ppid,start,command | grep '/kellogrp_sparc/software/hpc_cryosparc/'
ps -eo pid,ppid,start,command | grep mongo
ls -l /tmp/
ls -l /tmp/cryosparc-supervisor-*.sock
cryosparcm start
date

Now I am really curious what the command

fuser /tmp/mongodb-61101.sock

would show.
As an aside, on CryoSPARC v < 4.4, you may want to include a space between the quotes on the line

export CRYOSPARC_MONGO_EXTRA_FLAGS=" "

(inside

/home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master/config.sh

). Otherwise, the variable will not have the desired effect of enabling database journaling.
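To see at a glance which form of the line a given config.sh contains, a small check like the one below may help. It is only a sketch: check_mongo_extra_flags is a hypothetical helper name, and it merely inspects the text of the file. It does not verify whether journaling is actually enabled.

```shell
#!/bin/sh
# Hypothetical helper: report how CRYOSPARC_MONGO_EXTRA_FLAGS is written in a config.sh.
# Prints one of: unset, empty, space, other.
check_mongo_extra_flags() {
    config="$1"
    # Take the last matching export line, since a later line would override earlier ones.
    line=$(grep -E '^[[:space:]]*export[[:space:]]+CRYOSPARC_MONGO_EXTRA_FLAGS=' "$config" 2>/dev/null | tail -n 1)
    case "$line" in
        '')       echo "unset" ;;   # variable not exported in this file
        *'=""'*)  echo "empty" ;;   # empty quotes
        *'=" "'*) echo "space" ;;   # quotes containing a single space
        *)        echo "other" ;;   # some other value
    esac
}

# Example use (path taken from this thread):
# check_mongo_extra_flags /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master/config.sh
```

A result of "empty" would correspond to the case described above where, on versions before 4.4, the variable does not have the desired effect.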

Hi Wolfram,

Thank you for redacting the information up top after reading it.

fuser /tmp/mongodb-61101.sock does not return any output.

Similarly, fuser on any of the /tmp/mongodb-*.sock files does not return any output.

I just added the space in the quote, but no changes to the startup db error.

Thanks,
Walid

Running fuser on another Linux user’s files may require sudo to show the associated processes.
What are the outputs of the following commands (run the latter two only if the first produces no output)

fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
# you may need to ask a sys admin to run the following commands
sudo fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
sudo fuser /tmp/mongodb-61101.sock

?
[edited to insert missing command]

Hi @wtempel,

The first command yields no results. I am running it as the service account owning the instance.

I am the system admin for this node.

I tried the second command, but root is squashed on the fs. I could not stat the file in kellogrp_sparc’s home.

The third command also did not return any output.

Thank you!

I requested a new VM to move this service to. Do you think moving the service to a new VM will resolve this issue?

Also, is there documentation about moving instances from one VM to another? The same filesystem will be mounted on said new VM.

Thanks.

Also, would rebooting the current server help? Thank you!

One of the sudo commands included a typo; the correct commands are

sudo fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
sudo fuser /tmp/mongodb-61101.sock

Because I do not know the cause of DBPathInUse: Unable to lock the lock file:, I cannot predict whether rebooting the server or moving the CryoSPARC instance to another VM would resolve the issue.

Our guide applies to VMs to the extent that the VM implementation emulates a physical computer environment. Please can you provide more details on how the VMs are implemented in your case. Also, what are the outputs of the commands

cryosparcm call head -n 1 /proc/1/sched
cryosparcm call uname -a

I added the fuser to the older commands earlier.

The VMs implemented in this case are running RHEL 8, 62GB memory, 12 vCPUs.

Regarding the commands:

[kellogrp_sparc@svlpcryosparc01: ~]$ cryosparcm call head -n 1 /proc/1/sched
systemd (1, #threads: 1)
[kellogrp_sparc@svlpcryosparc01: ~]$ cryosparcm call uname -a
Linux svlpcryosparc01.organization.org 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Thu Aug 31 10:29:22 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
  1. Would restoring from the latest back up help us get started?
  2. What about changing the base port?
  3. Would I be able to delete this thread after we’re done with the investigation?

Thanks again!

Not sure if this is expected behavior, but when I cryosparcm stop the service, I can see in /tmp that:

  • The supervisord*.sock file gets removed
  • The mongodb*.sock file is still there

Would removing the mongodb*.sock file help with this issue?

Thanks,
Walid

Please can you redact information inside your posts as necessary. If my responses include confidential information, please send me a private message with the specific snippets you would like me to redact.

This would be unexpected after a regular CryoSPARC shutdown and suggests a disorderly termination of an old mongod process, due to either a SIGKILL signal (for example from kill -9, which is strongly discouraged because of potential database corruption) or a power failure.
What are the outputs of the commands

date && uptime && last reboot
cryosparcm status | grep HOSTNAME

on the CryoSPARC master computer?
On the other hand

  • cryosparcm stop is not expected to clean up a stale mongodb*.sock file.
  • the mere presence of a stale mongodb*.sock did not prevent CryoSPARC startup in my (limited) testing.
Before proceeding, you may want to understand the cause of
DBPathInUse: Unable to lock the lock file: /home/XXX/cryosparc_db2/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /home/XXX/cryosparc_db2 directory, terminating

A non-exhaustive list of candidate causes:

  • a non-mongod process has opened mongod.lock for some reason (inspection).
    sudo fuser /home/XXX/cryosparc_db2/mongod.lock
    sudo ss -anp | grep 61101 | sed "s/\s\+/ /g"
    
    may tell
  • if /home/XXX/cryosparc_db2/ is on a shared filesystem, a mongod or non-mongod process on another computer may have opened mongod.lock
  • storage for /home/XXX/cryosparc_db2/ may have been disrupted and may (temporarily) not be writeable
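When fuser shows nothing, for example because the lock holder runs on another computer that mounts the same filesystem, one way to narrow down the candidates above is to probe the lock directly. The sketch below assumes that mongod takes an flock(2)-style advisory lock on mongod.lock, which the "Resource temporarily unavailable" text is consistent with but which is not confirmed in this thread. probe_lock is a hypothetical helper name, and flock(1) from util-linux is assumed to be installed. It should only be run while CryoSPARC is stopped, because it briefly takes the lock itself.

```shell
#!/bin/sh
# Hypothetical helper: probe_lock FILE
# Prints "free" if an exclusive, non-blocking flock on FILE can be taken,
# "held" if the lock attempt fails, and "unreadable" if FILE cannot be read.
probe_lock() {
    lockfile="$1"
    if [ ! -r "$lockfile" ]; then
        echo "unreadable"
        return 0
    fi
    # flock(1): -x requests an exclusive lock, -n fails at once instead of waiting.
    # The lock is released again as soon as the "true" command exits.
    if flock -x -n "$lockfile" true; then
        echo "free"
    else
        echo "held"
    fi
}

# Example use as the service account (path as redacted in this thread):
# probe_lock /home/XXX/cryosparc_db2/mongod.lock
```

A result of "held" with no local process reported by fuser would point toward the shared-filesystem candidate, whereas "unreadable" would point toward a storage or permissions problem.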
Thu May 30 12:46:38 CDT 2024
 12:46:38 up 39 days, 10:40,  3 users,  load average: 0.90, 0.84, 0.85

wtmp begins Wed May 29 09:32:17 2024
export CRYOSPARC_MASTER_HOSTNAME="svlpcryosparc01.organization.org"

The other fuser command did not yield any output. I ran them as the service account, as root is squashed on the filesystem.

I am going to try and set up a new instance of cryosparc, and restore it from the latest backup. I believe this will work. I am not sure what else I can do. Again, I am also not sure how to see who opened mongod.lock and inspected it.

Please let me know if you find any other pointers.

Thanks,
Walid