Hi All,
I apologize for posting a similar thread as some other folks, but unfortunately, none of the solutions for the similar posts worked on my machine.
Currently, when I run cryosparcm status, I see the following:
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master
Current cryoSPARC version: v4.3.1
----------------------------------------------------------------------------
CryoSPARC process status:
unix:///tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock refused connection
----------------------------------------------------------------------------
*** CommandClient: (http://svlpcryosparc01.stjude.org:61102/api) URL Error [Errno 111] Connection refused
An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?
Running cryosparcm stop and cryosparcm restart does not work either. Whenever I try to stop, I get the following:
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock refused connection
I also tried moving /tmp/cryosparc-supervisor-4982856fca9b58f01d40087c1284c4ad.sock from /tmp to another location and then restarting, but that results in:
CryoSPARC is not already running.
If you would like to restart, use cryosparcm restart
Starting cryoSPARC System master process..
CryoSPARC is not already running.
configuring database
Warning: Could not get database status (attempt 1/3)
Warning: Could not get database status (attempt 2/3)
Warning: Could not get database status (attempt 3/3)
We recently had exceptionally high memory utilization on our CryoSPARC server, although I’m not sure if that is related.
Any and all help would be greatly appreciated!
Thank you!
Walid
Welcome to the forum @walidabualafia.
Just a word of caution: a cryosparc-supervisor-*.sock file should not be manually moved, deleted or otherwise manipulated unless the termination of the related CryoSPARC processes has been confirmed with a suitable ps command.
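For anyone who wants to script that check, here is a minimal sketch. The helper names instance_procs and socket_is_safe_to_touch are made up for illustration and are not part of cryosparcm; pass the absolute path of your own instance.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical, not cryosparcm commands.

# Reads `ps` output on stdin and keeps lines that contain the literal path in $1.
# Lines containing the word "grep" are dropped, so the grep processes that a
# plain `ps | grep` pipeline would list do not show up as false positives.
instance_procs() {
    local install_path="$1"
    grep -F -- "$install_path" | grep -v -w 'grep' || true
}

# Returns 0 only when no process mentioning the install path is left.
socket_is_safe_to_touch() {
    local install_path="$1" remaining
    remaining="$(ps -eo user,pid,ppid,start,command | instance_procs "$install_path")"
    if [ -n "$remaining" ]; then
        printf 'Still running, leave the socket alone:\n%s\n' "$remaining" >&2
        return 1
    fi
    return 0
}
```

Example call, using a placeholder path: socket_is_safe_to_touch '/path/to/cryosparc_master' && echo "no related processes found". A process belonging to another instance under the same parent directory would also match, so choose the path as specifically as possible.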
Please can you provide additional information:
- Are additional CryoSPARC instances running on svlpcryosparc01?
- Did you try a complete CryoSPARC shutdown followed by a CryoSPARC startup?
- What steps/commands did you run when you
?
Does each CryoSPARC instance have its unique absolute path to cryosparc_master/
?
If this supervisord
process belongs to the CryoSPARC instance in question, it should be terminated according to a later step of the instructions.
What is the output of the command
ps -eo user,pid,ppid,start,command | grep '/kellogrp_sparc/software/hpc_cryosparc/'
?
Please can you run these commands and post their outputs
whoami
grep -v LICENSE /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master/config.sh
date
df -h /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/
stat /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
kill 3096807
sleep 10
ps -eo user,pid,ppid,start,command | grep '/kellogrp_sparc/software/hpc_cryosparc/'
ps -eo pid,ppid,start,command | grep mongo
ls -l /tmp/
ls -l /tmp/cryosparc-supervisor-*.sock
cryosparcm start
date
Now I am really curious what the command
fuser /tmp/mongodb-61101.sock
would show.
As an aside, on CryoSPARC v < 4.4, you may want to include a space between the quotes on the line
export CRYOSPARC_MONGO_EXTRA_FLAGS=" "
(inside
/home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_master2/cryosparc_master/config.sh
). Otherwise, the variable will not have the desired effect of enabling database journaling.
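A quick way to see which of the two forms a given config.sh currently contains is sketched below. The function name mongo_extra_flags_state is hypothetical, and the parsing is deliberately simple: it assumes the export sits on one line with double quotes and no trailing comment.

```shell
#!/usr/bin/env bash
# Sketch only: mongo_extra_flags_state is a hypothetical helper, not a cryosparcm command.
# It inspects the last "export CRYOSPARC_MONGO_EXTRA_FLAGS=..." line of a config.sh
# and prints one of: unset, empty, set.
mongo_extra_flags_state() {
    local config="$1" line value
    line="$(grep -E '^[[:space:]]*export[[:space:]]+CRYOSPARC_MONGO_EXTRA_FLAGS=' "$config" | tail -n 1 || true)"
    if [ -z "$line" ]; then
        echo "unset"
        return 0
    fi
    value="${line#*=}"      # everything after the first "="
    value="${value%\"}"     # drop one trailing double quote, if present
    value="${value#\"}"     # drop one leading double quote, if present
    if [ -z "$value" ]; then
        echo "empty"        # the line reads: export CRYOSPARC_MONGO_EXTRA_FLAGS=""
    else
        echo "set"          # a single space between the quotes counts as set
    fi
}
```

Calling mongo_extra_flags_state with the path to config.sh and getting "empty" would correspond to the case described above, where the variable does not have the desired effect on v < 4.4.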
Hi Wolfram,
Thank you for redacting the information up top after reading it.
fuser /tmp/mongodb-61101.sock
does not return any output.
Similarly, fuser
on any of the /tmp/mongodb-*.sock
files does not return any output.
I just added the space in the quote, but no changes to the startup db error.
Thanks,
Walid
Running fuser on files that belong to other Linux users may require sudo to show the associated processes.
What are the outputs of the commands (latter two commands only if the first command has no output)
fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
# you may need to ask a sys admin to run the following commands
sudo fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
sudo fuser /tmp/mongodb-61101.sock
?
[edited to insert missing command]
Hi @wtempel,
The first command yields no results. I am running it as the service account owning the instance.
I am the system admin for this node.
I tried the second command, but root is squashed on the fs. I could not stat the file in kellogrp_sparc’s home.
The third command also did not return any output.
Thank you!
I requested a new VM to move this service to. Do you think moving the service to a new VM will resolve this issue?
Also, is there documentation about moving instances from one VM to another? The same filesystem will be mounted on said new VM.
Thanks.
Also, would rebooting the current server help? Thank you!
One of the sudo
commands included a typo; the correct commands are
sudo fuser /home/kellogrp_sparc/software/hpc_cryosparc/cryosparc_db2/mongod.lock
sudo fuser /tmp/mongodb-61101.sock
Because I do not know the cause of DBPathInUse: Unable to lock the lock file: , I cannot predict whether rebooting the server or moving the CryoSPARC instance to another VM would resolve the issue.
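One way to narrow this down, offered as a sketch rather than an official diagnostic: probe the lock file with flock(1) from util-linux while CryoSPARC is fully stopped. This assumes the lock mongod takes on mongod.lock is of a kind that flock(1) can observe, which may not hold on every network filesystem. The function name probe_lock is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: probe_lock is a hypothetical helper.
# Prints "missing", "unreadable", "free" or "held" for the given lock file.
# If the lock is free, it is taken for an instant and released again, so run
# this only while the CryoSPARC instance is stopped.
probe_lock() {
    local lock_file="$1"
    if [ ! -e "$lock_file" ]; then
        echo "missing"
        return 2
    fi
    if [ ! -r "$lock_file" ]; then
        echo "unreadable"
        return 3
    fi
    # -n: do not wait; fail immediately if the lock cannot be acquired.
    if flock -n "$lock_file" true 2>/dev/null; then
        echo "free"
        return 0
    fi
    echo "held"
    return 1
}
```

If probe_lock reports "held" for the mongod.lock file although ps shows no mongod process on this computer, that might point toward a lock held from elsewhere, for example another host that mounts the same filesystem. That interpretation is an assumption to verify, not a confirmed diagnosis.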
Our guide applies to VMs to the extent that the VM implementation emulates a physical computer environment. Please can you provide more details on how the VMs are implemented in your case. Also, what are the outputs of the commands
cryosparcm call head -n 1 /proc/1/sched
cryosparcm call uname -a
I added the fuser to the older commands earlier.
The VMs implemented in this case are running RHEL 8, 62GB memory, 12 vCPUs.
Regarding the commands:
[kellogrp_sparc@svlpcryosparc01: ~]$ cryosparcm call head -n 1 /proc/1/sched
systemd (1, #threads: 1)
[kellogrp_sparc@svlpcryosparc01: ~]$ cryosparcm call uname -a
Linux svlpcryosparc01.organization.org 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Thu Aug 31 10:29:22 EDT 2023 x86_64 x86_64 x86_64 GNU/Linux
- Would restoring from the latest back up help us get started?
- What about changing the base port?
- Would I be able to delete this thread after we’re done with the investigation?
Thanks again!
Not sure if this is expected behavior, but when I cryosparcm stop
the service, I can see in /tmp
that:
- The supervisord*.sock file gets removed
- The mongodb*.sock file is still there
Would removing the mongodb*.sock file help with this issue?
Thanks,
Walid
Please can you redact information inside your posts as necessary. If my responses include confidential information, please send me a private message with the specific snippets you would like me to redact.
This would be unexpected after a regular CryoSPARC shutdown, and suggests a disorderly termination of an old mongod process, caused either by a SIGKILL signal (for example from kill -9, which is strongly discouraged due to potential database corruption) or by a power failure.
What are the outputs of the commands
date && uptime && last reboot
cryosparcm status | grep HOSTNAME
on the CryoSPARC master computer?
On the other hand, cryosparcm stop is not expected to clean up a stale mongodb*.sock file.
- the mere presence of a stale
mongodb*.sock
did not prevent CryoSPARC startup in my (limited) testing.
Before proceeding, you may want to understand the cause of
DBPathInUse: Unable to lock the lock file: /home/XXX/cryosparc_db2/mongod.lock (Resource temporarily unavailable). Another mongod instance is already running on the /home/XXX/cryosparc_db2 directory, terminating
A non-exhaustive list of candidate causes:
Thu May 30 12:46:38 CDT 2024
12:46:38 up 39 days, 10:40, 3 users, load average: 0.90, 0.84, 0.85
wtmp begins Wed May 29 09:32:17 2024
export CRYOSPARC_MASTER_HOSTNAME="svlpcryosparc01.organization.org"
The other fuser command did not yield any output. I ran them as the service account, as root is squashed on the filesystem.
I am going to try and set up a new instance of cryosparc, and restore it from the latest backup. I believe this will work. I am not sure what else I can do. Again, I am also not sure how to see who opened mongod.lock
and inspected it.
Please let me know if you find any other pointers.
Thanks,
Walid