@wtempel sure:
bell@ub22-04:~$ uptime -s
2024-04-09 09:29:54
@carterwheat Unfortunately, I was not able to confirm the (only) hypothesis I had based on your problem description and the commands’ outputs that you so patiently provided.
The hypothesis went like this:
1. The CryoSPARC processes were killed with a signal other than TERM (a TERM signal would have allowed for the cleanup of the sock file).
2. Something on the system did the killing.
The kernel “OOM killer” seemed to me a good candidate for part 2., but there appear to be no supporting log records. Please let us know if you have any additional information that would point to an alternative cause, such as if the CryoSPARC processes are running inside a container or are subject to some cluster workload manager.
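(For intuition, a minimal toy sketch, not CryoSPARC-specific, of why the signal matters: a process can remove its socket file from a TERM handler, but a KILL bypasses any handler, so the file is left behind.)
#!/usr/bin/env bash
# Toy daemon illustrating the cleanup difference between SIGTERM and SIGKILL.
SOCK=/tmp/toy-daemon.sock
trap 'rm -f "$SOCK"; exit 0' TERM   # a TERM handler can remove the file on shutdown...
touch "$SOCK"                       # stand-in for the supervisor's unix socket
while true; do sleep 1; done        # ...a KILL (e.g. from the OOM killer) skips the trap and leaves $SOCK behind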
@wtempel Thanks for all of your help. I will keep you updated if anything else comes up that may point us in the right direction.
-Carter
Hi all,
I’m facing the same problem. I tried some of these suggestions, but none worked. Attached is the error I’m getting.
I’m running CryoSPARC on a local Exxact workstation running Rocky Linux 9, if it helps. Any suggestions?
Welcome to the forum @marygh. Please can you post additional information:
grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host sn4622120602
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
sudo journalctl | grep -i oom
tail -n 60 /home/cryosparc_user/software/cryosparc/cryosparc_master/run/supervisord.log
@marygh Please can you work with your lab IT support to assign the workstation a permanent, fully qualified hostname.
After these steps, you may want to update CRYOSPARC_MASTER_HOSTNAME (inside cryosparc_master/config.sh) with the newly assigned, permanent full hostname.
Does CryoSPARC start up properly after these steps? If it does not, please post the outputs of these commands as text (instead of a screenshot):
grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host $(hostname -f)
cat /etc/hosts
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
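For illustration, the hostname update suggested above amounts to a single line inside cryosparc_master/config.sh; the value below is a placeholder for whatever permanent name is assigned:
export CRYOSPARC_MASTER_HOSTNAME="workstation.example.edu"   # placeholder; use your assigned FQDN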
If CryoSPARC starts normally, you may have to reconfigure the worker component with the command
cryosparcw connect
and appropriate parameters. To help us suggest appropriate parameters, please post the outputs of the following commands
cryosparcm cli "get_scheduler_targets()"
hostname -f
cat /etc/hosts
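For reference, a typical reconnect invocation is sketched below; every value is a placeholder pending the outputs requested above:
/path/to/cryosparc_worker/bin/cryosparcw connect \
    --worker workstation.example.edu \   # worker hostname (placeholder)
    --master workstation.example.edu \   # master hostname; the same machine on a single workstation
    --port 39000 \                       # the instance's CRYOSPARC_BASE_PORT
    --ssdpath /scratch/cryosparc_cache   # local SSD cache path, if one is used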
Thank you! I should mention that this workstation is not connected to Wi-Fi.
Hi all,
I am observing the same thing as mentioned previously by other users. Briefly, I have a new Dell workstation (4x NVIDIA A4500, 48-core Intel Xeon W7, 256 GB DDR5 RAM, 5 TB SSD, 118 TB HDD). I have two CryoSPARC users currently and I have opted for the ‘single workstation’ installation for each (which had previously worked well in my postdoc lab). However, I am routinely getting the ‘socket refused connection’ error when both users are running jobs. User 1 port = 39000 and user 2 port = 39020. We have intentionally only pushed the system to 50% capacity (GPU memory use is typically under 10 GB per card; RAM usually has >100 GB free at any given moment). User 1 has sudo access and User 2 does not.
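(For context: each instance claims a small block of consecutive ports above its base port, as the ps output further down shows for the 39020 instance, e.g. mongod on 39021 and command_core on 39022, so base ports 39000 and 39020 should not collide. A quick check of which ports in that neighborhood are actually listening might look like:)
ss -tlnp | grep -E ':390[0-9][0-9]\b'   # list TCP listeners in the 39000-39099 range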
I ran the commands from the comment thread above and got these results:
grep -e HOST -e PORT /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/config.sh
hostname -f
host COSE-EGREENE-LX.clients.ad.sfsu.edu
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
tail -n 60 /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/run/supervisord.log
export CRYOSPARC_MASTER_HOSTNAME="COSE-EGREENE-LX.clients.ad.sfsu.edu"
export CRYOSPARC_BASE_PORT=39020
COSE-EGREENE-LX.clients.ad.sfsu.edu
COSE-EGREENE-LX.clients.ad.sfsu.edu has address 130.212.214.209
COSE-EGREENE-LX.clients.ad.sfsu.edu has IPv6 address fe80::6c41:9d2b:e7a9:d5d6
COSE-EGREENE-LX.clients.ad.sfsu.edu has address 130.212.214.209
COSE-EGREENE-LX.clients.ad.sfsu.edu has IPv6 address fe80::6c41:9d2b:e7a9:d5d6
srwx------ 1 921270295@ad.sfsu.edu domain users@ad.sfsu.edu 0 Jul 15 11:42 /tmp/cryosparc-supervisor-714ae7c340d4df77be474f8627fd6c9c.sock
1120743 85729 11:42:12 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/supervisord.conf
1120879 1120743 11:42:16 mongod --auth --dbpath /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_database --port 39021 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
1120994 1120743 11:42:20 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39022 cryosparc_command.command_core:start() -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1120995 1120994 11:42:20 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn -n command_core -b 0.0.0.0:39022 cryosparc_command.command_core:start() -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121024 1120743 11:42:26 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39023 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121039 1121024 11:42:26 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_vis:app -n command_vis -b 0.0.0.0:39023 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121048 1120743 11:42:27 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39025 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121060 1121048 11:42:27 python /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/gunicorn cryosparc_command.command_rtp:start() -n command_rtp -b 0.0.0.0:39025 -c /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/gunicorn.conf.py
1121094 1120743 11:42:31 /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
1122908 1120995 11:51:01 bash /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_worker/bin/cryosparcw run --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122918 1122908 11:51:01 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122921 1122918 11:51:01 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J29 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122945 1120995 11:51:04 bash /home/921270295@ad.sfsu.edu/.local/share/cryosparc/cryosparc_worker/bin/cryosparcw run --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122955 1122945 11:51:04 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1122957 1122955 11:51:04 python -c import cryosparc_compute.run as run; run.run() --project P2 --job J30 --master_hostname COSE-EGREENE-LX.clients.ad.sfsu.edu --master_command_core_port 39022
1125517 1124882 12:08:52 grep --color=auto -e cryosparc_ -e mongo
reboot system boot 6.5.0-41-generic Tue Jul 2 12:04 still running
reboot system boot 6.5.0-41-generic Mon Jul 1 13:38 still running
reboot system boot 6.5.0-41-generic Thu Jun 27 11:18 still running
reboot system boot 6.5.0-35-generic Mon Jun 10 10:32 still running
reboot system boot 6.5.0-35-generic Wed Jun 5 14:51 still running
reboot system boot 6.5.0-35-generic Wed Jun 5 14:22 - 14:31 (00:08)
reboot system boot 6.5.0-35-generic Wed Jun 5 09:52 - 14:31 (04:38)
reboot system boot 6.5.0-35-generic Mon Jun 3 18:00 - 14:31 (1+20:30)
reboot system boot 6.5.0-35-generic Mon Jun 3 14:06 - 17:54 (03:47)
reboot system boot 6.5.0-28-generic Wed May 1 13:29 - 17:54 (33+04:24)
reboot system boot 6.5.0-28-generic Thu Apr 25 08:53 - 13:27 (6+04:33)
reboot system boot 6.5.0-28-generic Wed Apr 24 09:22 - 08:49 (23:27)
reboot system boot 6.5.0-27-generic Tue Apr 16 11:27 - 09:09 (7+21:42)
reboot system boot 6.5.0-26-generic Wed Apr 3 12:53 - 08:47 (7+19:54)
reboot system boot 6.5.0-26-generic Wed Mar 27 17:40 - 12:40 (6+18:59)
reboot system boot 6.5.0-26-generic Thu Mar 21 10:21 - 14:46 (6+04:24)
reboot system boot 6.5.0-26-generic Wed Mar 20 14:58 - 15:05 (00:06)
reboot system boot 6.5.0-26-generic Wed Mar 20 13:52 - 13:54 (00:01)
reboot system boot 6.5.0-26-generic Wed Mar 20 13:05 - 13:50 (00:45)
reboot system boot 6.5.0-26-generic Wed Mar 20 12:59 - 13:04 (00:04)
reboot system boot 6.5.0-26-generic Wed Mar 20 12:36 - 12:58 (00:22)
reboot system boot 6.5.0-26-generic Tue Mar 19 15:19 - 12:35 (21:15)
wtmp begins Tue Mar 19 15:19:43 2024
2024-07-10 17:30:36,938 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-10 17:30:36,941 INFO daemonizing the supervisord process
2024-07-10 17:30:36,942 INFO supervisord started with pid 710428
2024-07-10 17:30:41,479 INFO spawned: 'database' with pid 710536
2024-07-10 17:30:42,674 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:44,904 INFO spawned: 'command_core' with pid 710644
2024-07-10 17:30:50,557 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-10 17:30:51,078 INFO spawned: 'command_vis' with pid 710677
2024-07-10 17:30:52,799 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:52,952 INFO spawned: 'command_rtp' with pid 710706
2024-07-10 17:30:54,609 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:55,969 INFO spawned: 'app' with pid 710720
2024-07-10 17:30:57,436 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-10 17:30:57,553 INFO spawned: 'app_api' with pid 710740
2024-07-10 17:30:59,356 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 13:56:34,780 INFO waiting for app to stop
2024-07-12 13:56:34,780 INFO waiting for app_api to stop
2024-07-12 13:56:34,780 INFO waiting for command_core to stop
2024-07-12 13:56:34,780 INFO waiting for command_rtp to stop
2024-07-12 13:56:34,780 INFO waiting for command_vis to stop
2024-07-12 13:56:34,780 INFO waiting for database to stop
2024-07-12 13:56:34,799 WARN stopped: app (terminated by SIGTERM)
2024-07-12 13:56:34,802 WARN stopped: app_api (terminated by SIGTERM)
2024-07-12 13:56:35,132 INFO stopped: database (exit status 0)
2024-07-12 13:56:35,486 INFO stopped: command_vis (exit status 0)
2024-07-12 13:56:35,487 INFO stopped: command_rtp (exit status 0)
2024-07-12 13:56:37,086 INFO waiting for command_core to stop
2024-07-12 13:56:37,186 INFO stopped: command_core (exit status 0)
2024-07-12 14:29:56,282 INFO RPC interface 'supervisor' initialized
2024-07-12 14:29:56,282 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-12 14:29:56,284 INFO daemonizing the supervisord process
2024-07-12 14:29:56,285 INFO supervisord started with pid 875815
2024-07-12 14:30:00,554 INFO spawned: 'database' with pid 875922
2024-07-12 14:30:02,090 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:03,728 INFO spawned: 'command_core' with pid 876030
2024-07-12 14:30:09,326 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-12 14:30:09,784 INFO spawned: 'command_vis' with pid 876100
2024-07-12 14:30:11,531 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:11,653 INFO spawned: 'command_rtp' with pid 876131
2024-07-12 14:30:13,274 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:14,728 INFO spawned: 'app' with pid 876183
2024-07-12 14:30:16,233 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-12 14:30:16,365 INFO spawned: 'app_api' with pid 876204
2024-07-12 14:30:17,367 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:12,511 INFO RPC interface 'supervisor' initialized
2024-07-15 11:42:12,511 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-15 11:42:12,513 INFO daemonizing the supervisord process
2024-07-15 11:42:12,514 INFO supervisord started with pid 1120743
2024-07-15 11:42:16,712 INFO spawned: 'database' with pid 1120879
2024-07-15 11:42:18,319 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:20,410 INFO spawned: 'command_core' with pid 1120994
2024-07-15 11:42:26,033 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-15 11:42:26,446 INFO spawned: 'command_vis' with pid 1121024
2024-07-15 11:42:27,446 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:27,571 INFO spawned: 'command_rtp' with pid 1121048
2024-07-15 11:42:29,116 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:30,185 INFO spawned: 'app' with pid 1121073
2024-07-15 11:42:31,660 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-15 11:42:31,767 INFO spawned: 'app_api' with pid 1121094
2024-07-15 11:42:33,727 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
We get the socket refusal after running homogeneous refinement jobs, which fail at different times. We are now trying to analyze just one dataset at a time but, of course, would like to increase throughput. Any advice is greatly appreciated! Thank you!!
@marygh Is the workstation connected to a wired network?
@egreene May I ask what cryosparcm status reports when you encounter this error?
No, it doesn’t have a network connection
Hi @wtempel, thanks for your response!
(base) 918852355@ad.sfsu.edu@COSE-EGREENE-LX:~/.local/share/cryosparc/cryosparc_master/bin$ ./cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/918852355@ad.sfsu.edu/.local/share/cryosparc/cryosparc_master
Current cryoSPARC version: v4.5.3
----------------------------------------------------------------------------
CryoSPARC process status:
unix:///tmp/cryosparc-supervisor-2335236ec86699a1e337c668a9c3542b.sock refused connection
----------------------------------------------------------------------------
An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?
(This is what I see when I run cryosparcm status.) When I rm the .sock file and restart CryoSPARC (successfully each time; ~6 times in total over the last two weeks), I see that my homogeneous refinement jobs had failed long before the restart.
Thanks again so much for your help!
Eric
The workstation is now connected to Wi-Fi and I ran the commands. Below is the output I got.
Please advise. Thanks.
(base) [cryosparc_user@sn4622120602 ~]$ grep -e HOST -e PORT /home/cryosparc_user/software/cryosparc/cryosparc_master/config.sh
hostname -f
host sn4622120602
host $(hostname -f)
ls -l /tmp/cryosparc*sock
ps -eo pid,ppid,start,command | grep -e cryosparc_ -e mongo
last reboot
sudo journalctl | grep i oom
tail -n 60 /home/cryosparc_user/software/cryosparc/cryosparc_master/run/supervisord.log
export CRYOSPARC_MASTER_HOSTNAME="sn4622120602"
export CRYOSPARC_BASE_PORT=39000
sn4622120602
Host sn4622120602 not found: 3(NXDOMAIN)
Host sn4622120602 not found: 3(NXDOMAIN)
srwx------. 1 cryosparc_user cryosparc_user 0 Jul 9 12:08 /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock
4648 3834 15:54:50 grep --color=auto -e cryosparc_ -e mongo
reboot system boot 5.14.0-427.16.1. Fri Jul 19 15:42 still running
reboot system boot 5.14.0-427.16.1. Thu Jul 18 14:41 - 14:30 (23:49)
reboot system boot 5.14.0-427.16.1. Thu Jul 18 10:34 - 14:16 (03:41)
reboot system boot 5.14.0-427.16.1. Wed Jul 17 14:29 - 16:55 (02:25)
reboot system boot 5.14.0-427.16.1. Wed Jul 17 14:12 - 14:20 (00:07)
reboot system boot 5.14.0-427.16.1. Mon Jul 15 08:47 - 16:42 (1+07:55)
reboot system boot 5.14.0-427.16.1. Thu Jul 11 14:14 - 15:36 (01:21)
reboot system boot 5.14.0-427.16.1. Thu Jul 11 12:23 - 13:26 (01:03)
reboot system boot 5.14.0-427.16.1. Thu Jul 11 10:57 - 12:20 (01:23)
reboot system boot 5.14.0-427.16.1. Wed Jul 10 15:19 - 16:35 (01:15)
reboot system boot 5.14.0-427.16.1. Tue Jul 9 11:58 - 16:15 (04:17)
reboot system boot 5.14.0-427.16.1. Tue Jul 9 11:40 - 11:49 (00:09)
reboot system boot 5.14.0-427.16.1. Mon Jul 8 14:48 - 11:49 (21:01)
reboot system boot 5.14.0-427.16.1. Thu May 23 17:50 - 18:06 (00:16)
reboot system boot 5.14.0-427.16.1. Tue May 21 17:02 - 17:35 (2+00:33)
reboot system boot 5.14.0-427.16.1. Mon May 20 11:06 - 15:04 (03:58)
reboot system boot 5.14.0-427.16.1. Fri May 17 14:23 - 11:04 (2+20:41)
reboot system boot 5.14.0-427.16.1. Fri May 17 14:14 - 14:16 (00:01)
reboot system boot 5.14.0-427.16.1. Fri May 17 14:05 - 14:16 (00:10)
reboot system boot 5.14.0-427.16.1. Fri May 17 14:01 - 14:04 (00:02)
reboot system boot 5.14.0-427.16.1. Fri May 17 20:26 - 13:59 (-6:27)
reboot system boot 5.14.0-362.8.1.e Fri May 17 12:05 - 13:24 (01:18)
wtmp begins Fri May 17 12:05:57 2024
grep: oom: No such file or directory
[sudo] password for cryosparc_user:
cryosparc_user is not in the sudoers file. This incident will be reported.
2024-05-22 13:06:25,814 INFO spawned: 'command_rtp' with pid 8442
2024-05-22 13:06:27,369 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:28,673 INFO spawned: 'app' with pid 8463
2024-05-22 13:06:30,117 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:06:30,221 INFO spawned: 'app_api' with pid 8481
2024-05-22 13:06:32,063 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:12,701 INFO waiting for app to stop
2024-05-22 13:10:12,701 INFO waiting for app_api to stop
2024-05-22 13:10:12,701 INFO waiting for command_core to stop
2024-05-22 13:10:12,701 INFO waiting for command_rtp to stop
2024-05-22 13:10:12,701 INFO waiting for command_vis to stop
2024-05-22 13:10:12,701 INFO waiting for database to stop
2024-05-22 13:10:12,708 WARN stopped: app (terminated by SIGTERM)
2024-05-22 13:10:12,708 WARN stopped: app_api (terminated by SIGTERM)
2024-05-22 13:10:13,275 INFO stopped: command_vis (exit status 0)
2024-05-22 13:10:13,310 INFO stopped: command_rtp (exit status 0)
2024-05-22 13:10:13,423 INFO stopped: database (exit status 0)
2024-05-22 13:10:14,910 INFO waiting for command_core to stop
2024-05-22 13:10:14,985 INFO stopped: command_core (exit status 0)
2024-05-22 13:10:15,916 INFO RPC interface 'supervisor' initialized
2024-05-22 13:10:15,916 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-05-22 13:10:15,917 INFO daemonizing the supervisord process
2024-05-22 13:10:15,918 INFO supervisord started with pid 9082
2024-05-22 13:10:19,984 INFO spawned: 'database' with pid 9189
2024-05-22 13:10:21,289 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:23,378 INFO spawned: 'command_core' with pid 9293
2024-05-22 13:10:28,940 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-05-22 13:10:29,359 INFO spawned: 'command_vis' with pid 9321
2024-05-22 13:10:30,957 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:31,060 INFO spawned: 'command_rtp' with pid 9361
2024-05-22 13:10:32,580 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:33,843 INFO spawned: 'app' with pid 9375
2024-05-22 13:10:35,279 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-22 13:10:35,383 INFO spawned: 'app_api' with pid 9392
2024-05-22 13:10:37,194 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-05-23 17:35:53,792 WARN received SIGTERM indicating exit request
2024-05-23 17:35:53,793 INFO waiting for app, app_api, command_core, command_rtp, command_vis, database to die
2024-05-23 17:35:53,793 WARN ignored SIGHUP indicating restart request (shutdown in progress)
2024-05-23 17:35:53,801 WARN exited: app (terminated by SIGTERM; not expected)
2024-05-23 17:35:53,802 WARN exited: app_api (terminated by SIGTERM; not expected)
2024-05-23 17:35:54,010 INFO exited: command_rtp (exit status 0; expected)
2024-05-23 17:35:54,025 INFO exited: command_core (exit status 0; expected)
2024-05-23 17:35:54,459 INFO exited: command_vis (exit status 0; expected)
2024-05-23 17:35:54,624 INFO stopped: database (exit status 0)
2024-07-09 12:08:37,338 INFO RPC interface 'supervisor' initialized
2024-07-09 12:08:37,339 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2024-07-09 12:08:37,340 INFO daemonizing the supervisord process
2024-07-09 12:08:37,340 INFO supervisord started with pid 4738
2024-07-09 12:08:41,480 INFO spawned: 'database' with pid 4845
2024-07-09 12:08:42,825 INFO success: database entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:44,895 INFO spawned: 'command_core' with pid 4949
2024-07-09 12:08:50,593 INFO success: command_core entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
2024-07-09 12:08:51,006 INFO spawned: 'command_vis' with pid 4977
2024-07-09 12:08:52,811 INFO success: command_vis entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:52,918 INFO spawned: 'command_rtp' with pid 5018
2024-07-09 12:08:54,448 INFO success: command_rtp entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:55,682 INFO spawned: 'app' with pid 5032
2024-07-09 12:08:57,187 INFO success: app entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2024-07-09 12:08:57,293 INFO spawned: 'app_api' with pid 5049
2024-07-09 12:08:59,119 INFO success: app_api entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
@egreene Please can you nevertheless check
sudo journalctl | grep -i oom
When you encounter this error, please can you, before deleting or otherwise manipulating the sock file(s), collect and post the full file name of the *.sock file that refused the connection and the output of the commands:
ls -l /tmp/cryosparc-supervisor-*.sock
ps -eo user,pid,ppid,command | grep -e cryosparc_ -e mongo
Please can you post the outputs of these commands for some of the failed refinement jobs
cryosparcm cli "get_job('P99', 'J199', 'job_type', 'status', 'heartbeat_at', 'instance_information')"
cryosparcm eventlog P99 J199 | tail -n 10
cryosparcm joblog P99 J199 | tail -n 10
where you replace P99 and J199 with the project and job IDs of a few failed refinement jobs.
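For example, if the refinement jobs visible in the ps output above (project P2, jobs J29 and J30) are among the failed ones, the calls would look like:
cryosparcm cli "get_job('P2', 'J29', 'job_type', 'status', 'heartbeat_at', 'instance_information')"
cryosparcm eventlog P2 J29 | tail -n 10
cryosparcm joblog P2 J30 | tail -n 10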
It seems that this file is older than the most recent reboots of the computer, suggesting “unclean” shutdowns of the computer. You may want to ensure that CryoSPARC is stopped before any reboot of the computer.
If the computer reboots unexpectedly, you may need to remove /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock manually after the computer reboot, but before restarting CryoSPARC. Always confirm that no CryoSPARC processes are running before deleting the cryosparc-supervisor-*.sock file (suggestions).
Does cryosparcm start work after that?
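For anyone following along, the recovery sequence just described might be sketched as follows (the sock file name is the one from this thread; adjust to match yours):
ps -eo user,pid,ppid,command | grep -e cryosparc_ -e mongo           # first, confirm no CryoSPARC or mongod processes remain
rm /tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock   # then remove the stale socket left by the unclean shutdown
cryosparcm start                                                     # finally, try starting CryoSPARC again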
It is working! Thanks very much.
Hi there! I am having the same issue with the sock file. My computer has been crashing randomly, but CryoSPARC is left running. Here is what I’ve been doing and the errors I have received. CryoSPARC won’t let me shut it down after the computer reboots due to the refused connection.
CryoSPARC process status:
unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection
An error ocurred while checking license status
Could not get license verification status. Are all CryoSPARC processes RUNNING?
cryosparc@Hera:~$ cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-30cc4604421e57a31bbc937a97fa69b6.sock refused connection
Welcome to the forum @anelise.
You may want to ask your IT support to determine the cause of the random crashes.
If you encounter the refused connection error when the computer has not crashed, you may want to look at the topic XX.sock refused connection error.
For recovery from an incomplete or unclean CryoSPARC shutdown, please refer to the CryoSPARC guide.