Refused connection when cryoSPARC is running

CryoSPARC encountered a failure after some time, displaying a ‘localhost refused connection’ error. I believe my internet connection is stable. Sometimes it runs smoothly for a few days, but other times the issue occurs approximately every couple of hours.

single workstation
Current cryoSPARC version: v4.4.0
Linux cryo 6.2.0-37-generic #38~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 2 18:01:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             503          35          81          14         385         449
Swap:              1           0           1

CRYOSPARC_PATH=/home/cryosparc/cryosparc4.2.2/cryosparc_worker/bin
XRDP_SOCKET_PATH=/run/xrdp/sockdir
PYTHONPATH=/home/cryosparc/cryosparc4.2.2/cryosparc_worker
CRYOSPARC_SSD_PATH=/mnt/zhitai
CRYOSPARC_CUDA_PATH=/usr/local/cuda
NUMBA_CUDA_INCLUDE_PATH=/home/cryosparc/cryosparc4.2.2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include
LD_LIBRARY_PATH=
PATH=/home/cryosparc/cryosparc4.2.2/cryosparc_worker/bin:/home/cryosparc/cryosparc4.2.2/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/cryosparc/cryosparc4.2.2/cryosparc_worker/deps/anaconda/condabin:/home/cryosparc/anaconda3/bin:/home/cryosparc/anaconda3/condabin:/home/cryosparc/cryosparc4.2.2/cryosparc_master/bin:/home/cryosparc/cryosparc4.2.2/cryosparc_master/bin:/usr/local/relion/build/bin:/usr/local/IMOD/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/IMOD/pythonLink

CryoSPARC System master node installed at
/home/cryosparc/cryosparc4.2.2/cryosparc_master
Current cryoSPARC version: v4.4.0

CryoSPARC process status:

app RUNNING pid 201153, uptime 0:16:16
app_api RUNNING pid 201176, uptime 0:16:14
app_api_dev STOPPED Not started
command_core RUNNING pid 201050, uptime 0:16:29
command_rtp RUNNING pid 201110, uptime 0:16:21
command_vis RUNNING pid 201086, uptime 0:16:22
database RUNNING pid 200936, uptime 0:16:33


License is valid

global config variables:
export CRYOSPARC_LICENSE_ID=" "
export CRYOSPARC_MASTER_HOSTNAME="cryo"
export CRYOSPARC_DB_PATH="/home/cryosparc/cryosparc4.2.2/cryosparc_database"
export CRYOSPARC_BASE_PORT=61000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true

Welcome to the forum @cornpeasant.

Please can you

  1. describe where and during which activity you encounter the errors
  2. how you connect to CryoSPARC:
    • the full url you type in the browser
    • any ssh or vpn tunnel?
    • is your browser running on the single workstation or on a separate computer?
  3. outputs of these commands on the CryoSPARC workstation
    host cryo
    host localhost
    
  4. a screenshot of the browser window when the error occurs
  1. describe where and during which activity you encounter the errors
    It happens during any activity that takes a long time, such as 2D classification, template picking, or heterogeneous refinement.

  2. how you connect to CryoSPARC:

  • the full url you type in the browser

  • any ssh or vpn tunnel?
    No ssh or vpn

  • is your browser running on the single workstation or on a separate computer?
    The browser is running on the same Ubuntu workstation, accessed via XRDP remote desktop.

  3. outputs of these commands on the CryoSPARC workstation
host cryo
host localhost

  4. a screenshot of the browser window when the error occurs

This is a screenshot of the refreshed browser.

Thanks @cornpeasant for posting this information.
You may want to go through the multistep procedure of a complete CryoSPARC shutdown: a simple CryoSPARC stop, then discovery and confirmed termination of stale processes, and only then deletion of stale socket files (details).
After the complete CryoSPARC shutdown, please attempt
cryosparcm start.
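
A minimal sketch of that sequence (not the official guide text), assuming a single-workstation instance where CryoSPARC runs under the current Linux account; run the kill and rm steps only after confirming that the listed processes and socket files are genuinely stale:

cryosparcm stop
# look for leftover CryoSPARC/MongoDB processes owned by the current account
ps -w -U $(id -un) -opid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
# if stale processes are listed, terminate them by PID (kill <pid>) and re-check
# once no such processes remain, remove the stale socket files
ls -l /tmp/cryosparc-supervisor-*.sock /tmp/mongodb-*.sock
rm /tmp/cryosparc-supervisor-*.sock /tmp/mongodb-*.sock
cryosparcm start
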
If you encounter any problems, please post

  1. error messages
  2. outputs of the commands
    ps -weopid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
    ls -l /tmp/cryosparc*.sock /tmp/mongodb-*.sock
    free -g
    
  3. details about potential concurrent compute workloads on the server that may cause the system to run out of available RAM.

Hi Wtempel,

Thank you for the help.
Instead of upgrading to CryoSPARC v4.4.0 from the old version, I made a fresh installation, but the same problem still occurred after cryoSPARC had been running for ~28 hours (I tried twice).

Here is the output of the procedure you described.

$cryosparcm stop
CryoSPARC is running.
Stopping cryoSPARC
unix:///tmp/cryosparc-supervisor-5fccf1c670aab55f9d50ce55f18e4c54.sock refused connection

$ ps -w -U user1 -opid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep

No output.

Then I deleted the sock file:

$ rm /tmp/cryosparc-supervisor-5fccf1c670aab55f9d50ce55f18e4c54.sock

$ cryosparcm start
Starting cryoSPARC System master process..
CryoSPARC is not already running.
configuring database
    configuration complete
database: started
checkdb success
command_core: started
    command_core connection succeeded
    command_core startup successful
command_vis: started
command_rtp: started
    command_rtp connection succeeded
    command_rtp startup successful
app: started
app_api: started
-----------------------------------------------------

CryoSPARC master started.
 From this machine, access CryoSPARC and CryoSPARC Live at
    http://localhost:61000

 From other machines on the network, access CryoSPARC and CryoSPARC Live at
    http://cryo:61000


Startup can take several minutes. Point your browser to the address
and refresh until you see the cryoSPARC web interface.



$ ps -weopid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
  82204    2765 12:58:51 python /home/jz/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/bin/supervisord -c /home/jz/cryosparc/cryosparc_master/supervisord.conf
  82319   82204 12:58:57 mongod --auth --dbpath /home/jz/cryosparc/cryosparc_database --port 61001 --oplogSize 64 --replSet meteor --wiredTigerCacheSizeGB 4 --bind_ip_all
  82430   82204 12:59:01 python -c import cryosparc_command.command_core as serv; serv.start(port=61002)
  82468   82204 12:59:08 python -c import cryosparc_command.command_vis as serv; serv.start(port=61003)
  82492   82204 12:59:09 python -c import cryosparc_command.command_rtp as serv; serv.start(port=61005)
  82556   82204 12:59:14 /home/jz/cryosparc/cryosparc_master/cryosparc_app/nodejs/bin/node ./bundle/main.js
  82589   82430 12:59:18 bash /home/jz/cryosparc/cryosparc_worker/bin/cryosparcw run --project P1 --job J33 --master_hostname cryo --master_command_core_port 61002
  82604   82589 12:59:18 python -c import cryosparc_compute.run as run; run.run() --project P1 --job J33 --master_hostname cryo --master_command_core_port 61002
  82606   82604 12:59:18 python -c import cryosparc_compute.run as run; run.run() --project P1 --job J33 --master_hostname cryo --master_command_core_port 61002
  82609   82430 12:59:20 bash /home/jz/cryosparc/cryosparc_worker/bin/cryosparcw run --project P1 --job J34 --master_hostname cryo --master_command_core_port 61002
  82624   82609 12:59:20 python -c import cryosparc_compute.run as run; run.run() --project P1 --job J34 --master_hostname cryo --master_command_core_port 61002
  82626   82624 12:59:20 python -c import cryosparc_compute.run as run; run.run() --project P1 --job J34 --master_hostname cryo --master_command_core_port 61002


$ ls -l /tmp/cryosparc*.sock /tmp/mongodb-*.sock
srwx------ 1 jz jz 0 Dec  6 12:58 /tmp/cryosparc-supervisor-5fccf1c670aab55f9d50ce55f18e4c54.sock
srwx------ 1 jz jz 0 Dec  6 12:58 /tmp/mongodb-61001.sock


$ free -g
               total        used        free      shared  buff/cache   available
Mem:             503          13          41           0         448         485
Swap:              1           0           1

I only run cryoSPARC on this workstation, so there should be enough RAM.

I checked the command_core log file and there are some errors:

2023-12-06 13:18:27,427 run                  ERROR    | Encountered exception while running background task
2023-12-06 13:18:27,427 run                  ERROR    | Traceback (most recent call last):
2023-12-06 13:18:27,427 run                  ERROR    |   File "cryosparc_master/cryosparc_command/core.py", line 1115, in cryosparc_master.cryosparc_command.core.background_tasks_worker
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/cryosparc_command/commandcommon.py", line 186, in wrapper
2023-12-06 13:18:27,427 run                  ERROR    |     return func(*args, **kwargs)
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/cryosparc_command/commandcommon.py", line 232, in wrapper
2023-12-06 13:18:27,427 run                  ERROR    |     return func(*args, **kwargs)
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 3924, in dump_job_database
2023-12-06 13:18:27,427 run                  ERROR    |     rc.dump_job_database(project_uid = project_uid, job_uid = job_uid, job_completed = job_completed, migration = migration, abs_export_dir = abs_export_dir, logger = logger)
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/cryosparc_compute/jobs/runcommon.py", line 444, in dump_job_database
2023-12-06 13:18:27,427 run                  ERROR    |     file_object = gridfs.get(objectid.ObjectId(object_id)).read()
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/gridfs/__init__.py", line 153, in get
2023-12-06 13:18:27,427 run                  ERROR    |     gout._ensure_file()
2023-12-06 13:18:27,427 run                  ERROR    |   File "/home/jz/cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/site-packages/gridfs/grid_file.py", line 484, in _ensure_file
2023-12-06 13:18:27,427 run                  ERROR    |     raise NoFile(
2023-12-06 13:18:27,427 run                  ERROR    | gridfs.errors.NoFile: no file in gridfs collection Collection(Database(MongoClient(host=['cryo:61001'], document_class=dict, tz_aware=False, connect=False, authsource='admin'), 'meteor'), 'fs.files') with _id ObjectId('656ffffcfe82afe930c2f357')

Thanks @cornpeasant for this info.

That ps command would only have displayed processes owned by the (hypothetical) Linux user user1 and needs to be modified to match your circumstances (details).
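
For example, assuming the CryoSPARC master and worker processes run under the Linux account you are logged in as (a hypothetical adjustment, not a required form):

# list CryoSPARC/MongoDB processes owned by the current account
ps -w -U $(id -un) -opid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep
# or list them regardless of owner
ps -weopid,ppid,start,cmd | grep -e cryosparc -e mongo | grep -v grep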

Please can you similarly check the database log for errors.
(cryosparcm log database)

@cornpeasant Please can you confirm that certain ports based on the configured CRYOSPARC_MASTER_HOSTNAME and CRYOSPARC_BASE_PORT are accessible.
For example, what is the output of the command

curl cryo:61006

?
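
One way to probe the related service ports in one pass is a small shell loop. This is only a sketch, assuming the usual fixed offsets of the individual services (app, database, command_core, command_vis, command_rtp, legacy Live app) from CRYOSPARC_BASE_PORT=61000; the database port does not speak HTTP, so garbled output there is expected, and the interesting outcome is whether any port refuses the connection outright.

# probe the CryoSPARC service ports relative to the base port (61000 in this setup)
for offset in 0 1 2 3 5 6; do
    port=$((61000 + offset))
    echo "== cryo:$port =="
    curl --max-time 5 cryo:$port
    echo
done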

Thanks @cornpeasant. I cannot spot a reason for the UI failure in the screenshot. The next step would be an analysis of the browser logs.
With CryoSPARC running, please

  1. enable browser debugging
  2. re-load the UI at http://localhost:61000
  3. email us the HAR network output

Hi wtempel,
I found that the issue was very likely due to high cache/buffer usage and extremely low available memory on the Linux system. I now clear the cache/buffer every hour using a script, and cryoSPARC has been running smoothly so far.
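
A minimal sketch of such an hourly cache drop (not necessarily the script used here), assuming root access, e.g. as an executable file in /etc/cron.hourly; note that drop_caches discards only clean, reclaimable cache that the kernel would normally free on demand, so whether this is strictly necessary is debatable:

#!/bin/bash
# hypothetical /etc/cron.hourly/drop-caches (runs as root)
# flush dirty pages to disk, then drop the page cache plus reclaimable
# dentries and inodes ("3" = pagecache + slab objects)
sync
echo 3 > /proc/sys/vm/drop_caches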


Hi! May I ask if you are willing to share the script you use for clearing the cache/buffer? I am also experiencing a “refused connection” issue; it happened when I was transferring particles into SSD cache. Thank you!

What effect could an SSH or VPN tunnel possibly have?
I never used to have this problem. We recently installed a Synology NAS, and I have been facing this issue constantly since then. Could there be a connection between the two?

Welcome to the forum @Ana.
Please can you post the outputs of the following commands:

cryosparcm env | grep -e HOSTNAME -e PORT
hostname -f
host $(hostname -f)
cat /etc/hosts

and confirm the error messages you observed and the commands that triggered them.

I have not seen any errors. These are the outputs:

cryosparc_user@sn4622119118:~$ cryosparcm env | grep -e HOSTNAME -e PORT
export "CRYOSPARC_MASTER_HOSTNAME=sn4622119118"
export "CRYOSPARC_COMMAND_VIS_PORT=39003"
export "CRYOSPARC_COMMAND_RTP_PORT=39005"
export "CRYOSPARC_HTTP_APP_PORT=39000"
export "CRYOSPARC_HOSTNAME_CHECK=sn4622119118"
export "CRYOSPARC_MONGO_PORT=39001"
export "CRYOSPARC_HTTP_LIVEAPP_LEGACY_PORT=39006"
export "CRYOSPARC_COMMAND_CORE_PORT=39002"
export "CRYOSPARC_BASE_PORT=39000"
export "CRYOSPARC_FORCE_HOSTNAME=false"
cryosparc_user@sn4622119118:~$ hostname -f
sn4622119118
cryosparc_user@sn4622119118:~$ host $(hostname -f)
sn4622119118 has address xx
sn4622119118 has address xx
sn4622119118 has IPv6 address xx
cryosparc_user@sn4622119118:~$ cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 sn4622119394

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Please can you provide details (symptoms, screenshots, triggering actions) for “this issue”.

This usually happens when I run either a 2D classification or a homogeneous refinement job; I haven’t noticed it happening with extraction.
The terminal gets closed and cryoSPARC gets disconnected; a loading sign appears on the screen, and when I refresh the page it says “unable to connect”.
If I try to restart cryoSPARC, I get the message “unix:///tmp/cryosparc-supervisor-206773da3c7c06e952eddaffaea9188d.sock refused connection”.
The only two ways I am able to fix it are either restarting the computer or removing the sock file and restarting cryoSPARC. When I log back into cryoSPARC, the job I was running before has an error message that reads:
“Job is unresponsive - no heartbeat received in 180 seconds.”

CryoSPARC master processes might be disrupted when physical RAM is exhausted or when configured limits on RAM usage are reached.

  1. Is this a single workstation (combined master/worker on single host) CryoSPARC instance?
  2. What are the outputs of these commands (the first one requires admin access) on the CryoSPARC master host?
    sudo journalctl | grep -i oom 
    free -h
    nvidia-smi --query-gpu=index,name --format=csv
    cryosparcm log supervisord | tail -n 40
    
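If RAM exhaustion is suspected, a slightly broader (hypothetical) variant of the OOM check above, assuming systemd journald; the exact kernel log wording varies between distributions:

# requires admin access; scans recent kernel messages for OOM-killer activity
sudo journalctl -k --since "2 days ago" | grep -i -e "out of memory" -e "oom-killer" -e "killed process"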

Yes, this is a single workstation.
Here are the outputs of the commands:

Another thing I noticed recently is that sometimes when I try to run interactive jobs, I get this error message:
“Unable to queue P5 J327: ServerError: enqueue job error - P5 J327 is an interactive job and must be queued on the master node”
This never used to happen.