App EXITED unknown reason

Hi, the app service of this instance keeps crashing and I'm not sure why. Here's what I see in the different log files:

run/app.log

2024-01-08 14:29:07,289 ERROR | uncaughtException: Unexpected token < in JSON at position 0
2024-01-08 14:29:07,289 ERROR | SyntaxError: Unexpected token < in JSON at position 0
2024-01-08 14:29:07,289 ERROR |     at JSON.parse (<anonymous>)
2024-01-08 14:29:07,289 ERROR |     at IncomingMessage.<anonymous> (/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/cryosparc_app/custom-server/dist/server/index.js:894:47457)
2024-01-08 14:29:07,289 ERROR |     at IncomingMessage.emit (node:events:525:35)
2024-01-08 14:29:07,289 ERROR |     at endReadableNT (node:internal/streams/readable:1358:12)
2024-01-08 14:29:07,289 ERROR |     at processTicksAndRejections (node:internal/process/task_queues:83:21)
2024-01-08 14:29:07,291 ERROR | uncaughtException: Unexpected token < in JSON at position 0
2024-01-08 14:29:07,291 ERROR | SyntaxError: Unexpected token < in JSON at position 0
2024-01-08 14:29:07,291 ERROR |     at JSON.parse (<anonymous>)
2024-01-08 14:29:07,291 ERROR |     at IncomingMessage.<anonymous> (/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/cryosparc_app/custom-server/dist/server/index.js:894:47457)
2024-01-08 14:29:07,291 ERROR |     at IncomingMessage.emit (node:events:525:35)
2024-01-08 14:29:07,291 ERROR |     at endReadableNT (node:internal/streams/readable:1358:12)
2024-01-08 14:29:07,291 ERROR |     at processTicksAndRejections (node:internal/process/task_queues:83:21)

run/app_api.log

cryoSPARC Application API server running
events.js:377
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE: address already in use 0.0.0.0:55006
    at Server.setupListenHandle [as _listen2] (net.js:1331:16)
    at listenInCluster (net.js:1379:12)
    at doListen (net.js:1516:7)
    at processTicksAndRejections (internal/process/task_queues.js:83:21)
    at runNextTicks (internal/process/task_queues.js:64:3)
    at processImmediate (internal/timers.js:437:9)
Emitted 'error' event on Server instance at:
    at emitErrorNT (net.js:1358:8)
    at processTicksAndRejections (internal/process/task_queues.js:82:21)
    at runNextTicks (internal/process/task_queues.js:64:3)
    at processImmediate (internal/timers.js:437:9) {
  code: 'EADDRINUSE',
  errno: -98,
  syscall: 'listen',
  address: '0.0.0.0',
  port: 55006
}
cryoSPARC Application API server running
cryoSPARC Application API server running

It does say something about the address already being in use, but I'm quite sure no other processes were using the port, and I restarted CryoSPARC cleanly with no zombie processes left over (no multiple occurrences of supervisord or anything else).

Here are some more errors, but I'm not sure they're relevant:

2024-01-08 16:56:22,739 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/subprocess.py", line 415, in check_output
2024-01-08 16:56:22,739 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2024-01-08 16:56:22,739 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/subprocess.py", line 516, in run
2024-01-08 16:56:22,739 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     raise CalledProcessError(retcode, process.args,
2024-01-08 16:56:22,739 COMMAND.SCHEDULER    update_cluster_job_status ERROR    | subprocess.CalledProcessError: Command '['bjobs', 'submitted.']' returned non-zero exit status 255.
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    | submitted.: Illegal job ID.
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    | Traceback (most recent call last):
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/cryosparc_command/command_core/__init__.py", line 2711, in update_cluster_job_status
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     cluster_job_status = cluster.get_cluster_job_status(target, cluster_job_id, template_args)
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/cryosparc_compute/cluster.py", line 175, in get_cluster_job_status
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     res = subprocess.check_output(shlex.split(cmd), stderr=subprocess.STDOUT).decode()
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/subprocess.py", line 415, in check_output
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |   File "/home/halicgrp_sparc/software/hpc_cryosparc/cryosparc_master/deps/anaconda/envs/cryosparc_master_env/lib/python3.8/subprocess.py", line 516, in run
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    |     raise CalledProcessError(retcode, process.args,
2024-01-08 16:56:22,740 COMMAND.SCHEDULER    update_cluster_job_status ERROR    | subprocess.CalledProcessError: Command '['bjobs', 'submitted.']' returned non-zero exit status 255.

@shockacone You may want to start by investigating two failure scenarios:
Scenario 1. “Zombie” processes have “escaped” the basic CryoSPARC shutdown procedure and are interfering with new CryoSPARC start-up attempts. The remedy would be a complete and confirmed shutdown of CryoSPARC, followed by CryoSPARC startup.
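A minimal sketch of a complete, confirmed shutdown and restart (the cryosparcm subcommands are standard; the ps/grep pattern is only an illustration and the exact process names on your host may differ):

cryosparcm stop                                            # stop all CryoSPARC services
ps -eo pid,ppid,start,cmd | grep -e cryosparc -e mongod    # confirm nothing CryoSPARC-related is still running; terminate any stragglers
cryosparcm start                                           # start CryoSPARC again
cryosparcm status                                          # every service, including app and app_api, should now report as started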
Scenario 2. Port 55006 is within the “ephemeral” port range and has been “randomly” assigned to another process. To display the ephemeral range:
/sbin/sysctl net.ipv4.ip_local_port_range
To display processes that may be using the port:
ss -ap | grep 55006
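For reference, on many Linux distributions the default ephemeral range is 32768 through 60999, which would include 55006; your host's values may differ. Hypothetical output of the two commands above might look like:

net.ipv4.ip_local_port_range = 32768	60999
tcp   LISTEN 0  511  0.0.0.0:55006  0.0.0.0:*  users:(("node",pid=12345,fd=20))

The first line tells you whether 55006 falls inside the ephemeral range; a line like the second identifies the process (here a hypothetical node process) currently bound to the port.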
If you find that scenario 2 applies, you may want to consider

  1. a complete shutdown of CryoSPARC
  2. identifying the beginning of a range of 10 consecutive network ports outside the ephemeral range
  3. running cryosparcm changeport (guide); see the sketch below for the general sequence
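A sketch of that sequence, assuming the changeport subcommand takes the new base port as its argument (61000 is only a hypothetical choice outside the default ephemeral range; check the guide for the exact syntax on your CryoSPARC version):

cryosparcm stop                                # 1. complete shutdown first
/sbin/sysctl net.ipv4.ip_local_port_range      # 2. confirm the new base port and the 9 ports above it lie outside this range
cryosparcm changeport 61000                    # 3. hypothetical new base port; CryoSPARC uses 10 consecutive ports from the base
cryosparcm start
cryosparcm status                              # verify the app and app_api services now stay up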

Does this help?