4.7.0 - Volume viewer problem is back(?)

Hi

Unfortunately, the volume viewer problem seems to be back: my instance just had this failure again. This time, however, new jobs also couldn't be started, while jobs that had been started earlier kept running.

One user sent me this traceback:

Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 105, in func
    with make_json_request(self, "/api", data=data, stacklevel=4) as request:
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryosparcmaster.xxxx:39002/api, code 500) Timeout Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 139, in cryosparc_master.cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://xxxxx:39002, code 500) Encounted error from JSONRPC function "dump_job_database" with params {'project_uid': 'P162', 'job_uid': 'J462', 'job_completed': True}

Running cryosparcm restart command_vis didn't help.

Only cryosparcm restart command_core brought it back, but that killed all running jobs.

Best wishes

Thanks @biocit for this report.
Could you please:

  1. describe observations before “cryosparcm restart command_core brought it back”: what did and did not work? Which observations were similar to the “volume viewer problem”?
  2. post the outputs of the commands
    grep -v LICENSE /path/to/cryosparc_master/config.sh
    free -h
    
  3. email us the tgz file created by the command
    cryosparcm snaplogs

The users reported:

  • Volume viewer not available: selecting a map only showed “retry”. In the browser's developer console, I saw a 500 error for the map request.
  • Map download not possible.
  • Web interface sluggish. Zabbix, which monitors the website, recorded 502 errors, after which the site came back; the status kept flapping.
  • New jobs couldn't be started: “unable to create job: Unknown 504 error”.
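The flapping 502/504 pattern that Zabbix recorded can also be watched by hand with a small poller, which helps pin down when the web interface starts failing relative to other events. Below is a minimal sketch using only the Python standard library. It is not a CryoSPARC tool; the function names are hypothetical, and it assumes the web interface answers plain HTTP GET requests on the instance's base port (CRYOSPARC_BASE_PORT, 39000 here).

```python
import time
import urllib.error
import urllib.request


def classify_status(code):
    """Map an HTTP status code (or None for no response) to a coarse health label."""
    if code is None:
        return "unreachable"
    if 200 <= code < 400:
        return "ok"
    if code in (502, 503, 504):
        # Typical of a proxy or backend timing out, as in the 502/504 errors above.
        return "gateway/timeout error"
    if code >= 500:
        return "server error"
    return "client error"


def probe(url, timeout_s=10.0):
    """Fetch url once and return (status code or None, elapsed seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            code = resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx/3xx responses are raised as HTTPError but still carry a status code.
        code = exc.code
    except OSError:
        # Covers URLError, connection refused/reset and socket timeouts.
        code = None
    return code, time.monotonic() - start


def watch(url, interval_s=30.0, rounds=10):
    """Probe url repeatedly and print one timestamped status line per round."""
    for _ in range(rounds):
        code, elapsed = probe(url)
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {code} {classify_status(code)} ({elapsed:.1f}s)")
        time.sleep(interval_s)
```

Usage, with a placeholder host: watch("http://<master-hostname>:39000/", interval_s=30). A run of slow "ok" lines followed by "gateway/timeout error" lines would match the sluggish-then-502 behavior described above.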

Output of the requested commands:

free -h

               total        used        free      shared  buff/cache   available
Mem:           187Gi        45Gi        17Gi        21Mi       126Gi       142Gi
Swap:           49Gi       415Mi        49Gi

grep -v LICENSE /opt/cryosparc/cryosparc_master/config.sh

# Instance Configuration
export CRYOSPARC_MASTER_HOSTNAME="cryosparcmaster.xxxxx"
export CRYOSPARC_DB_PATH="/disks/cryosparcdb/database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000

# Security
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true

# Cluster Integration
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000

# Project Configuration
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'

# Development
export CRYOSPARC_DEVELOP=false

# Other
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_MONGO_CACHE_GB=32
export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=1

I have sent the email.

Best wishes

Email received. Thanks.