4.7.0 - Volume viewer problem is back(?)

biocit · May 9, 2025, 3:33pm

Hi

Unfortunately, the volume viewer problem seems to be back(?). At least, my instance has now failed in the same way again. This time, new jobs also couldn’t be started, although jobs that had been started earlier were still running.

One user sent me the following traceback:

Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 105, in func
    with make_json_request(self, "/api", data=data, stacklevel=4) as request:
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryosparcmaster.xxxx:39002/api, code 500) Timeout Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 139, in cryosparc_master.cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://xxxxx:39002, code 500) Encounted error from JSONRPC function "dump_job_database" with params {'project_uid': 'P162', 'job_uid': 'J462', 'job_completed': True}

Running cryosparcm restart command_vis didn’t help.

Only cryosparcm restart command_core brought it back, but that killed all running jobs.

Best wishes

wtempel · May 9, 2025, 5:52pm

Thanks @biocit for this report.
Please can you

describe observations before “cryosparcm restart command_core brought it back”: what did and did not work? Which observations were similar to the “volume viewer problem”?

post the outputs of the commands

grep -v LICENSE /path/to/cryosparc_master/config.sh
free -h

email us the tgz file created by the command cryosparcm snaplogs.

biocit · May 9, 2025, 6:50pm

The users complained:

Volume viewer not available. When you selected the map, you only got “retry”. In the browser’s developer console I saw a 500 error for the map.
Map download not possible.
Web interface sluggish. Zabbix, which monitors the website, saw 502 errors, then the site came back, then the errors returned (flapping).
New jobs couldn’t be started: unable to create job: Unknown 504 error
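
The flapping described above (502 responses alternating with recovery) can be recorded with timestamps by a small poller that runs independently of Zabbix. The following is only a minimal sketch and not part of CryoSPARC: the function names probe, is_healthy, count_flaps and watch are hypothetical, and the URL passed to watch is an assumption about where the web interface listens.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return the HTTP status code for url, or 0 if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses (e.g. 500, 502, 504) land here
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure or timeout


def is_healthy(code):
    """Treat 2xx and 3xx as healthy; everything else (including 0) as failed."""
    return 200 <= code < 400


def count_flaps(codes):
    """Count transitions between healthy and failed in a sequence of codes."""
    states = [is_healthy(c) for c in codes]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def watch(url, interval=30.0, samples=20):
    """Probe url repeatedly, print one timestamped line per sample, return the codes."""
    codes = []
    for _ in range(samples):
        code = probe(url)
        codes.append(code)
        print(time.strftime("%Y-%m-%d %H:%M:%S"), code, flush=True)
        time.sleep(interval)
    return codes
```

Calling watch with the address of the web interface and then passing the returned list to count_flaps gives a rough measure of how often the site changed state. The timestamps can then be lined up against the CryoSPARC logs for the same period.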

The commands:

free -h

               total        used        free      shared  buff/cache   available
Mem:           187Gi        45Gi        17Gi        21Mi       126Gi       142Gi
Swap:           49Gi       415Mi        49Gi

grep -v LICENSE /opt/cryosparc/cryosparc_master/config.sh

# Instance Configuration
export CRYOSPARC_MASTER_HOSTNAME="cryosparcmaster.xxxxx"
export CRYOSPARC_DB_PATH="/disks/cryosparcdb/database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000

# Security
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true

# Cluster Integration
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000

# Project Configuration
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'

# Development
export CRYOSPARC_DEVELOP=false

# Other
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_MONGO_CACHE_GB=32
export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=1
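
To compare a configuration like the one above between instances or over time, the export lines can be read into a dictionary while dropping any line that mentions LICENSE, mirroring the grep -v LICENSE filter. This is a minimal sketch under stated assumptions: parse_exports is a hypothetical helper, and it only handles simple `export NAME=value` lines with optional quoting, not arbitrary shell syntax.

```python
import re
import shlex

# Matches lines of the form: export NAME=value
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(text):
    """Return {name: value} for simple export lines, skipping LICENSE lines."""
    settings = {}
    for line in text.splitlines():
        if "LICENSE" in line:
            continue  # same effect as grep -v LICENSE
        match = EXPORT_RE.match(line)
        if not match:
            continue  # comments, blank lines, anything more complex
        name, raw = match.groups()
        # shlex strips surrounding quotes and trailing "# ..." comments
        parts = shlex.split(raw, comments=True)
        settings[name] = parts[0] if parts else ""
    return settings
```

All values come back as strings, so numeric settings such as CRYOSPARC_BASE_PORT need an explicit int() before any arithmetic.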

The email has been sent.

Best wishes

wtempel · May 9, 2025, 7:30pm

Email received. Thanks.

wtempel · May 21, 2025, 6:02pm

@biocit Our investigation unfortunately did not identify the underlying cause of this problem. Please can you report here if you experience the issue again?

biocit · May 26, 2025, 9:13am

Not so far. It has been a bit calmer over the last few days, and there was a regular reboot last Thursday. I will report back if it happens again.