Hi
Unfortunately, the volume viewer problem seems to be back: my instance had this failure again. This time, however, new jobs also couldn't be started, while jobs that had been started earlier were still running.
One user sent me this traceback:
Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 105, in func
    with make_json_request(self, "/api", data=data, stacklevel=4) as request:
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/contextlib.py", line 135, in __enter__
    return next(self.gen)
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://cryosparcmaster.xxxx:39002/api, code 500) Timeout Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 139, in cryosparc_master.cryosparc_compute.run.main
  File "/opt/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 108, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://xxxxx:39002, code 500) Encounted error from JSONRPC function "dump_job_database" with params {'project_uid': 'P162', 'job_uid': 'J462', 'job_completed': True}
Running cryosparcm restart command_vis didn't help.
Only cryosparcm restart command_core brought it back, but that killed all running jobs.
Best wishes
Thanks @biocit for this report.
Please can you
- describe observations before “cryosparcm restart command_core brought it back”: what did and did not work? Which observations were similar to the “volume viewer problem”?
- post the outputs of the commands
    grep -v LICENSE /path/to/cryosparc_master/config.sh
    free -h
- email us the tgz file created by the command cryosparcm snaplogs.
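For convenience, the steps requested above could be bundled into one script. This is a minimal sketch, not an official CryoSPARC tool: the function name collect_diagnostics and the output file name are hypothetical, and the config path has to be adjusted to your install. It only wraps the commands already listed (grep -v LICENSE, free -h, cryosparcm snaplogs).

```shell
#!/usr/bin/env bash
# Minimal sketch (not an official CryoSPARC tool): collect the requested
# command outputs into one text file that can be pasted into a reply.
# The function name and the output file name are hypothetical.

collect_diagnostics() {
  local config="$1"   # path to cryosparc_master/config.sh on your install
  local out="$2"      # text file that receives the command outputs

  {
    echo "### grep -v LICENSE $config"
    # Drop the line containing the license ID so the output is safe to post.
    grep -v LICENSE "$config"
    echo
    echo "### free -h"
    if command -v free >/dev/null 2>&1; then
      free -h
    else
      echo "(free not available on this host)"
    fi
  } > "$out"

  # snaplogs builds its own .tgz archive; only attempt it where
  # cryosparcm is on PATH (i.e. on the CryoSPARC master node).
  if command -v cryosparcm >/dev/null 2>&1; then
    cryosparcm snaplogs
  else
    echo "cryosparcm not on PATH; run 'cryosparcm snaplogs' on the master" >&2
  fi
}
```

Example usage, with the install path used later in this thread: collect_diagnostics /opt/cryosparc/cryosparc_master/config.sh diagnostics.txt. The tgz from snaplogs still has to be emailed separately.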
The users complained:
- volume viewer not available: selecting a map only showed “retry”. In the browser's developer console I saw a 500 error for the map request.
- map download not possible
- web interface sluggish. Zabbix, which monitors the website, saw 502 errors, after which the site came back. It was flapping.
- new jobs couldn't be started: "unable to create job: Unknown 504 error"
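To line up the 500/502/504 episodes described above with the CryoSPARC logs the next time it happens, a timestamped status probe could run alongside Zabbix. This is a minimal sketch assuming curl is installed; the function names, the interval and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: log a UTC timestamp and the HTTP status code of a URL,
# so error episodes can be matched against server-side log timestamps.
# Function names and the default interval are hypothetical.

probe_once() {
  local url="$1"
  local code
  # -s: silent, -o /dev/null: discard the body, -w: print only the status
  # code, --max-time: give up after 10 s. curl reports 000 if no response.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$code" "$url"
}

watch_url() {
  local url="$1"
  local interval="${2:-30}"   # seconds between probes
  while true; do
    probe_once "$url"
    sleep "$interval"
  done
}
```

Example usage: watch_url http://cryosparcmaster.xxxxx:39000/ 30 >> web_status.log. The web interface is normally served on CRYOSPARC_BASE_PORT (39000 in the config posted in this thread); replace the host with your own master.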
The command outputs:
free -h
               total        used        free      shared  buff/cache   available
Mem:           187Gi        45Gi        17Gi        21Mi       126Gi       142Gi
Swap:           49Gi       415Mi        49Gi
grep -v LICENSE /opt/cryosparc/cryosparc_master/config.sh
# Instance Configuration
export CRYOSPARC_MASTER_HOSTNAME="cryosparcmaster.xxxxx"
export CRYOSPARC_DB_PATH="/disks/cryosparcdb/database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
# Security
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
# Cluster Integration
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
# Project Configuration
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
# Development
export CRYOSPARC_DEVELOP=false
# Other
export CRYOSPARC_CLICK_WRAP=true
export CRYOSPARC_MONGO_CACHE_GB=32
export CRYOSPARC_SSD_CACHE_LIFETIME_DAYS=1
I have sent the email.
Best wishes
@biocit Our investigation unfortunately did not identify the underlying cause of this problem. Please can you report here if you experience the issue again?
Not so far. It has been a bit calmer over the last few days, and there was a regular reboot last Thursday. I will report back if it happens again.