Hi all,
I am seeing some very intense error writing to command_core and command_rtp logs.
It seems there are two issues:
type o2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | Exporting all workspaces failed.
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 122, in background_worker
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | export_live_running_sessions_scheduler()
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 544, in export_live_running_sessions_scheduler
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | livecore.export_live_running_sessions_scheduler(mongo.db, cli)
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/jobs/livecore.py", line 761, in export_live_running_sessions_scheduler
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | cli.dump_workspaces(project_uid=project_uid)
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/client.py", line 66, in func
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | + self._format_server_error(res['error'])
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | AssertionError: Encountered error for method "dump_workspaces" with params {'project_uid': 'P353'}:
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | ServerError: validation error: lock file for P353 not found at /emcc/users/au325422/cryosparc2_data/cs-live/emfac_au325422_cs_64_20210421_SDK10-4_data/P96/cs.lock
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | res = func(*args, **kwargs)
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 246, in wrapper
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | lockfile_path_abs), f"validation error: lock file for {project_uid} not found at {lockfile_path_abs}"
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR | AssertionError: validation error: lock file for P353 not found at /emcc/users/au325422/cryosparc2_data/cs-live/emfac_au325422_cs_64_20210421_SDK10-4_data/P96/cs.lock
2022-11-20 16:48:52,368 RTP.BG_WORKER background_worker ERROR |
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | RTP Child Monitor Failed
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 114, in background_worker
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | rtp_child_job_monitor()
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 185, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | return func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 2811, in rtp_child_job_monitor
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | new_status = cli.get_job_status(session['project_uid'], rtp_worker_juid)
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/client.py", line 66, in func
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | + self._format_server_error(res['error'])
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | AssertionError: Encountered error for method "get_job_status" with params ('P242', 'J28'):
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | ServerError: P242 J28 does not exist.
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | Traceback (most recent call last):
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | res = func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 6865, in get_job_status
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | return get_job(project_uid, job_uid, 'status')['status']
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 185, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | return func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 5521, in get_job
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | raise ValueError(f"{project_uid} {job_uid} does not exist.")
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR | ValueError: P242 J28 does not exist.
2022-11-20 16:48:53,452 RTP.BG_WORKER background_worker ERROR |
r paste code here
Both cases seems to be related to issue where data have been removed on the filesystem and not in database.
For the P353 error, I believe I know what went wrong. The repopulate workspaces script from here:
was wrongly run on a project that was deleted on the filesystem.
For P242 J28 error, I have no idea what happened there.
But in both cases, I do not know how to stop the these error messages.
The errors are written to the log files every second, making a lot of I/O and filling up system hard drive fast. Maybe it is also possible to reduce write out log periods?
Any help would be appreciated, thanks.
Best,
Jesper