Logs are being flooded

Hi all,

I am seeing some very intense error writing to command_core and command_rtp logs.

It seems there are two issues:

type o2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | Exporting all workspaces failed.
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 122, in background_worker
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     export_live_running_sessions_scheduler()
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 544, in export_live_running_sessions_scheduler
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     livecore.export_live_running_sessions_scheduler(mongo.db, cli)
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/jobs/livecore.py", line 761, in export_live_running_sessions_scheduler
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     cli.dump_workspaces(project_uid=project_uid)
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/client.py", line 66, in func
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     + self._format_server_error(res['error'])
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | AssertionError: Encountered error for method "dump_workspaces" with params {'project_uid': 'P353'}:
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | ServerError: validation error: lock file for P353 not found at /emcc/users/au325422/cryosparc2_data/cs-live/emfac_au325422_cs_64_20210421_SDK10-4_data/P96/cs.lock
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     res = func(*args, **kwargs)
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 246, in wrapper
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    |     lockfile_path_abs), f"validation error: lock file for {project_uid} not found at {lockfile_path_abs}"
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | AssertionError: validation error: lock file for P353 not found at /emcc/users/au325422/cryosparc2_data/cs-live/emfac_au325422_cs_64_20210421_SDK10-4_data/P96/cs.lock
2022-11-20 16:48:52,368 RTP.BG_WORKER        background_worker    ERROR    | 
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | RTP Child Monitor Failed
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 114, in background_worker
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     rtp_child_job_monitor()
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 185, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_rtp/__init__.py", line 2811, in rtp_child_job_monitor
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     new_status = cli.get_job_status(session['project_uid'], rtp_worker_juid)
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_compute/client.py", line 66, in func
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     + self._format_server_error(res['error'])
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | AssertionError: Encountered error for method "get_job_status" with params ('P242', 'J28'):
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | ServerError: P242 J28 does not exist.
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | Traceback (most recent call last):
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 194, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     res = func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 6865, in get_job_status
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     return get_job(project_uid, job_uid, 'status')['status']
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/commandcommon.py", line 185, in wrapper
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     return func(*args, **kwargs)
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |   File "/opt/bioxray/programs/cryosparc2/cryosparc2_master/cryosparc_command/command_core/__init__.py", line 5521, in get_job
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    |     raise ValueError(f"{project_uid} {job_uid} does not exist.")
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | ValueError: P242 J28 does not exist.
2022-11-20 16:48:53,452 RTP.BG_WORKER        background_worker    ERROR    | 
r paste code here

Both cases seems to be related to issue where data have been removed on the filesystem and not in database.

For the P353 error, I believe I know what went wrong. The repopulate workspaces script from here:

was wrongly run on a project that was deleted on the filesystem.

For P242 J28 error, I have no idea what happened there.

But in both cases, I do not know how to stop the these error messages.

The errors are written to the log files every second, making a lot of I/O and filling up system hard drive fast. Maybe it is also possible to reduce write out log periods?

Any help would be appreciated, thanks.

Best,
Jesper

I tried deleting the two projects from icli as described here:

As it sounded similar situation.

Unfortunately it did not help. Not entirely at least.
It seems to have stopped the load on command_core.log, but unchanged in command_rtp.log.

Is the a way to stop these rogue background_worker processes?
Restarting cryosparcm does not do the trick and not even the entire server for that matter.

Best,
Jesper

Hi @jelka,

Do you happen to have any running sessions that you can pause in the CryoSPARC Live interface?
If you don’t see any, or run into issues, copy and paste the following block of code into cryosparcm icli:


import datetime
all_running_sessions = db.workspaces.find({'status' : 'running'}, {'project_uid':1, 'session_uid':1})
for session in all_running_sessions:
    try:
        rtp.pause_session(session['project_uid'], session['session_uid'])
    except:
        db.workspaces.update_one({'project_uid' : session['project_uid'], 'session_uid' : session['session_uid']}, { '$set' : {'status' : 'paused'}, '$push' : {'paused_at' : datetime.datetime.utcnow()}})
1 Like

Hi @stephan ,

We have never used that particular CS master for cs-live sessions, but P353 was shown there as running. I could not stop nor delete the sessions from GUI.
Although, your icli scripts did the trick.
Both cases were cleared from log.

Thanks for great support.

Best,
Jesper

2 Likes