Our storage had to be restarted, and since the subsequent restart we have been seeing problems with jobs not being displayed correctly in the Resource Manager. There are two symptoms:
- A few completed jobs remain in the list of current jobs on their respective lanes. When opened, these jobs show as completed.
- New jobs do not appear at all in the lane on which they are running (I have managed to catch them briefly flashing up when they start). I can see the jobs running from the CLI:
In [7]: [(j['project_uid'], j['uid'], j['status'], j['job_type']) for j in cli.get_jobs_by_status({'$in': ['queued', 'launched', 'started', 'waiting', 'running']})]
Out[7]:
[('P14', 'J323', 'queued', 'ctf_refine_local'),
 ('P3', 'J988', 'running', 'nonuniform_refine_new'),
 ('P9', 'J251', 'running', 'new_local_refine')]
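To pin down exactly which jobs differ between the two views, a small comparison like the one below may help. It is a minimal sketch, not part of CryoSPARC: `compare_views` is a hypothetical helper, and it assumes you copy the CLI output above into `cli_jobs` and type in by hand the (project, job) pairs the Resource Manager currently shows. Jobs that are active according to the CLI but not shown correspond to the second symptom; jobs shown but not active correspond to the first.

```python
# Hypothetical diagnostic helper (not a CryoSPARC API): compares the jobs the
# CLI reports as active with the jobs visible in the Resource Manager.
ACTIVE_STATUSES = {"queued", "launched", "started", "waiting", "running"}


def compare_views(cli_jobs, ui_jobs):
    """cli_jobs: (project_uid, job_uid, status, job_type) tuples, as produced
    by the list comprehension above.
    ui_jobs: (project_uid, job_uid) pairs seen in the Resource Manager,
    entered manually.
    Returns (missing_from_ui, stale_in_ui) as sorted lists of pairs."""
    active = {(p, j) for p, j, status, _ in cli_jobs if status in ACTIVE_STATUSES}
    shown = set(ui_jobs)
    missing_from_ui = sorted(active - shown)  # active but not displayed
    stale_in_ui = sorted(shown - active)      # displayed but no longer active
    return missing_from_ui, stale_in_ui


cli_jobs = [("P14", "J323", "queued", "ctf_refine_local"),
            ("P3", "J988", "running", "nonuniform_refine_new"),
            ("P9", "J251", "running", "new_local_refine")]
# Hypothetical placeholder for a completed job still listed in the UI
ui_jobs = {("PX", "J1")}

missing, stale = compare_views(cli_jobs, ui_jobs)
print(missing)  # [('P14', 'J323'), ('P3', 'J988'), ('P9', 'J251')]
print(stale)    # [('PX', 'J1')]
```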
I can't see anything obvious in the log files. Any advice would be gratefully received.
Thanks,
Andy Purkiss