Continuous peppering of slurmctld with RPCs

We are running CryoSPARC v4.7.0 and our slurmctld process is being constantly peppered with RPCs for jobs that are no longer visible to squeue. The slurmctld job record cache is only kept for a month due to the high number of jobs on our cluster. Is there a way to manually assign a job's state in CryoSPARC when slurm is no longer able to return information about the job ID?
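
For context on where these RPCs come from: a CryoSPARC cluster lane periodically re-runs the lane's qstat command template for each job it still considers active, and once slurmctld has purged the job record squeue can only return an error instead of a terminal state, so the same query keeps being issued. The following is a minimal sketch of that kind of check, not CryoSPARC's actual code; it assumes the common slurm template squeue -j {{ cluster_job_id }} and a made-up job ID.

import subprocess

def poll_cluster_job(cluster_job_id):
    # Roughly what a per-job status poll looks like. Once slurmctld has purged
    # the record, squeue typically exits non-zero with an "Invalid job id
    # specified" error, so no terminal state is ever observed and the poll is
    # repeated on the next monitoring interval.
    result = subprocess.run(['squeue', '-j', cluster_job_id],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None  # slurm no longer knows about this job
    return result.stdout

print(poll_cluster_job('123456'))  # hypothetical slurm job ID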

@DavidHoover Please can you provide additional information on the jobs in question:

  1. What is the fraction of CryoSPARC jobs associated with continuous squeue queries?
  2. Were the jobs indeed queued more than a month ago?
  3. Can you find any useful information in affected jobs’ slurm stdout or stderr files?
  4. Do you have any other information on why slurm rejected or terminated the affected jobs?

  1. It is a small fraction, depending on the user. Perhaps 10 or so jobs.
  2. Some were queued months ago, but several are from a few days ago, because our slurm is configured to retain job records for only 300 seconds after they end.
  3. I have not looked at the actual output. I believe the issue is that we set CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=120, and under rare conditions the jobs end very quickly (sooner than 120 seconds) or are cancelled before they start.
  4. There are many, many reasons why our users’ jobs end prematurely. I could not identify them all.

All of this is irrelevant. The question is how to tell CryoSPARC that the jobs are no longer running, so that it stops attempting to discover their status. Should the user simply delete the jobs?

@DavidHoover,

To tell CryoSPARC that a job is no longer running and to stop it from attempting to discover the job's status, the user can open the interactive CLI with cryosparcm icli (cryosparcm cli reference | CryoSPARC Guide) and run the following command, replacing JY and PX with the affected job's UID and project UID:

db.jobs.update_one({'uid': 'JY', 'project_uid': 'PX'}, {'$set': {'status': 'failed'}})

Setting the job's status to failed in this way will stop CryoSPARC from peppering slurm with queries for that job.
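
If several jobs are affected, the same idea can be applied in one pass from cryosparcm icli, where db is the pymongo handle used above. The sketch below is only an illustration, not an official recipe: 'PX' is a placeholder for the real project UID, and the list of "active" status values is an assumption about typical CryoSPARC job documents, so review the printed list before running the update_many call.

# Run inside `cryosparcm icli`.
# Assumed set of status values that CryoSPARC still treats as active.
active = {'$in': ['launched', 'started', 'running', 'queued', 'waiting']}

# 1. Review which jobs in the project CryoSPARC still considers active.
for job in db.jobs.find({'project_uid': 'PX', 'status': active},
                        {'uid': 1, 'status': 1}):
    print(job['uid'], job['status'])

# 2. After confirming the list, mark them all failed so the cluster job
#    monitor stops querying slurm for them.
db.jobs.update_many({'project_uid': 'PX', 'status': active},
                    {'$set': {'status': 'failed'}})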