Maintenance Mode for Worker Nodes

Hello,

Is there a way to prevent individual worker nodes from starting new jobs? Something similar to maintenance mode, but without blocking the whole instance.

I am managing a CryoSPARC instance with a few worker nodes and from time to time need to do maintenance work, including reboots, on individual workers. It would be nice to do this without shutting down the whole CryoSPARC instance.

Welcome to the forum @niklas.
May I suggest:

  1. Collect current target information with:
    cryosparcm cli "get_scheduler_targets()" | tee cryosparc_targets_$(date +%s).out
    
  2. Identify the 'name': of the worker node to be maintained, and keep the output for reference when re-connecting the worker after maintenance.
  3. Remove the target from the CryoSPARC instance:
    cryosparcm cli "remove_scheduler_target_node('worker_name')" (guide)
  4. Allow jobs running on the “removed” worker to complete. In my testing, a 3-minute 3D classification job completed even though I had “removed” the worker shortly after job start.
  5. After maintenance is complete, re-connect the worker: on the worker, run cryosparcw connect with the appropriate arguments, using the output saved in step 1 as a reference.
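
Steps 1 to 3 could be wrapped in a small helper along these lines. This is only a sketch: `drain_worker` and the `DRY_RUN` switch are hypothetical names made up for illustration, not part of CryoSPARC, and `worker_name` stands in for the 'name': value found in step 2. Only the two `cryosparcm cli` calls are taken from the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: drain_worker and DRY_RUN are hypothetical helpers, not CryoSPARC features.

# Usage: drain_worker <worker_name>
# With DRY_RUN=1 (the default) the commands are printed instead of executed.
drain_worker() {
    local worker="$1"
    local snapshot="cryosparc_targets_$(date +%s).out"
    local remove_call="remove_scheduler_target_node('${worker}')"

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: cryosparcm cli \"get_scheduler_targets()\" | tee ${snapshot}"
        echo "would run: cryosparcm cli \"${remove_call}\""
        return 0
    fi

    # Steps 1-2: keep a copy of the current targets for the later re-connection.
    cryosparcm cli "get_scheduler_targets()" | tee "${snapshot}"

    # Step 3: remove the worker so the scheduler stops sending it new jobs.
    cryosparcm cli "${remove_call}"
}
```

Calling `drain_worker worker_name` on the master prints the two commands for review; running it with `DRY_RUN=0` executes them. Jobs already running on the worker are left alone (step 4), and the saved `cryosparc_targets_*.out` file is what you would consult when re-connecting in step 5.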

Hi @wtempel

thanks a lot!
Yes, that would work for me. I did not realize that jobs still complete when a worker is disconnected.

Still, an “official” function for this would be a nice feature, I guess.