Set worker for CPU only job

Hi,

We are starting to move our workstations to a master/worker config with one of our workstations acting as a master and the rest as workers. This seems very nice and we are able to send jobs to the desired worker by selecting the target GPU.

The only problem I have noticed is with CPU-only jobs. For example, when extracting from micrographs, using many CPUs can be faster than using GPUs. It would be nice if these CPU-only jobs could also be directed to a specific worker, for example the one where the micrographs are stored.

Thank you for considering this feature request.

Thanks for your feature suggestion @jcoleman.

A workaround in recent releases might be to reassign nodes with special CPU capabilities to a dedicated scheduler lane. The drawback is that such a reassignment removes the node from its current lane, and thus from that lane’s pool of resources.

Suppose the “CPU-heavy” worker cpuworker.local is already connected as a worker to the CryoSPARC master server at csmaster.local running on port 61000. Then one could move the worker to its dedicated, new lane cpuworkers with the command (run on cpuworker.local):

/path/to/cryosparc_worker/bin/cryosparcw connect --update \
    --master csmaster.local --port 61000 \
    --worker cpuworker.local --newlane --lane cpuworkers

Please ensure that such a reassignment does not leave behind an empty lane (the lane of which the reassigned worker was previously a member).

Hi @wtempel, thanks for taking the feature request! Related to this type of configuration, I am wondering whether CryoSPARC Live could expose the ability to assign the Live worker to a specific worker, as we can when assigning a job. It’s not a big deal, but it may help ensure that the worker assigned to the Live job has local access to the data rather than having to read it over the network. Thank you!

@jcoleman To ensure a common understanding of your use case, please can you post a screenshot of an example job submission where you use the existing capability? Are you referring to the Run on specific GPU option?

@wtempel that’s right, what I am referring to is the ‘run on specific GPU’ option. Let me know if you need a screenshot and I’ll post when I get to the office.

Not needed given the confirmation

Given the caveat for Run on specific GPU that the override of the scheduler may result in resource conflicts with other running jobs, would it be helpful and sufficient for your use case if one could select a specific worker host, rather than a specific GPU device on a specific worker host?

Yes definitely that would be even better!

In this case you could create a separate lane (which one can select as a Live lane or when queuing non-Live jobs) for each GPU computer. To help me propose suitable commands, please can you post the outputs of these commands on the CryoSPARC master computer:

cryosparcm status | grep -e HOSTNAME -e BASE_PORT
cryosparcm cli "get_scheduler_targets()"

Hi @wtempel, that’s not really what I would want to do. The micrographs could sometimes be on a different worker, depending on which workstation has space and compute available, so fixed lanes would reduce flexibility, unless I am misunderstanding. It would be very useful, though, if we could select a specific worker when we set up the job and have that sent to the scheduler.

Suppose there are CryoSPARC worker node “targets” csn1 and csn2 that are both currently part of the default lane. Wouldn’t moving csn1 to a new lane Lane1 and csn2 to a new lane Lane2 (and removing the default lane, if it is empty) provide this kind of control?
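
As a rough sketch of what that could look like, the snippet below only prints the per-node commands, following the same cryosparcw connect --update form shown earlier in this thread. The master hostname csmaster.local and port 61000 are placeholders carried over from that earlier example, not values known for this installation, and lane_cmd is a hypothetical helper name used for illustration. Substitute the HOSTNAME and BASE_PORT values reported by cryosparcm status, and the worker hostnames exactly as they appear in get_scheduler_targets().

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one lane reassignment command per worker.
# ASSUMPTIONS: MASTER and PORT are placeholders reused from the earlier example
# in this thread; replace them with the values from `cryosparcm status`.
MASTER=csmaster.local
PORT=61000
CRYOSPARCW=/path/to/cryosparc_worker/bin/cryosparcw

# lane_cmd is a hypothetical helper: $1 = worker hostname, $2 = new lane name.
lane_cmd() {
    printf '%s connect --update --master %s --port %s --worker %s --newlane --lane %s\n' \
        "$CRYOSPARCW" "$MASTER" "$PORT" "$1" "$2"
}

# One dedicated lane per worker, as proposed above.
lane_cmd csn1 Lane1
lane_cmd csn2 Lane2
```

Each printed command would then be run on the corresponding worker (the csn1 line on csn1, the csn2 line on csn2), after checking it against your actual hostnames. Removing the default lane once it is empty remains a separate step.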