Poor load distribution for CPU jobs in multi-worker lanes

Hi cryoSPARC-dev team,

As mentioned in my previous post, I am currently reprocessing a large amount of data.
To reduce the load on the GPUs, we try to run as many jobs as possible on the CPU.
This includes Extract from micrographs, but also the ‘usual’ CPU tasks (e.g., template creation).

In fact, I used multiple redundant template creation jobs so that the same projections appear in each workspace: I simply cloned the job several times and moved the clones into the respective workspaces. Although this is highly redundant, it improves the ‘readability’ of the project, since the templates are directly visible within each workspace.

I queued all the template creation jobs to a single lane (4 workers with 64 threads each), with each job set to use 4 CPU threads. And here, things got messy.
All the jobs were queued to the same server within the lane, and it is always the same server: CPU jobs are never queued to any other worker in this lane. This overloads that server. Template creation jobs that should finish in seconds were still running after hours, at which point I killed them, and other jobs on the same server (e.g., extractions) were also extremely slow.

It’s an edge case, I know, but maybe CPU load balancing between the workers in a lane could be added to avoid these situations.
Or at least provide an option to select the server on which a job will be executed, similar to the existing GPU selection option.
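For illustration, the kind of balancing I mean could be as simple as a "least-loaded worker" rule instead of always picking the first worker in the lane. This is just a sketch of the idea; the worker names and the free-thread bookkeeping are made up for the example and have nothing to do with cryoSPARC's actual scheduler:

```python
# Hypothetical sketch of least-loaded worker selection within a lane.
# The data structures here are invented for illustration only.

def pick_worker(workers, requested_threads):
    """Pick the worker with the most free CPU threads that can fit the job."""
    candidates = [w for w in workers if w["free_threads"] >= requested_threads]
    if not candidates:
        return None  # no worker can currently fit the job; leave it queued
    # Least-loaded first: prefer the worker with the most free threads,
    # so repeated CPU jobs spread across the lane instead of piling up.
    return max(candidates, key=lambda w: w["free_threads"])

lane = [
    {"name": "worker1", "free_threads": 12},
    {"name": "worker2", "free_threads": 64},
    {"name": "worker3", "free_threads": 2},
    {"name": "worker4", "free_threads": 40},
]

print(pick_worker(lane, requested_threads=4)["name"])  # worker2
```

Even something this crude would have spread my cloned template creation jobs over the four workers instead of stacking them all on one.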


Thanks to @ctueting for describing this use case, which we will be discussing internally. Am I assuming correctly that, outside these specific reprocessing efforts, you generally prefer the multi-host lane over a set of single-host lanes?

This was actually based on a suggestion here in the discussion.

We have subsets of identical servers (8 in total), and since we merged the workers into distinct lanes based on their specs, rather than using single-worker lane configurations, our throughput has increased, as there is less waiting time for GPUs.

For the lanes: we have 2 identical servers with just 2 GPUs each, which went into a small lane; 4 servers with 4× 2080 Ti each and comparable RAM and CPU; and 2 very powerful machines with 4× 3090 and 4× A5000, which went into the high-end lane. So the worker-to-lane assignment was rational.

Hi @ctueting ,

Thanks for the additional information. We’ve added more granular queueing options to our to-do list and hope to incorporate them in an upcoming release.