Hi @apunjani
We are still having this issue: we cannot run more than two local refinement (New) or local refinement (Legacy) jobs with NU turned on at the same time, or the system hangs.
I want to track down the bottleneck before purchasing a new workstation.
My particle box is typically 512 pixels. With more than 2 jobs running, CS spends an unusually long time (a few hours) on the Local decomposition step of local refinement (Legacy) or the Local cross validation A/B step of local refinement (New). During the steps that use the GPU, GPU usage (RTX3090 cards) stays at 0% most of the time, and it takes a few seconds to show one heartbeat. CPU memory usage is never close to saturation (we have 512 GB of memory).
We can simultaneously run more than 2 simpler, less memory-intensive jobs (2D classification, etc.) without a significant slowdown.
We’ve already tried your trick of constantly emptying the swap. It did help, and we could run 8 local refinement (Legacy) jobs with NU turned off.
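In case it helps with the diagnosis, here is a minimal sketch of how swap and memory figures could be logged while the jobs run. It assumes a Linux host where `/proc/meminfo` is readable; the function names (`parse_meminfo`, `swap_used_kb`, `read_snapshot`) are hypothetical helpers and not part of cryoSPARC.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most entries are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB."""
    return info["SwapTotal"] - info["SwapFree"]


def read_snapshot(path="/proc/meminfo"):
    """Return (swap used, MemAvailable, page cache) in kB for one point in time."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return (
        swap_used_kb(info),
        info.get("MemAvailable", 0),
        info.get("Cached", 0),
    )
```

Calling `read_snapshot()` every few seconds (for example from a small loop with `time.sleep`) while the local refinement jobs sit in the slow step would show whether swap usage climbs even though total memory usage looks far from saturated.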
Thank you in advance for your suggestions!