2D Classification PyCuda Error (CUDA 11)

Hi Stephan and CleoShen (@stephan and @CleoShen),

I get the same error when I try to scale a job up across multiple GPU nodes, that is, when I request more GPU devices than a single node has. The use case is submitting a job to a shared cluster so that several GPU nodes compute it at the same time.

I found that CUDA_VISIBLE_DEVICES is set for a single node only, so devices on multiple nodes cannot be defined in a way that makes them all visible at the same time. The same would apply to the compute iterations, which would be difficult to distribute across multiple nodes.

I am not sure whether there is a way to resolve this. My job scheduler is PBS. Do you think the issue is as I describe above? Should I ask for the PBS settings to be changed so that multiple GPU nodes can be used at the same time?

Sincerely,
Qitsweauca

Hi @qitsweauca,

All jobs in cryoSPARC are just Python processes; a single job cannot be split across multiple workstations or nodes.
I hope that explains why you’re encountering this error.
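
If it helps to confirm this on your cluster, here is a minimal sketch (not part of cryoSPARC) that compares the number of GPUs a job asks for with the devices listed in CUDA_VISIBLE_DEVICES on the node where it runs. The function names `visible_gpu_ids` and `fits_on_this_node` are hypothetical, and the sketch assumes your scheduler exports CUDA_VISIBLE_DEVICES as a comma-separated list.

```python
import os


def visible_gpu_ids(env=None):
    """Return the entries listed in CUDA_VISIBLE_DEVICES, or None if it is unset."""
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None  # unset: the environment places no restriction we can read
    # An empty string hides every device, which yields an empty list here.
    return [tok.strip() for tok in raw.split(",") if tok.strip()]


def fits_on_this_node(requested_gpus, env=None):
    """True/False if the request fits the visible devices, None if unknown."""
    ids = visible_gpu_ids(env)
    if ids is None:
        return None
    return requested_gpus <= len(ids)


if __name__ == "__main__":
    requested = 4  # hypothetical number of GPUs requested for one job
    print("visible devices:", visible_gpu_ids())
    print("request fits on this node:", fits_on_this_node(requested))
```

Running this inside a PBS job on one node should show only that node's devices, since the variable applies to processes on the host where it is set.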
