Hello all,
We have encountered an issue where only one GPU [0] is allocated to jobs in cryoSPARC, which is not optimal for Heterogeneous Refinement because it is very memory-dependent. We have two identical GPUs (0 and 1), and both are confirmed to be enabled using the commands we found on this page.
So, is there any way of allocating both cards to a single job, considering both are identical, recognized and enabled?
Thank you very much in advance.
Did you define CUDA_VISIBLE_DEVICES? The default cluster submission script (cluster_script.sh) contains the following setup code:
available_devs=""
# Probe GPU indices 0-15 and collect those with no running compute processes.
for devidx in $(seq 0 15); do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]]; then
        if [[ -z "$available_devs" ]]; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
# Expose only the idle GPUs to the job.
export CUDA_VISIBLE_DEVICES=$available_devs
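Note that the loop above probes indices 0 through 15 regardless of how many GPUs the node actually has. Below is a minimal sketch of a variant that only probes the indices nvidia-smi reports. The function name pick_idle_gpus and the NVIDIA_SMI override variable are hypothetical and not part of cryoSPARC; the only addition to the nvidia-smi flags already used above is --query-gpu=index. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```shell
# Hypothetical helper, not part of cryoSPARC: print a comma-separated list
# of GPU indices that have no compute processes running on them.
# NVIDIA_SMI may be overridden (e.g. for testing); it defaults to nvidia-smi.
# The body runs in a subshell, so its variables do not leak to the caller.
pick_idle_gpus() (
    smi="${NVIDIA_SMI:-nvidia-smi}"
    idle=""
    # Ask the driver which GPU indices exist instead of guessing 0..15.
    for idx in $("$smi" --query-gpu=index --format=csv,noheader); do
        # An empty process list means nothing is computing on this GPU.
        if [ -z "$("$smi" -i "$idx" --query-compute-apps=pid --format=csv,noheader)" ]; then
            idle="${idle:+$idle,}$idx"
        fi
    done
    echo "$idle"
)

# Only run the query when nvidia-smi is actually available on this machine.
if command -v "${NVIDIA_SMI:-nvidia-smi}" >/dev/null 2>&1; then
    export CUDA_VISIBLE_DEVICES="$(pick_idle_gpus)"
    echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
fi
```

If every GPU is busy, both this sketch and the original loop export an empty value, so it is worth checking the printed line in the job output.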
Hi all, how can I tell whether the default script above (cluster_script.sh) for defining CUDA_VISIBLE_DEVICES works?
I installed CryoSPARC v2.13.2 on a GPU node (with 4 GPUs) of a cluster. I then requested an allocation of that GPU node, logged into it, and ran:
echo $CUDA_VISIBLE_DEVICES
The echo command printed nothing. Should I just define CUDA_VISIBLE_DEVICES as follows in my cluster_script.sh?
export CUDA_VISIBLE_DEVICES=0,1,2,3
Thanks so much!
Hi @donghuachen, you can definitely hardcode CUDA_VISIBLE_DEVICES in the template script, as you wrote. The loop in the example tries to work it out automatically by selecting the GPUs that have no compute processes running on them.
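On the earlier question of how to tell whether the script works: an export only affects the shell that runs the submission script and the processes it starts, so a separate interactive login to the node will not show it. One rough way to check is to print the value from inside the job script itself. A minimal sketch, assuming lines like these are placed in cluster_script.sh (the function name report_gpu_env is hypothetical, not something cryoSPARC provides):

```shell
# Hypothetical helper, not part of cryoSPARC: report what the job will see.
report_gpu_env() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        # Unset: the CUDA runtime does not restrict which GPUs are visible.
        echo "CUDA_VISIBLE_DEVICES is unset"
    elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        # Set to an empty string: the CUDA runtime sees no GPUs at all.
        echo "CUDA_VISIBLE_DEVICES is set but empty"
    else
        echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
    fi
}

# The hardcoded value from the question, followed by the check.
export CUDA_VISIBLE_DEVICES=0,1,2,3
report_gpu_env
```

The printed line should then appear in the job's standard output, wherever your scheduler writes it.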
Hi All,
If I have not specified CUDA_VISIBLE_DEVICES (echo $CUDA_VISIBLE_DEVICES shows nothing), what will CryoSPARC do?
Currently I have two NU Refinement jobs running at the same time, but both log files show GPU [0]. Does this mean my two NU Refinement jobs are using the same GPU [0]? Thanks.
I think that does mean that both jobs are sharing GPU [0]. You could double-check by running nvidia-smi while they are running.
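For the nvidia-smi check, something along these lines would count the compute processes on each GPU while the jobs are running. This is only a sketch: the helper names gpu_pids and gpu_proc_count and the NVIDIA_SMI override variable are made up for illustration, and the indices 0 and 1 assume a two-GPU node. The query flags are the same ones the default cluster script uses.

```shell
# Hypothetical helper, not part of cryoSPARC: print the PIDs computing on one GPU.
# NVIDIA_SMI may be overridden (e.g. for testing); it defaults to nvidia-smi.
gpu_pids() {
    "${NVIDIA_SMI:-nvidia-smi}" -i "$1" --query-compute-apps=pid --format=csv,noheader
}

# Count the compute processes on a GPU index (0 means the GPU is idle).
# grep -c exits non-zero when it counts 0 lines, hence the "|| true".
gpu_proc_count() {
    gpu_pids "$1" | grep -c '[0-9]' || true
}

# Only query when nvidia-smi is actually available on this machine.
if command -v "${NVIDIA_SMI:-nvidia-smi}" >/dev/null 2>&1; then
    for idx in 0 1; do
        echo "GPU $idx: $(gpu_proc_count "$idx") compute process(es)"
    done
fi
```

If both jobs are sharing a card, GPU 0 should report two processes and GPU 1 none.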
Rather than running two jobs, I think what you want is to run a single job with two GPUs. Then cryoSPARC should assign both of them.