Hi,
Here is the full output from the GPU job. It should clarify things and answer the open questions.
To answer your final question: yes, this command WILL run on standalone Topaz, which suggests that something is not right with the submission script I am using through cryoSPARC.
Thank you.
====================== Cluster submission script: ========================
==========================================================================
#!/bin/bash
#### cryoSPARC cluster submission script template for SLURM
## /nfs/science/group/cryosparc_v4.0.0_gpu99Master_gpu118Worker_PORT51000_LIC_2dfef1c2/cryosparc_worker/bin/cryosparcw run --project P9 --job J387 --master_hostname gpu99.institute.local --master_command_core_port 51002 > /nfs/science/group/cryosparc_v3.3.2_gpu114Master_gpu118Worker_LIC_2dfef1c2_CUDA_11.0.3/(path)/J387/job.log 2>&1 - the complete command string to run the job
## 8 - the number of CPUs needed
## 1 - the number of GPUs needed.
## Note: the code will use this many GPUs starting from dev id 0
## the cluster scheduler or this script have the responsibility
## of setting CUDA_VISIBLE_DEVICES so that the job code ends up
## using the correct cluster-allocated GPUs.
## 8.0 - the amount of RAM needed in GB
## /nfs/science/group/cryosparc_v4.0.0_gpu99Master_gpu118Worker_PORT51000_LIC_2dfef1c2/cryosparc_worker/bin/cryosparcw - absolute path to the cryosparc worker command
## --project P9 --job J387 --master_hostname gpu99.institute.local --master_command_core_port 51002 - arguments to be passed to cryosparcw run
## P9 - uid of the project
## J387 - uid of the job
##
## What follows is a simple SLURM script:
#SBATCH --job-name cs_1_P9_J387
#SBATCH -n 8
#SBATCH --gres=gpu:1
#####SBATCH --mem=128000MB
#SBATCH --mem-per-cpu=11G
#SBATCH -o /nfs/science/group/cryosparc_slurm_outputs/output_P9_J387.txt
#SBATCH -e /nfs/science/group/cryosparc_slurm_outputs/error_P9_J387.txt
#Define the "gpu" partition for GPU-accelerated jobs
#SBATCH --partition=gpu
#Define the GPU architecture (GTX980 in the example, other options are GTX1080Ti, K40)
######SBATCH --constraint=GTX1080Ti
#SBATCH --exclude=gpu227,gpu228,gpu138,gpu150,gpu148
######SBATCH --constraint=buster
#SBATCH --time=96:00:00
module load cuda/11.2.2
module load tensorflow
nvidia-smi
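# Build a comma-separated list of GPU device ids (0-15) that currently have no compute processes, according to nvidia-smi.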
available_devs=""
for devidx in $(seq 0 15); do
    if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid --format=csv,noheader) ]]; then
        if [[ -z "$available_devs" ]]; then
            available_devs=$devidx
        else
            available_devs=$available_devs,$devidx
        fi
    fi
done
#export CUDA_VISIBLE_DEVICES=$available_devs
echo $available_devs
echo $CUDA_HOME
echo "$(hostname)"
echo $SLURM_TMPDIR
/usr/bin/nvidia-smi
module list
export CRYOSPARC_SSD_PATH="${SLURM_TMPDIR}"
/nfs/science/group/cryosparc_v4.0.0_gpu99Master_gpu118Worker_PORT51000_LIC_2dfef1c2/cryosparc_worker/bin/cryosparcw run --project P9 --job J387 --master_hostname gpu99.institute.local --master_command_core_port 51002 > /nfs/(path).job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
sbatch /(path)/queue_sub_script.sh
-------- Queued on cluster at XXXXX
Job J387 Started
[CPU: 96.4 MB]
Master running v4.0.0, worker running v4.0.0
[CPU: 96.7 MB]
Working in directory: (path)
[CPU: 96.7 MB]
Running on lane slurmcluster
[CPU: 96.7 MB]
Resources allocated:
[CPU: 96.7 MB]
Worker: slurmcluster
[CPU: 96.7 MB]
CPU : [0, 1, 2, 3, 4, 5, 6, 7]
[CPU: 96.7 MB]
GPU : [0]
[CPU: 96.7 MB]
RAM : [0]
[CPU: 96.7 MB]
SSD : False
[CPU: 96.7 MB]
--------------------------------------------------------------
[CPU: 96.7 MB]
Importing job module for job type topaz_extract...
[CPU: 243.0 MB]
Job ready to run
[CPU: 243.0 MB]
***************************************************************
[CPU: 243.0 MB]
Topaz is a particle detection tool created by Tristan Bepler and Alex J. Noble.
Citations:
- Bepler, T., Morin, A., Rapp, M. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat Methods 16, 1153-1160 (2019) doi:10.1038/s41592-019-0575-8
- Bepler, T., Noble, A.J., Berger, B. Topaz-Denoise: general deep denoising models for cryoEM. bioRxiv 838920 (2019) doi: https://doi.org/10.1101/838920
Structura Biotechnology Inc. and cryoSPARC do not license Topaz nor distribute Topaz binaries. Please ensure you have your own copy of Topaz licensed and installed under the terms of its GNU General Public License v3.0, available for review at: https://github.com/tbepler/topaz/blob/master/LICENSE.
***************************************************************
[CPU: 246.0 MB]
Starting Topaz process using version 0.2.4...
[CPU: 246.0 MB]
Skipping preprocessing.
[CPU: 246.0 MB]
Using preprocessed micrographs from J183/preprocessed
[CPU: 246.2 MB]
--------------------------------------------------------------
[CPU: 246.2 MB]
Inverting negative staining...
[CPU: 246.2 MB]
Inverting negative staining complete.
[CPU: 246.2 MB]
--------------------------------------------------------------
[CPU: 246.2 MB]
Starting extraction...
[CPU: 246.2 MB]
Starting extraction by running command (path)/topaz.sh extract --radius 7 --threshold -6 --up-scale 4 --assignment-radius -1 --min-radius 5 --max-radius 100 --step-radius 5 --num-workers 8 --device 0 --model (path) -o (path) [MICROGRAPH PATHS EXCLUDED FOR LEGIBILITY]
[CPU: 246.2 MB]
Please type
[CPU: 246.2 MB]
source
[CPU: 246.2 MB]
/(path)/anaconda3/2022.05/activate_anaconda3_2022.05.txt
[CPU: 246.2 MB]
**CudaWarning: module 'torch._C' has no attribute '_cuda_setDevice'**
**[CPU: 246.2 MB]**
**Falling back to CPU.**
To add a bit more, here is some of the job info output from the cluster:
...
...
NodeList=gpu125
BatchHost=gpu125
NumNodes=1 NumCPUs=8 NumTasks=8 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=8,mem=88G,node=1,billing=8,gres/gpu=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
JOB_GRES=gpu:1
Nodes=gpu125 CPU_IDs=0-7 Mem=90112 GRES=gpu:1(IDX:0)
MinCPUsNode=1 MinMemoryCPU=11G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
...
...
TresPerNode=gpu:1
NtasksPerTRES:0
When I SSH into gpu125 and run nvidia-smi, it reports "No running processes found".
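For reference, the missing torch._C._cuda_setDevice attribute is typically what a CPU-only PyTorch build raises when something calls torch.cuda.set_device, so a quick way to narrow this down would be a minimal SLURM job along the lines below. This is only a sketch: the #SBATCH options mirror the submission script above, and it assumes that "python" is whichever interpreter the topaz.sh wrapper activates (e.g. the anaconda3/2022.05 environment it asks you to source).

#!/bin/bash
#SBATCH --job-name torch_cuda_check
#SBATCH -n 1
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
#SBATCH --time=00:05:00
module load cuda/11.2.2
nvidia-smi
# Report whether this PyTorch build was compiled with CUDA and can see the allocated GPU.
python -c "import torch; print('torch', torch.__version__); print('built with CUDA:', torch.version.cuda); print('cuda available:', torch.cuda.is_available()); print('device count:', torch.cuda.device_count())"

If torch.version.cuda prints None there, the Topaz environment is using a CPU-only PyTorch build and the fallback to CPU would have nothing to do with the SLURM script itself.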