Hi, I ran Topaz on a cluster (each node has 2 GPUs), but it seems to use only the CPU: the log below reports a CUDA runtime error (30) and then "Falling back to CPU." As a result, the job takes a very long time to finish. I set “Expected number of particles” to 100 and left the other options at their defaults. Could anyone help me figure out what is going wrong? Thank you!
[CPU: 162.9 MB] THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544176307774/work/torch/csrc/cuda/Module.cpp line=34 error=30 : unknown error
[CPU: 162.9 MB] # Loading model: resnet8
[CPU: 162.9 MB] # Model parameters: units=32, dropout=0.0, bn=on
[CPU: 162.9 MB] # Receptive field: 71
[CPU: 162.9 MB] CudaWarning: cuda runtime error (30) : unknown error at /opt/conda/conda-bld/pytorch_1544176307774/work/torch/csrc/cuda/Module.cpp:34
[CPU: 162.9 MB] Falling back to CPU.
[CPU: 162.9 MB] # Using device=0 with cuda=False
[CPU: 162.9 MB] # Loaded 32 training micrographs with 2660 labeled particles
[CPU: 162.9 MB] # Loaded 7 test micrographs with 584 labeled particles
[CPU: 162.9 MB] # source split p_observed num_positive_regions total_regions
[CPU: 162.9 MB] # 0 train 0.000409 77140 188559360
[CPU: 162.9 MB] # 0 test 0.000411 16936 41247360
[CPU: 162.9 MB] # Specified expected number of particle per micrograph = 100.0
[CPU: 162.9 MB] # With radius = 3
[CPU: 162.9 MB] # Setting pi = 0.0004921527098946454
[CPU: 162.9 MB] # minibatch_size=128, epoch_size=5000, num_epochs=10
[CPU: 162.9 MB] # Done!
[CPU: 162.9 MB] Training command complete.
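
In case it helps with diagnosing this, here is a minimal sketch of a check that could be run on the same compute node, using the same Python environment that Topaz uses. It only reports what the driver and PyTorch can see; it is not part of Topaz or cryoSPARC. The function name `check_gpu` and the report keys are made up for illustration, and it assumes `nvidia-smi` is on the PATH if the NVIDIA driver is installed.

```python
import os
import shutil
import subprocess


def check_gpu():
    """Collect basic facts about GPU visibility on the current node.

    Hypothetical helper for illustration; not a Topaz or cryoSPARC function.
    """
    report = {
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi": None,
        "torch_version": None,
        "torch_cuda_available": None,
        "torch_device_count": None,
    }

    # Ask the driver which GPUs exist, if nvidia-smi can be found.
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            out = subprocess.run(
                [smi, "-L"],
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                universal_newlines=True,
                timeout=30,
            )
            report["nvidia_smi"] = out.stdout.strip() or out.stderr.strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report["nvidia_smi"] = "failed: {}".format(exc)

    # PyTorch may be missing if this is not the Topaz environment.
    try:
        import torch
    except ImportError:
        return report

    report["torch_version"] = torch.__version__
    try:
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_device_count"] = torch.cuda.device_count()
    except Exception as exc:  # CUDA initialization itself can raise
        report["torch_cuda_available"] = "failed: {}".format(exc)
    return report


if __name__ == "__main__":
    for key, value in check_gpu().items():
        print("{}: {}".format(key, value))
```

If `nvidia_smi` lists both GPUs but `torch_cuda_available` is `False` or reports a failure, the problem is probably between PyTorch and the driver on that node rather than in the Topaz settings. If `nvidia_smi` is empty, the job may have landed on a node or in a scheduler allocation where no GPU is visible.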