Hi,
I am using CryoSPARC v4.0.3 with Topaz 0.2.5a.
A Topaz Train job ran for about 45 minutes and then failed at the following step:
[CPU: 216.9 MB]# Loading model: resnet8
[CPU: 216.9 MB]# Model parameters: units=32, dropout=0.0, bn=on
[CPU: 216.9 MB]# Receptive field: 77
[CPU: 216.9 MB]# Using device=0 with cuda=True
[CPU: 216.9 MB]WARNING: no coordinates are observed with x_coord > 693 or y_coord > 484. Did you scale the micrographs and particle coordinates correctly?
[CPU: 216.9 MB]# Loaded 40 training micrographs with 5530 labeled particles
[CPU: 216.9 MB]# Loaded 10 test micrographs with 1210 labeled particles
[CPU: 216.9 MB]# source split p_observed num_positive_regions total_regions
[CPU: 216.9 MB]# 0 train 0.00017 160370 942796800
[CPU: 216.9 MB]# 0 test 0.000149 35090 235699200
[CPU: 216.9 MB]# Specified expected number of particle per micrograph = 150.0
[CPU: 216.9 MB]# With radius = 3
[CPU: 216.9 MB]# Setting pi = 0.000184557266210492
[CPU: 216.9 MB]# minibatch_size=128, epoch_size=5000, num_epochs=10
[CPU: 171.5 MB]
Traceback (most recent call last):
[CPU: 171.6 MB]
File "/home/user/ENV/bin/topaz", line 8, in <module>
[CPU: 171.6 MB]
sys.exit(main())
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/main.py", line 148, in main
[CPU: 171.6 MB]
args.func(args)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/commands/train.py", line 694, in main
[CPU: 171.6 MB]
fit_epochs(classifier, criteria, trainer, train_iterator, test_iterator, args.num_epochs
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/commands/train.py", line 576, in fit_epochs
[CPU: 171.6 MB]
it = fit_epoch(step_method, train_iterator, epoch=epoch, it=it
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/commands/train.py", line 552, in fit_epoch
[CPU: 171.6 MB]
for X,Y in data_iterator:
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __iter__
[CPU: 171.6 MB]
return self._get_iterator()
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 381, in _get_iterator
[CPU: 171.6 MB]
return _MultiProcessingDataLoaderIter(self)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1072, in __init__
[CPU: 171.6 MB]
self._reset(loader, first_iter=True)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1105, in _reset
[CPU: 171.6 MB]
self._try_put_index()
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1339, in _try_put_index
[CPU: 171.6 MB]
index = self._next_index()
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 618, in _next_index
[CPU: 171.6 MB]
return next(self._sampler_iter) # may raise StopIteration
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 254, in __iter__
[CPU: 171.6 MB]
for idx in self.sampler:
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/utils/data/sampler.py", line 166, in __iter__
[CPU: 171.6 MB]
yield next(self)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/utils/data/sampler.py", line 150, in __next__
[CPU: 171.6 MB]
sample = next(g)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/topaz/utils/data/sampler.py", line 71, in __next__
[CPU: 171.6 MB]
self.random.shuffle(self.x)
[CPU: 171.6 MB]
File "/home/user/ENV/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
[CPU: 171.6 MB]
_error_if_any_worker_fails()
[CPU: 171.6 MB]
RuntimeError: DataLoader worker (pid 37976) is killed by signal: Killed.
Here is the error message that CryoSPARC reported afterwards:
[CPU: 171.8 MB]
Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
File "/project/6006782/user/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 360, in run_topaz_wrapper_train
utils.run_process(train_command)
File "/project/6006782/user/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 98, in run_process
assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/home/user/topaz.sh train --train-images /home/user/…/J51/image_list_train.txt --train-targets /home/user/…)
I don’t know whether the error is related to the warning “no coordinates are observed with x_coord > 693 or y_coord > 484. Did you scale the micrographs and particle coordinates correctly?”. There is a Downsampling factor parameter, but it scales only the images. Is there another parameter that scales the particle coordinates?
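To make the scaling question concrete, the numbers in the Topaz log above can be cross-checked with a few lines of Python. This is only a rough sketch under assumptions that the log does not confirm: that all 40 training micrographs have the same size, that Topaz counts one region per pixel, that each particle is labeled as a filled disk of the reported radius, and that the particles are spread over most of each micrograph. The variable names are made up for illustration.

```python
import math

# Values copied from the Topaz log above.
train_micrographs = 40
train_particles = 5530
positive_regions = 160370
total_regions = 942796800
expected_particles = 150.0
radius = 3
max_x, max_y = 693, 484  # largest coordinates mentioned in the warning

# Assumption: one region per pixel and equally sized micrographs.
pixels_per_micrograph = total_regions // train_micrographs

# Assumption: each particle labels every pixel within `radius` of its centre.
disk_pixels = sum(
    1
    for dx in range(-radius, radius + 1)
    for dy in range(-radius, radius + 1)
    if dx * dx + dy * dy <= radius * radius
)

# Expected fraction of positive pixels; compare with "Setting pi" in the log.
pi = expected_particles * disk_pixels / pixels_per_micrograph

# Rough linear scale mismatch between the image area and the area spanned
# by the coordinates (only meaningful if particles cover the micrograph).
scale_estimate = math.sqrt(pixels_per_micrograph / (max_x * max_y))

print("pixels per micrograph:", pixels_per_micrograph)
print("pixels per labeled particle:", disk_pixels,
      "log ratio:", positive_regions / train_particles)
print("pi:", pi)
print("approximate image/coordinate scale ratio:", round(scale_estimate, 2))
```

If those assumptions hold, the loaded images have roughly 70 times more pixels than the area covered by the coordinates, which is a linear factor of a little over 8. That would be consistent with the images and the coordinates being on different scales, but it does not by itself explain why the DataLoader worker was killed.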
Any help is very much appreciated!
momo