We were having this problem: Topaz errors on cryoSPARC v5.0.2
(Summary: Topaz Cross validation failed in 5.0.2 even though the Topaz train sub-jobs completed successfully. This was fixed in 5.0.4)
I updated to 5.0.4 and now we get a different problem: the Topaz train sub-jobs themselves fail.
The parameter being optimized is "Epoch size", with "Initial value to begin with" set to 50 and "Value to increment parameter by" set to 50.
In the training jobs launched by the cross-validation job, Topaz dies on launch with the error:
topaz train: error: argument --epoch-size: invalid int value: '150.0'
Inside the job, Topaz is being launched with:
Starting training by running command /common/app/topaz/0.2.5/bin/topaz train
--epoch-size 150.0 --k-fold 2 --fold 1 --learning-rate 0.0002
--minibatch-size 128 --num-epochs 10 --method GE-binomial --slack -1.0
--autoencoder 0.0 --l2 0.0 --minibatch-balance 0.0625 --model resnet8
--units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 4
--cross-validation-seed 906145623 --radius 3 --num-particles 40
-o /<...>/J84/cv/model_n150.0_fold1_train_test_curve.txt --device 1
--train-images <...>J90/image_list_train.txt
--train-targets <...>J90/topaz_particles_processed_train.txt
--test-images <...>J90/image_list_test.txt
--test-targets <...>J90/topaz_particles_processed_test.txt
--save-prefix=<...>/J90/models/model
It looks to me like the problem is that the epoch size is being passed to Topaz as a float ('150.0') rather than an int.
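For what it's worth, the failure is easy to reproduce outside cryoSPARC. Topaz declares --epoch-size as an integer argument, and Python's int() rejects a string like "150.0", so argparse aborts with exactly this message. A minimal sketch (the sweep arithmetic here is my guess at how a float-typed parameter field would produce "150.0"; the parser below is a stand-in, not Topaz's actual code):

```python
import argparse

# Stand-in for topaz's CLI: --epoch-size declared with type=int,
# as the error message implies.
parser = argparse.ArgumentParser(prog="topaz train")
parser.add_argument("--epoch-size", type=int)

# If the swept value is computed in float arithmetic, formatting it
# yields a trailing ".0" that int() cannot parse:
value = 50.0 + 2 * 50.0        # initial + step * increment -> 150.0
try:
    parser.parse_args(["--epoch-size", str(value)])   # "150.0"
except SystemExit:
    # argparse prints "invalid int value: '150.0'" and exits,
    # matching the job log above.
    pass

# Coercing to int before building the command line avoids it:
args = parser.parse_args(["--epoch-size", str(int(value))])  # "150"
```

So if the cross-validation job formats the swept parameter without casting it to int first, that would explain the log exactly.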
Is this something to do with the way we're setting up the job, or with the cross-validation job itself?