I support a group of CryoSPARC users who have recently run into an issue with Topaz Train jobs. Here is the output of the failed Topaz Train job:
Micrograph preprocessing command complete.
[CPU: 228.9 MB]
Starting particle pick preprocessing by running command /users/r/a/rcat/miniconda3/envs/topaz/bin/topaz convert --down-scale 4 --threshold 0 -o /netfiles/rcat_lab/cryosparc/cs-data/J185/topaz_particles_processed.txt /netfiles/rcat_lab/cryosparc/cs-data/J185/topaz_particles_raw.txt
[CPU: 228.9 MB]
Particle pick preprocessing command complete.
[CPU: 228.9 MB]
Preprocessing done in 301.530s.
[CPU: 228.9 MB]
--------------------------------------------------------------
[CPU: 228.9 MB]
Starting train-test splitting...
[CPU: 228.9 MB]
Starting dataset splitting by running command /users/r/a/rcat/miniconda3/envs/topaz/bin/topaz train_test_split --number 17 --seed 541036979 --image-dir /netfiles/rcat_lab/cryosparc/cs-data/J185/preprocessed /netfiles/rcat_lab/cryosparc/cs-data/J185/topaz_particles_processed.txt
[CPU: 228.9 MB]
# splitting 85 micrographs with 434 labeled particles into 68 train and 17 test micrographs
[CPU: 228.9 MB]
Traceback (most recent call last):
[CPU: 228.9 MB]
File "/users/r/a/rcat/miniconda3/envs/topaz/bin/topaz", line 8, in <module>
[CPU: 228.9 MB]
sys.exit(main())
[CPU: 228.9 MB]
File "/users/r/a/rcat/miniconda3/envs/topaz/lib/python3.8/site-packages/topaz/main.py", line 148, in main
[CPU: 228.9 MB]
args.func(args)
[CPU: 228.9 MB]
File "/users/r/a/rcat/miniconda3/envs/topaz/lib/python3.8/site-packages/topaz/commands/train_test_split.py", line 108, in main
[CPU: 228.9 MB]
targets_train = pd.concat(groups_train, 0)
[CPU: 228.9 MB]
TypeError: concat() takes 1 positional argument but 2 were given
[CPU: 228.9 MB]
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
File "/gpfs2/scratch/rcat/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/run_topaz.py", line 307, in run_topaz_wrapper_train
utils.run_process(split_command)
File "/gpfs2/scratch/rcat/cryosparc/cryosparc_worker/cryosparc_compute/jobs/topaz/topaz_utils.py", line 98, in run_process
assert process.returncode == 0, f"Subprocess exited with status {process.returncode} ({str_command})"
AssertionError: Subprocess exited with status 1 (/users/r/a/rcat/miniconda3/envs/topaz/bin/topaz train_test_split --number 17 --seed 541036979 --image-dir /netfiles/rcat_lab/cryosparc/cs-data/J185/preprocessed /netfiles/rcat_lab/cryosparc/cs-data/J185/topaz_particles_processed.txt)
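Looking at the TypeError, the failure seems to come from pandas rather than from Topaz or PyTorch directly: if I'm reading the pandas changelog right, pandas 2.0 made every argument of concat() except objs keyword-only, so the positional axis in Topaz's train_test_split.py (pd.concat(groups_train, 0)) now raises exactly this error. A minimal reproduction, assuming pandas 2.x as in the environment below (the DataFrames are just toy stand-ins for the per-micrograph particle groups):

import pandas as pd

# toy DataFrames standing in for the per-micrograph particle groups
groups_train = [
    pd.DataFrame({"image_name": ["mic_a"], "x_coord": [10], "y_coord": [20]}),
    pd.DataFrame({"image_name": ["mic_b"], "x_coord": [30], "y_coord": [40]}),
]

pd.concat(groups_train, 0)       # TypeError: concat() takes 1 positional argument but 2 were given
pd.concat(groups_train, axis=0)  # works: axis must be passed as a keyword in pandas 2.x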
I should mention that this user group had previously run into an issue with Topaz Denoise jobs, which turned out to be related to PyTorch. The Topaz installation instructions on the CryoSPARC site suggest using Python 3.6 and installing via conda; when following those steps, PyTorch was built against a CUDA version that is too old for our A100 cards (these cards need CUDA 11+).
I created a new conda environment using Python 3.8 (hoping to get a newer version of PyTorch). I then had to install everything via pip rather than conda install, as I was only getting the CPU-only build of PyTorch when installing via conda, even when specifying cudatoolkit.
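For reference, the PyTorch install in that environment was done with something along these lines (from memory, so the exact versions and wheel index may differ slightly):

pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1+cu116 --extra-index-url https://download.pytorch.org/whl/cu116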
Here is the output of pip freeze:
certifi==2022.12.7
charset-normalizer==3.1.0
future==0.18.3
idna==3.4
joblib==1.2.0
numpy==1.24.3
pandas==2.0.1
Pillow==9.5.0
python-dateutil==2.8.2
pytz==2023.3
requests==2.28.2
scikit-learn==1.2.2
scipy==1.10.1
six==1.16.0
threadpoolctl==3.1.0
topaz-em==0.2.5
torch==1.13.1+cu116
torchaudio==0.13.1+cu116
torchvision==0.14.1+cu116
typing_extensions==4.5.0
tzdata==2023.3
urllib3==1.26.15
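A quick way to confirm the GPU build is actually being picked up in this environment (just a sanity check from inside the topaz env, not part of any Topaz job):

import torch

print(torch.__version__)          # e.g. 1.13.1+cu116
print(torch.version.cuda)         # CUDA version the wheel was built against
print(torch.cuda.is_available())  # should be True on the A100 nodes if the GPU build is active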
That solved the issue we were seeing with Denoise jobs, which had been "failing successfully" and leaving the denoised_micrographs directory empty. My concern is that using newer versions of PyTorch (and the newer packages pulled in alongside it) may have introduced an issue with Topaz Train jobs. I don't actually use the software much myself, so I'm unsure where to look next.
Any help or advice on how to get this environment working for Topaz jobs would be helpful. Thank you in advance to the community here.
-Travis