CryoSPARC v2.16 beta - Topaz Train error

Hi all,

I’m having trouble with Topaz training since we updated to the v2.16 beta. The error in the log is the following:

[CPU: 228.9 MB]  Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/topaz/run_topaz.py", line 442, in run_topaz_wrapper_train
    model_out.data['preprocess/processed_mics'] = preprocessed_dir
  File "cryosparc2_compute/dataset.py", line 60, in __setitem__
    assert name in self, "Cannot add columns to internal Dataset data with fields {}. Use Dataset.add_fields instead.".format(self.dtype_descr())
AssertionError: Cannot add columns to internal Dataset data with fields [('uid', '<u8'), ('blob/path', '|O'), ('blob/type', '|O'), ('blob/version', '|O'), ('blob/curr_epoch', '<u4'), ('preprocess/psize_A', '<u4'), ('preprocess/num_iters', '<u4'), ('preprocess/threshold', '<f4'), ('preprocess/input_shape', '<u4'), ('preprocess/denoise', '|O'), ('preprocess/lowpass', '<u4'), ('preprocess/normalize', '<u4')]. Use Dataset.add_fields instead.

Has anyone faced a similar issue, and is there a fix?
Best regards,

Loic

Hi @LCarrique,

This is an issue that should be fixed in an upcoming patch or release.

Regards,
Jay Yoo

Hi Jay,

Great, thanks for your reply.
Kind regards,

Loic

I have this issue as well - it seems to happen right at the end of training, but before outputs are generated, so Topaz is non-functional for now - is there currently any workaround?

Oli

@jyoo is there a patch for this? Or a way to downgrade to the 2.15 live beta? I need Topaz to work for an upcoming tutorial.

Cheers
Oli

Hi @LCarrique and @olibclarke,

A patch is now available addressing this issue.
The patch can be installed with these instructions: https://guide.cryosparc.com/setup-configuration-and-management/software-updates#apply-patches

Regards,
Jay Yoo


Thank you Jay! Installing the patch now, will confirm whether it fixes the issue for me

Hi @jyoo, when I run Topaz train I get a different error now:

Hi @olibclarke,

Can you send the full streamlog?

Regards,
Jay Yoo

Here you go:

Launching job on lane default target ubuntu ...
Running job on master node hostname ubuntu
[CPU: 105.4 MB]  Project P22 Job J14 Started
[CPU: 105.5 MB]  Master running v2.16.1-live_deeppick_privatebeta+200722, worker running v2.16.1-live_deeppick_privatebeta+200722
[CPU: 105.6 MB]  Running on lane default
[CPU: 105.6 MB]  Resources allocated: 
[CPU: 105.9 MB]    Worker:  ubuntu
[CPU: 105.9 MB]    CPU   :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[CPU: 105.9 MB]    GPU   :  [0]
[CPU: 105.9 MB]    RAM   :  [0]
[CPU: 105.9 MB]    SSD   :  False
[CPU: 105.9 MB]  --------------------------------------------------------------
[CPU: 105.9 MB]  Importing job module for job type topaz_train...
[CPU: 215.3 MB]  Job ready to run
[CPU: 215.3 MB]  ***************************************************************
[CPU: 215.3 MB]  Topaz is a particle detection tool created by Tristan Bepler and Alex J. Noble.
Citations:
- Bepler, T., Morin, A., Rapp, M. et al. Positive-unlabeled convolutional neural networks for particle picking in cryo-electron micrographs. Nat Methods 16, 1153-1160 (2019) doi:10.1038/s41592-019-0575-8
- Bepler, T., Noble, A.J., Berger, B. Topaz-Denoise: general deep denoising models for cryoEM. bioRxiv 838920 (2019) doi: https://doi.org/10.1101/838920

Structura Biotechnology Inc. and cryoSPARC do not license Topaz nor distribute Topaz binaries. Please ensure you have your own copy of Topaz licensed and installed under the terms of its GNU General Public License v3.0, available for review at: https://github.com/tbepler/topaz/blob/master/LICENSE.
***************************************************************

[CPU: 216.0 MB]  Starting Topaz process using version 0.2.4...
[CPU: 216.0 MB]  Random seed used is 1979721300
[CPU: 216.0 MB]  --------------------------------------------------------------
[CPU: 216.0 MB]  Starting preprocessing...

[CPU: 216.0 MB]  Starting micrograph preprocessing by running command /home/user/software/anaconda3/envs/topaz/bin/topaz preprocess --scale 4 --niters 200 --num-workers 24 -o /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/preprocessed [MICROGRAPH PATHS EXCLUDED FOR LEGIBILITY]

[CPU: 216.0 MB]  Preprocessing over 8 processes...
[CPU: 216.1 MB]  Inverting negative staining...
[CPU: 216.1 MB]  Inverting negative staining complete.

[CPU: 216.1 MB]  Micrograph preprocessing command complete.

[CPU: 216.1 MB]  Starting particle pick preprocessing by running command /home/user/software/anaconda3/envs/topaz/bin/topaz convert --down-scale 4 --threshold 0 -o /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/topaz_particles_processed.txt /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/topaz_particles_raw.txt

[CPU: 216.1 MB]  Particle pick preprocessing command complete.

[CPU: 216.1 MB]  Preprocessing done in 360.565s.
[CPU: 216.1 MB]  --------------------------------------------------------------
[CPU: 216.1 MB]  Starting train-test splitting...

[CPU: 216.1 MB]  Starting dataset splitting by running command /home/user/software/anaconda3/envs/topaz/bin/topaz train_test_split --number 20 --seed 1979721300 --image-dir /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/preprocessed /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/topaz_particles_processed.txt

[CPU: 216.1 MB]  # splitting 100 micrographs with 2242 labeled particles into 80 train and 20 test micrographs
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00010gr_00017sq_v02_00002hln_00007enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00016gr_00053sq_v02_00002hln_00011enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b3g1_00022gr_00052sq_v02_00002hln_v01_00005enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00012sq_v02_00004hln_00002enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00007sq_v02_00002hln_00002enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00016gr_00082sq_v02_00002hln_v01_00011enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00012sq_v02_00004hln_00003enn-a-DW". Skipping it.
[CPU: 216.1 MB]  Traceback (most recent call last):
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/bin/topaz", line 11, in <module>
[CPU: 216.1 MB]      load_entry_point('topaz-em==0.2.4', 'console_scripts', 'topaz')()
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
[CPU: 216.1 MB]      args.func(args)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train_test_split.py", line 128, in main
[CPU: 216.1 MB]      image_list_train = pd.DataFrame({'image_name': image_names_train, 'path': paths_train})
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/frame.py", line 435, in __init__
[CPU: 216.1 MB]      mgr = init_dict(data, index, columns, dtype=dtype)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
[CPU: 216.1 MB]      return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 64, in arrays_to_mgr
[CPU: 216.1 MB]      index = extract_index(arrays)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 365, in extract_index
[CPU: 216.1 MB]      raise ValueError("arrays must all be same length")
[CPU: 216.1 MB]  ValueError: arrays must all be same length
[CPU: 216.1 MB]  
Dataset splitting command complete.

[CPU: 216.1 MB]  Train-test splitting done in 1.658s.
[CPU: 216.1 MB]  --------------------------------------------------------------
[CPU: 216.1 MB]  Starting training...

[CPU: 216.1 MB]  Starting training by running command /home/user/software/anaconda3/envs/topaz/bin/topaz train --train-images /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/image_list_train.txt --train-targets /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/topaz_particles_processed_train.txt --test-images /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/image_list_test.txt --test-targets /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/topaz_particles_processed_test.txt --num-particles 100 --learning-rate 0.0002 --minibatch-size 128 --num-epochs 10 --method GE-binomial --slack -1 --autoencoder 0 --l2 0.0 --minibatch-balance 0.0625 --epoch-size 5000 --model resnet8 --units 32 --dropout 0.0 --bn on --unit-scaling 2 --ngf 32 --num-workers 24 --cross-validation-seed 1979721300 --radius 2 --num-particles 100 --device 0 --save-prefix=/home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/models/model -o /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/train_test_curve.txt

[CPU: 216.1 MB]  # Loading model: resnet8
[CPU: 216.1 MB]  # Model parameters: units=32, dropout=0.0, bn=on
[CPU: 216.1 MB]  # Loading pretrained model: resnet8_u32
[CPU: 216.1 MB]  # Receptive field: 71
[CPU: 216.1 MB]  # Using device=0 with cuda=True
[CPU: 216.1 MB]  Traceback (most recent call last):
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/bin/topaz", line 11, in <module>
[CPU: 216.1 MB]      load_entry_point('topaz-em==0.2.4', 'console_scripts', 'topaz')()
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
[CPU: 216.1 MB]      args.func(args)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 641, in main
[CPU: 216.1 MB]      image_ext=args.image_ext
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train.py", line 221, in load_data
[CPU: 216.1 MB]      train_images = pd.read_csv(train_images, sep='\t') # training image file list
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/io/parsers.py", line 676, in parser_f
[CPU: 216.1 MB]      return _read(filepath_or_buffer, kwds)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/io/parsers.py", line 448, in _read
[CPU: 216.1 MB]      parser = TextFileReader(fp_or_buf, **kwds)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/io/parsers.py", line 880, in __init__
[CPU: 216.1 MB]      self._make_engine(self.engine)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
[CPU: 216.1 MB]      self._engine = CParserWrapper(self.f, **self.options)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/io/parsers.py", line 1891, in __init__
[CPU: 216.1 MB]      self._reader = parsers.TextReader(src, **kwds)
[CPU: 216.1 MB]    File "pandas/_libs/parsers.pyx", line 374, in pandas._libs.parsers.TextReader.__cinit__
[CPU: 216.1 MB]    File "pandas/_libs/parsers.pyx", line 674, in pandas._libs.parsers.TextReader._setup_parser_source
[CPU: 216.1 MB]  FileNotFoundError: [Errno 2] File /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/image_list_train.txt does not exist: '/home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14/image_list_train.txt'
[CPU: 216.1 MB]  
Training command complete.

[CPU: 216.1 MB]  Training done in 6.045s.
[CPU: 216.1 MB]  --------------------------------------------------------------
[CPU: 216.1 MB]  Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/topaz/run_topaz.py", line 359, in run_topaz_wrapper_train
    assert len(glob.glob(os.path.join(model_dir, '*'))) > 0, "Training failed, no models were created."
AssertionError: Training failed, no models were created.

Both the input micrographs and particles were from an Inspect Picks job, if that’s of any help

Hi @olibclarke,

There seems to be an issue with the micrographs input into the job as indicated by this part of the streamlog:

[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00016gr_00053sq_v02_00002hln_00011enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b3g1_00022gr_00052sq_v02_00002hln_v01_00005enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00012sq_v02_00004hln_00002enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00007sq_v02_00002hln_00002enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00016gr_00082sq_v02_00002hln_v01_00011enn-a-DW". Skipping it.
[CPU: 216.1 MB]  WARNING: no micrograph found matching image name "n20apr21a_b2g2_00015gr_00012sq_v02_00004hln_00003enn-a-DW". Skipping it.
[CPU: 216.1 MB]  Traceback (most recent call last):
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/bin/topaz", line 11, in <module>
[CPU: 216.1 MB]      load_entry_point('topaz-em==0.2.4', 'console_scripts', 'topaz')()
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/main.py", line 148, in main
[CPU: 216.1 MB]      args.func(args)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/topaz/commands/train_test_split.py", line 128, in main
[CPU: 216.1 MB]      image_list_train = pd.DataFrame({'image_name': image_names_train, 'path': paths_train})
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/frame.py", line 435, in __init__
[CPU: 216.1 MB]      mgr = init_dict(data, index, columns, dtype=dtype)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 254, in init_dict
[CPU: 216.1 MB]      return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 64, in arrays_to_mgr
[CPU: 216.1 MB]      index = extract_index(arrays)
[CPU: 216.1 MB]    File "/home/user/software/anaconda3/envs/topaz/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 365, in extract_index
[CPU: 216.1 MB]      raise ValueError("arrays must all be same length")
[CPU: 216.1 MB]  ValueError: arrays must all be same length 

I suspect the error is caused by some of the micrographs not being found. This is likely because the micrograph paths stored with the particles do not fully correspond to the paths of the connected micrographs. Were the particles extracted from a different job than the one the micrographs originate from?
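If it helps, one quick way to see which particle image names have no matching preprocessed micrograph is something along these lines (a rough sketch using the paths from your log, and assuming the first column of the Topaz coordinate file is the image name with a header row):

cd /home/user/processing/cryosparc_projects/empiar/S_protein/P22/J14
# image names referenced by the particle picks (skip the header line)
cut -f1 topaz_particles_processed.txt | tail -n +2 | sort -u > particle_names.txt
# image names actually present in the preprocessed directory (extension stripped)
ls preprocessed | sed 's/\.[^.]*$//' | sort -u > mic_names.txt
# names in the particle list with no corresponding preprocessed micrograph
comm -23 particle_names.txt mic_names.txt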

Regards,
Jay Yoo

No, the particles were extracted from the same Patch CTF job as the micrographs. The only difference is that the micrographs went through a Curate Exposures job after that, so there may be some particles corresponding to a missing micrograph. But I took the outputs from an Inspect Picks job with both of those inputs, so they should match…

Hi @olibclarke,

Could you try running the job with inputs that worked previously? This will at least verify whether the issue is isolated to Topaz or is a compatibility problem between Topaz and another job. The computational side of the Topaz train job should not have changed at all between 2.15 and 2.16.2.

Regards,
Jay Yoo

Thanks @jyoo, I installed the patch and it’s working fine so far.
Kind regards,

Loic

Hi @jyoo, it runs with other inputs - e.g. if I use denoised micrographs it works fine, or if I re-extract the particles and use those coordinates and micrographs it works fine - it seems specific to outputs from Inspect Picks.

Oli


Hi @olibclarke,

Thanks for the update, I’ll look into this.

Regards,
Jay Yoo

Hi @jyoo,

I get the same error if I use the Exposure Sets tool to create a random subset of micrographs and then input it along with the full set of particles for training. This is a workflow I would like to use to reduce computation, to avoid training on the entire dataset, which is very slow.

Cheers
Oli


How long did launching the Topaz training job take? I am getting no error, but also no progress when it comes to launching the job…

I can see that it’s been some time since anything has been posted on this thread, conclusive or otherwise, so I just wanted to share my recent experience with the issues above.

Running v3.3.1, the preprocessing step seems to go idle instead of returning an error if it is not assigned enough RAM. I tried a run with 5649 training exposures (which I now know is absurd overkill) that was seemingly still preprocessing after 55 h, at which point I killed it, whereas another job with 500 micrographs finished preprocessing after 175 s.

Every time I run a job, I get the error message above that the micrographs to which the particles are assigned are not found in the folder of preprocessed micrographs. In my case this was because the preprocessed micrographs were patch-aligned and dose-weighted, whereas the particle-assigned micrographs the job was looking for were rigid-aligned without dose weighting (suffix of …fractions_rigid_aligned.mrc vs …fractions_patch_aligned_doseweighted.mrc). This error came up no matter whether the particles and micrographs were taken directly from a re-extraction job, a Manual Picker job, or a Curate Exposures job.

Since the job does preprocess the dose-weighted micrographs, which should be the most fitting ones (and because it usually downsamples the micrographs quite heavily anyway), the fix I used was simply to rename all the .mrc files in the preprocessed folder of the failed job and add the path of that folder to “Absolute path of directory containing preprocessed directory” of a new job, which then ran smoothly.

For anyone who isn’t too happy about writing bash, I used the following two loops (run directly in a UNIX terminal in the preprocessed folder):
for f in *fractions_>>suffix to remove<<.mrc; do mv "$f" "${f/>>suffix to remove<<}"; done
for f in *.mrc; do mv "$f" "${f/.mrc}>>suffix to be inserted<<.mrc"; done
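If your suffixes are the same as mine, the two loops can also be collapsed into a single pass along these lines (an untested sketch - substitute whatever suffixes your micrograph filenames actually use):

old=patch_aligned_doseweighted
new=rigid_aligned
# rename e.g. xxx_fractions_patch_aligned_doseweighted.mrc to xxx_fractions_rigid_aligned.mrc
for f in *fractions_${old}.mrc; do mv "$f" "${f/${old}/${new}}"; done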
Hopefully, this will help others experiencing issues.
And if you know that this hack of treating patch-aligned, doseweighted as rigid-aligned, non-doseweighted is inappropriate, please let me know!
