Topaz default setting

Hi all,

I think it is time to comment on the default Topaz settings in cryoSPARC, namely “Number of parallel processes” and “Number of CPUs”.

Both of them are set to 8 by default.
This results in 8x8=64 CPU processes starting on the worker.
I guess some workstations have this many CPU cores, but having this as default seems a little on the high side to me.

It is even worse when submitting the job through SLURM, because only “Number of CPUs” dictates how many CPU cores are allocated for the job. With the defaults, 8 cores are allocated for 64 processes, i.e. 8 processes per core. This works in most cases, but risks instability from oversubscribing the CPU.

Maybe a setting of:

Number of parallel processes=1
Number of CPUs=8

would be a better setting as default?
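To make the arithmetic above concrete, here is a minimal Python sketch. The function name and arguments are made up for illustration and are not part of cryoSPARC or Topaz; it only assumes, as described above, that the total number of CPU processes is the product of the two settings while SLURM allocates only “Number of CPUs” cores.

```python
def cpu_oversubscription(num_processes: int, cpus_per_process: int,
                         allocated_cores: int) -> float:
    """Return how many CPU processes compete for each allocated core.

    Hypothetical helper for illustration only, not a cryoSPARC API.
    """
    if min(num_processes, cpus_per_process, allocated_cores) < 1:
        raise ValueError("all arguments must be positive integers")
    total_processes = num_processes * cpus_per_process
    return total_processes / allocated_cores


# Old defaults: 8 parallel processes x 8 CPUs, with SLURM allocating 8 cores.
print(cpu_oversubscription(8, 8, allocated_cores=8))  # 8.0

# Suggested defaults: 1 parallel process x 8 CPUs on the same 8 cores.
print(cpu_oversubscription(1, 8, allocated_cores=8))  # 1.0
```

A value of 1.0 means the allocation matches what the job actually starts; anything above that means the allocated cores are oversubscribed.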

Cheers,
Jesper

You are in good company! New default settings are in the works. See also the thread “Topaz Preprocessing very slow”.

Oh, that sounds great.
Thanks a lot

Cheers,
Jesper

BTW, it would be nice if the number of allocated CPUs were the product of “Number of parallel processes” and “Number of CPUs”, so that the SLURM allocation is correct.
Thanks!

Noted. Thanks for bringing this to our attention.

Hello,

The default settings could be made more user-friendly not only for the computing settings, but also for the settings specific to the dataset under study.

The micrograph downscaling factor (the --scale option to topaz preprocess) is one of the most important parameters for successful training, because the particle must not exceed a certain diameter (longest dimension) in pixels for training to work well. That limit depends on which neural network architecture is used, and this is poorly documented. The best place to find it is the Topaz GUI (really just a command builder; you can launch it locally from your Topaz installation with the command topaz gui): go to the “Preprocess” section and hover the mouse over the blue “Scale factor” box. The help bubble then says:

Rescaling factor for image downsampling (e.g. a 4k x 4k image downsampled by 4 results in a 1k x 1k image) (type: even integer).

Recommended: Downsample such that the resulting pixelsize is about 8 angstroms; usually downsample by 4, 8, or 16 depending on pixelsize and particle size.

Note: Your particle must have a diameter (longest dimension) after downsampling of maximum:

70 pixels or less for resnet8
30 pixels or less for conv31
62 pixels or less for conv63
126 pixels or less for conv127
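As a quick way to check a given scale factor against these limits, here is a small Python sketch. The limits are the ones quoted from the help bubble above; the function names and the example numbers (a 150 Å particle at 1.0 Å/pixel) are made up for illustration and are not part of Topaz.

```python
# Maximum particle diameter in pixels after downsampling, per architecture,
# as quoted from the Topaz GUI help text above.
MAX_DIAMETER_PX = {"resnet8": 70, "conv31": 30, "conv63": 62, "conv127": 126}


def downsampled_diameter_px(diameter_angstrom: float,
                            pixel_size_angstrom: float,
                            scale: int) -> float:
    """Particle diameter in pixels after downsampling by `scale`."""
    return diameter_angstrom / (pixel_size_angstrom * scale)


def fits_model(diameter_angstrom: float, pixel_size_angstrom: float,
               scale: int, model: str = "resnet8") -> bool:
    """True if the downsampled particle is within the limit for `model`."""
    diameter_px = downsampled_diameter_px(
        diameter_angstrom, pixel_size_angstrom, scale)
    return diameter_px <= MAX_DIAMETER_PX[model]


# Hypothetical example: 150 A particle, 1.0 A/pixel micrographs.
print(downsampled_diameter_px(150, 1.0, 8))  # 18.75
print(fits_model(150, 1.0, 8, "conv31"))     # True  (18.75 <= 30)
print(fits_model(150, 1.0, 4, "conv31"))     # False (37.5 > 30)
print(fits_model(150, 1.0, 4, "resnet8"))    # True  (37.5 <= 70)
```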

Relion-4 chose to not expose this downscaling factor to the user. Instead, it calculates it automatically from the known pixel size of the micrographs and from the estimated particle diameter in Å input by the user (which is relatively easy to measure with a manual picking job, but typically one has a good sense of the expected particle size after working on the same thing for a while).
Relion-4 also chose to not expose the neural net architecture to the user, and always uses resnet8 by default.
But it lets one overwrite these defaults by passing options explicitly.
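To illustrate what such an automatic choice could look like, here is a rough Python sketch. This is not Relion's actual formula (I have not checked its source); it is just one plausible heuristic combining the two rules from the Topaz help text: aim for a pixel size of about 8 Å, and make sure the particle fits within the architecture's pixel limit (70 for resnet8). The function name and defaults are hypothetical.

```python
import math


def suggest_scale(diameter_angstrom: float, pixel_size_angstrom: float,
                  max_diameter_px: int = 70,
                  target_pixel_angstrom: float = 8.0) -> int:
    """Suggest an integer downscaling factor (illustrative heuristic only)."""
    if diameter_angstrom <= 0 or pixel_size_angstrom <= 0:
        raise ValueError("diameter and pixel size must be positive")
    # Scale that brings the pixel size closest to the target, at least 1.
    target_scale = max(1, round(target_pixel_angstrom / pixel_size_angstrom))
    # Smallest scale at which the particle fits within the pixel limit.
    min_scale = math.ceil(
        diameter_angstrom / (pixel_size_angstrom * max_diameter_px))
    return max(target_scale, min_scale)


# Hypothetical examples with the resnet8 limit of 70 pixels.
print(suggest_scale(150, 1.0))   # 8  (pixel-size target dominates)
print(suggest_scale(1200, 1.0))  # 18 (particle-size limit dominates)
```

With something like this, the user only has to supply the pixel size (usually already known to the job) and an estimated particle diameter in Å, with the scale still overridable by hand.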

I think this is a really good default, and very user-friendly. If CryoSPARC could do the same, that would make setting up Topaz training much easier.

Hello all, thanks for your feedback on this! We’ve addressed this in the latest CryoSPARC v4.5, released May 7, 2024. Topaz parameters now default to 2 processes and 4 workers. Topaz jobs now also request the correct CPU resources based on these parameters.
