Remove Duplicates parallelization?

Hi,

Remove Duplicates is very slow on large datasets (millions of symmetry-expanded particles), taking multiple hours to complete.

It does not seem to be parallelized across multiple CPUs (correct me if I am mistaken here). Would it be possible to parallelize it to speed up this process for large datasets?

Cheers
Oli

Hi @olibclarke! We’ll look into this. In the meantime, you could gain a bit of parallelization by splitting your micrographs and particles (using the Selected Exposures (index) parameter of Manually Curate Exposures) and running several separate Remove Duplicates jobs in parallel.
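This workaround is safe because particles can only be duplicates of other particles on the same micrograph, so each micrograph's particles can be checked independently. The sketch below illustrates that idea in plain Python. It is a generic greedy distance filter, not CryoSPARC's Remove Duplicates implementation. The `Particle` tuple layout, the function names, and the "keep the first particle seen" policy are all hypothetical; a real job may instead keep the higher-scoring member of each duplicate pair.

```python
from collections import defaultdict
from concurrent.futures import Executor, ProcessPoolExecutor
from math import hypot
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (micrograph_id, x, y), coordinates in pixels.
Particle = Tuple[str, float, float]


def dedupe_one_micrograph(particles: List[Particle], min_dist: float) -> List[Particle]:
    """Keep each particle only if it lies at least min_dist from every particle kept so far.

    Brute-force O(n^2) within a single micrograph; a spatial index would scale better.
    """
    kept: List[Particle] = []
    for p in particles:
        if all(hypot(p[1] - q[1], p[2] - q[2]) >= min_dist for q in kept):
            kept.append(p)
    return kept


def remove_duplicates(
    particles: List[Particle],
    min_dist: float,
    executor: Optional[Executor] = None,
) -> List[Particle]:
    # Group by micrograph: duplicates never span micrographs, so groups are independent.
    groups: Dict[str, List[Particle]] = defaultdict(list)
    for p in particles:
        groups[p[0]].append(p)

    def run(ex: Executor) -> List[Particle]:
        keys = sorted(groups)
        # One task per micrograph; results come back in the same order as keys.
        results = ex.map(dedupe_one_micrograph, [groups[k] for k in keys], [min_dist] * len(keys))
        return [p for kept in results for p in kept]

    if executor is not None:
        return run(executor)
    # Default to separate processes, since this loop is CPU-bound and threads share the GIL.
    with ProcessPoolExecutor() as ex:
        return run(ex)


if __name__ == "__main__":
    demo = [("mic1", 0.0, 0.0), ("mic1", 3.0, 4.0), ("mic1", 50.0, 50.0), ("mic2", 0.0, 0.0)]
    # ("mic1", 3.0, 4.0) is 5 px from the first particle, so it is dropped.
    print(remove_duplicates(demo, min_dist=10.0))
    # [('mic1', 0.0, 0.0), ('mic1', 50.0, 50.0), ('mic2', 0.0, 0.0)]
```

Splitting the data by micrograph index and launching several jobs by hand achieves the same partitioning, with each job acting as one "worker".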


Ah, good idea, thanks!