Bug: input files with same name will conflict during CTFFind

Hi,

I have just found that if imported files from different directories share the same filename, they conflict at the CTFFind step.

For example, suppose we have two micrographs, PWD/0901/0001.mrc and PWD/0909/0001.mrc.
When imported in job J001, they are symlinked as:
J001/0901/0001.mrc (symbolic link to PWD/0901/0001.mrc)
J001/0909/0001.mrc (symbolic link to PWD/0909/0001.mrc)
This is completely fine.

However, when CTFFind job J002 runs, probably because only the filename part is used to name the CTFFind output, the output from both input files is saved in J002 as:

J002/ctffind_output/0001_ctffind_plotdata.npy

Clearly the second CTFFind run, on 0909/0001.mrc, simply overwrote the output of the run on 0901/0001.mrc. The system does not report any problem, and autopicking and particle extraction both work. But when the resulting particles are sent to 2D classification, the 2D classification job dies with an error message stating that the size of an array has changed.
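
To make the collision concrete, here is a minimal sketch of what I think is happening. The function names and the naming rule are my assumptions, inferred only from the output path above, not taken from the cryoSPARC code.

```python
from collections import defaultdict
from pathlib import PurePosixPath

def basename_output(job_dir, micrograph):
    # Assumed naming rule: keep only the file stem and drop the parent directory.
    stem = PurePosixPath(micrograph).stem
    return f"{job_dir}/ctffind_output/{stem}_ctffind_plotdata.npy"

def find_collisions(job_dir, micrographs):
    # Group inputs by the output path they would be written to.
    by_output = defaultdict(list)
    for micrograph in micrographs:
        by_output[basename_output(job_dir, micrograph)].append(micrograph)
    # Any output path claimed by more than one input gets overwritten.
    return {out: srcs for out, srcs in by_output.items() if len(srcs) > 1}

inputs = ["J001/0901/0001.mrc", "J001/0909/0001.mrc"]
collisions = find_collisions("J002", inputs)
for out, srcs in collisions.items():
    print(f"{out} is written by {len(srcs)} inputs: {srcs}")
```

A check like this before the job starts would at least turn the silent overwrite into a visible warning.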

I guess that recreating the import job's directory tree inside the CTFFind job directory would solve this problem.
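
A rough sketch of that idea, assuming the output name is derived from the micrograph's path relative to the import job directory (`mirrored_output` is a hypothetical helper, not an existing cryoSPARC function):

```python
from pathlib import PurePosixPath

def mirrored_output(import_dir, job_dir, micrograph):
    # Path of the micrograph relative to the import job, e.g. 0901/0001.mrc
    rel = PurePosixPath(micrograph).relative_to(import_dir)
    # Rebuild the same subdirectory under the CTFFind output directory.
    out = (PurePosixPath(job_dir) / "ctffind_output" / rel.parent
           / f"{rel.stem}_ctffind_plotdata.npy")
    return str(out)

first = mirrored_output("J001", "J002", "J001/0901/0001.mrc")
second = mirrored_output("J001", "J002", "J001/0909/0001.mrc")
print(first)
print(second)
```

With this layout the two micrographs end up in J002/ctffind_output/0901/ and J002/ctffind_output/0909/, so neither overwrites the other. The subdirectories would need to be created before CTFFind writes into them.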

Zhijie

Hi @ZhijieLi,
Thanks for reporting this - we have recorded the issue and will fix.

Hi @ZhijieLi,

To address the issue of outputs being overwritten when input filenames are the same, we’ve prepended a short unique identifier to each file output by CTFFIND 4.1.10 as of cryoSPARC v2.4. This ensures we can still properly organize individual files without reaching the filename character limit. We hope this helps!
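
For readers curious how a prefix like this avoids the collision, here is an illustrative sketch only. The identifier scheme cryoSPARC actually uses is not described in this thread; the hash-based prefix and the `prefixed_output` helper below are assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath

def prefixed_output(job_dir, micrograph, uid_len=8):
    # Assumed scheme: derive a short identifier from the full input path,
    # so files with the same name in different directories get different prefixes.
    uid = hashlib.sha1(micrograph.encode("utf-8")).hexdigest()[:uid_len]
    stem = PurePosixPath(micrograph).stem
    return f"{job_dir}/ctffind_output/{uid}_{stem}_ctffind_plotdata.npy"

first = prefixed_output("J002", "J001/0901/0001.mrc")
second = prefixed_output("J002", "J001/0909/0001.mrc")
print(first)
print(second)
```

The prefix has a fixed length, so the output filename grows by a constant amount regardless of how deep the input directory tree is.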

Hi sarulthasan,
Thanks!