Error during import particle stack - file name too long

Hello,
I’m using CS v4.2.1 (single workstation).
I started processing in Cryosparc, then moved particles to Relion 3.1.3 and did several rounds of the following:

  1. Non-uniform refinement and local refinement in Cryosparc
  2. Create a .star file using pyem
  3. Run 3D classification in Relion
  4. Subset selection of the best class
  5. Import back the particles.star file into Cryosparc using the Import Particle Stack

After several rounds, I got an error during the Import Particle Stack stage, claiming that the file name is too long: “OSError: [Errno 36] File name too long:”

I used two methods to create a soft link to the particle files (both did not solve the issue):

  1. find /A/ -name "*.mrcs" | xargs -i ln -s {} -t .
  2. cat list.txt | while read line; do ln -s /path/to/original/files/$line .; done
    followed by
    cat list.txt | while read line; do ln -s /path/to/original/files/$line .; done

How can I overcome this issue?

The complete error message:

[CPU: 741.8 MB] Failed to link /users/user/Projects/ProjectA/cryoGrids/ProjectAa/relion/J424/imported/000009874000958427133_008452406961118317780_004663252376432791439_005765350485077077012_006473124840017215590_008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_fractions_patch_aligned_doseweighted_particles.mrcs /users/cryouser3/ProjectAa/J455/imported/013195493172495176510_000009874000958427133_008452406961118317780_004663252376432791439_005765350485077077012_006473124840017215590_008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_fractions_patch_aligned_doseweighted_particles.mrcs

[CPU: 741.8 MB] Traceback (most recent call last): 
    File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main 
    File "/users/cryouser3/cryosparc/cryosparc_master/cryosparc_compute/jobs/imports/run.py", line 355, in run_import_particles 
        level, all_base_paths, abs_to_rel_map, all_rel_paths = symlink_all_abs_files_to_import_dir(proj_dir_abs, job_dir_rel, all_abs_paths, uid_to_path_map)

    File "/users/cryouser3/cryosparc/cryosparc_master/cryosparc_compute/jobs/imports/run.py", line 681, in symlink_all_abs_files_to_import_dir
        abs_to_rel_map, all_rel_paths = symlink_paths(proj_dir_abs, os.path.join(job_dir_rel, import_dir_name), all_abs_paths, all_base_paths, uid_to_path_map)

    File "/users/cryouser3/cryosparc/cryosparc_master/cryosparc_compute/jobs/imports/run.py", line 668, in symlink_paths
        os.symlink(abs_path, dest)
OSError: [Errno 36] File name too long: '/users/user/Projects/ProjectA/cryoGrids/ProjectAa/relion/J424/imported/000009874000958427133_008452406961118317780_004663252376432791439_005765350485077077012_006473124840017215590_008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_fractions_patch_aligned_doseweighted_particles.mrcs' -> '/users/cryouser3/ProjectAa/J455/imported/013195493172495176510_000009874000958427133_008452406961118317780_004663252376432791439_005765350485077077012_006473124840017215590_008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_fractions_patch_aligned_doseweighted_particles.mrcs'

Thanks,
Eliane

Hi Eliane,

Not a cryoSPARC dev here, but symlinking alone will not solve your issue, as the problem is the length of the file name itself.

Your filename:
013195493172495176510_000009874000958427133_008452406961118317780_004663252376432791439_005765350485077077012_006473124840017215590_008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_fractions_patch_aligned_doseweighted_particles.mrcs

is 256 characters long. I guess your microscope delivered mrc files with a very long numerical identifier. cryoSPARC just adds the patch_aligned_doseweighted_particles suffixes.

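The issue here is that common Linux file systems (ext4, XFS and the like) limit a single file name to 255 bytes, which for plain ASCII names means 255 characters. Your name is one character over.
The fastest solution I see is to shorten the names of your input mrcs (e.g. by symlinking them under shorter names). Best would be to tweak the microscope settings to get shorter initial filenames, so cryoSPARC does not run into issues.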
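
To check this before starting an import, here is a minimal Python sketch. The helper names `name_limit` and `is_too_long` are made up for illustration and are not part of cryoSPARC. It asks the file system for its per-name limit (falling back to an assumed 255 bytes) and measures the file name from the error message. For reference, Errno 36 on Linux is ENAMETOOLONG.

```python
import os


def name_limit(directory="."):
    """Longest file name, in bytes, allowed where `directory` lives."""
    try:
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        return 255  # assumed fallback, typical for Linux file systems


def is_too_long(filename, limit=255):
    # The limit applies to one path component and is counted in bytes.
    return len(os.fsencode(os.path.basename(filename))) > limit


# File name taken from the error message in this thread.
name = (
    "013195493172495176510_000009874000958427133_008452406961118317780_"
    "004663252376432791439_005765350485077077012_006473124840017215590_"
    "008606920131242870795_FoilHole_476820_Data_477191_477193_20230601_151118_"
    "fractions_patch_aligned_doseweighted_particles.mrcs"
)

print(len(os.fsencode(name)))                 # 256
print(is_too_long(name, name_limit()))
```

Note that the limit is per file name, not per full path, which is why moving or symlinking the files into another directory under the same name does not help.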

Best
Christian


I’ve never seen a microscope put that much cruft in front of the FoilHole* name, which is where EPU usually starts from.

The UID prefix is usually added by CryoSPARC, and from the sounds of it, every time @Eliane has re-imported into CryoSPARC, it’s added another new UID on the front.

Ahh, that makes more sense than EPU making these long prefixes.

Thanks for your answers. I’m trying to figure out how to shorten the file names, so that Cryosparc will continue to recognize them, and I can continue analysis. Any suggestions?
I lost the picking information for each particle in the process, so I cannot re-extract the particles…

If it’s all in a single particle stack from RELION (which I would recommend) then just rename the file, then edit the *.star file to correspond to the new name. If all of the stacks are individual micrographs, then bash is your friend here. Run the following in the directory with all of the per-micrograph stacks:

    for f in *.mrcs; do mv "$f" "${f:100}"; done

That will crop the first 100 characters of every file name. Adjust accordingly to what you desire. Then edit the *.star file accordingly.

edit: The above command doesn’t ask, it just does. If uncertain, just search for how to rename files in Linux.
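
A more cautious alternative to the loop above, as a Python sketch rather than a tested tool. It assumes that every UID prefix is a run of 21 digits followed by an underscore, as in the file names quoted in this thread, and that the .star file refers to the stacks by the same file names. All function names here are hypothetical. By default it only prints what it would do.

```python
import re
from pathlib import Path

# Assumption: each UID prefix is 21 digits plus "_", as seen in this thread.
UID_PREFIXES = re.compile(r"^(?:\d{21}_)+")


def strip_uids(filename):
    """Drop every leading UID prefix and keep the original name."""
    return UID_PREFIXES.sub("", filename)


def plan_renames(stack_dir, pattern="*.mrcs"):
    """Map old name -> new name; stop if two files would end up identical."""
    plan = {}
    taken = set()
    for path in sorted(Path(stack_dir).glob(pattern)):
        new = strip_uids(path.name)
        if new in taken:
            raise ValueError(f"{new} would be produced twice")
        taken.add(new)
        if new != path.name:
            plan[path.name] = new
    return plan


def rewrite_star(text, plan):
    """Replace old stack names in the .star text, longest names first."""
    for old in sorted(plan, key=len, reverse=True):
        text = text.replace(old, plan[old])
    return text


def apply_plan(stack_dir, star_in, star_out, plan, dry_run=True):
    """Print the planned renames; only touch files when dry_run is False."""
    stack_dir = Path(stack_dir)
    for old, new in plan.items():
        print(f"{old} -> {new}")
        if not dry_run:
            (stack_dir / old).rename(stack_dir / new)
    if not dry_run:
        text = Path(star_in).read_text()
        Path(star_out).write_text(rewrite_star(text, plan))
```

Usage would be something like `plan = plan_renames("stacks")` followed by `apply_plan("stacks", "particles.star", "particles_short.star", plan)`, then the same call with `dry_run=False` once the printed list looks right. The new .star file is written separately so the original stays untouched. Work on a copy or on symlinks if you can.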

Hi Eliane,

The several segments of numbers look like UIDs added by cryosparc when importing data. Each time the “import movies” job is run, cryosparc adds a newly generated UID to the front of the movie’s original file name, and uses this string as the file name of the symbolic link in the “import_movies” dir. The subsequent particle stack files will inherit their movie/micrograph file name bases as part of their file name.

I wonder if somehow these files (more likely their original movies/micrographs) were repeatedly imported into cryosparc. Maybe it would be a good idea to review the processing history of this dataset.

Cryosparc prepends UIDs to imported file names in order to avoid file name collisions. Some people like to use short strings as dir/file names when collecting data by hand. When one has files saved in a structure like a/001.mrc… b/001.mrc, and imports them all together, filename collisions arise because now we have multiple 001.mrc files that need to co-exist in the “import_movies” dir. To avoid this problem, cryosparc prepends a randomly generated integer to the name of every imported file. Hope this information helps you to devise a better way of moving the data around.
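
To illustrate the idea (this is not cryosparc's actual code, just a toy model), the Python sketch below uses a random 64-bit integer zero-padded to 21 digits as the prefix. That format is an assumption based on the names in this thread. It shows both why the prefix avoids collisions and how repeated imports make the name grow.

```python
import random


def make_uid_prefix(rng):
    # Hypothetical stand-in for the UID: a random 64-bit integer,
    # zero-padded to 21 digits to match the names seen in this thread.
    return f"{rng.getrandbits(64):021d}"


def import_name(original_name, rng):
    """Name given to the symlink in the import dir (illustrative only)."""
    return f"{make_uid_prefix(rng)}_{original_name}"


rng = random.Random(0)

# a/001.mrc and b/001.mrc share a base name, but their links do not.
first = import_name("001.mrc", rng)
second = import_name("001.mrc", rng)
print(first)
print(second)

# Each re-import of an already imported name adds 22 more characters.
name = "FoilHole_particles.mrcs"
for _ in range(3):
    name = import_name(name, rng)
print(len(name))  # 23 + 3 * 22 = 89
```

With seven such prefixes in front of a 102-character base name, as in the error message above, the total reaches 256 characters.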

Zhijie