Feature request - preserve MicrographName column from star file

Hi,

When importing a star file that references per-micrograph particle stacks and has an additional column for the micrograph name, cryoSPARC drops that micrograph column.

This does not cause any issues within cryoSPARC, but it is a problem when a user wants to export the data for use in another program. For example, if I want to polish a particle set obtained by classification in cryoSPARC, I need a star file with a column for the micrograph name.

I can generate that column from the name of the particle stack with some scripting, but it would be much more convenient if cryoSPARC just preserved the data in the input star file (in general the STAR format seems much more human readable and easier to deal with than CSV, but that is another topic).
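For reference, the kind of scripting I mean is roughly the sketch below. It is only an illustration under assumptions that may not hold for your data: the function name add_micrograph_name is made up, the "_particles.mrcs" suffix is a hypothetical naming convention for the per-micrograph stacks, and the micrograph is assumed to live at the same relative path as its stack. The column number of _rlnImageName has to be read from the star header by hand.

```shell
# add_micrograph_name STAR COL [SUFFIX]
# STAR:   star file whose _rlnImageName points at per-micrograph stacks
# COL:    1-based column number of _rlnImageName (read it from the header)
# SUFFIX: stack suffix to swap for ".mrc"; the default is a hypothetical convention
add_micrograph_name() {
    awk -v col="$2" -v suffix="${3:-_particles.mrcs}" '
        /^_rln/ { nlabels++; print; next }               # count existing column labels
        NF < col || /^(data_|loop_|#)/ { print; next }   # other non-data lines pass through
        {
            if (!added) {                                # first data row: append the new label
                printf "_rlnMicrographName #%d\n", nlabels + 1
                added = 1
            }
            name = $col
            sub(/^[0-9]+@/, "", name)                    # drop the "000001@" particle index
            if (substr(name, length(name) - length(suffix) + 1) == suffix)
                name = substr(name, 1, length(name) - length(suffix)) ".mrc"
            print $0, name                               # original row plus the derived name
        }
    ' "$1"
}
```

Something like add_micrograph_name particles.star 1 > particles_with_mic.star would then write a copy with the extra column, but the derived names should be checked against the real micrograph files before using them for polishing.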

Cheers
Oli

Same goes for _rlnCoordinateX and _rlnCoordinateY - without these one can’t run alignparts or re-extract recentered particles.

Cheers
Oli

A temporary workaround is to use the _rlnImageName column to grep all the particles in a cryoSPARC-generated subset from the original input particle star file.

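e.g. awk 'NF >= 9 {print $9}' csparc.star > particles.list to make a list of particle names (column 9 being _rlnImageName here; the NF check skips header and blank lines, which would otherwise put empty patterns in the list, and an empty pattern matches every line);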

touch csparc_relion.star to make a fresh star file;

Then while read line; do fgrep "$line" all_particles.star; done < particles.list >> csparc_relion.star to grep the particles from the input star file and append them to the new file. (Then add the star file header back.)

Obviously this loses orientation information, so will require starting refinement from scratch.

Cheers
Oli

Or, as a single command: grep -F -f particles.list all_particles.star > csparc_relion.star

But these solutions are very slow, especially when working with large lists of particles.
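A single-pass awk lookup might be quicker than either grep variant, since it loads the list into a hash table and reads the star file once. The sketch below is untested on real data: filter_star is a made-up name, and the column number of _rlnImageName in all_particles.star has to be read from its header. Unlike grep -F, which matches substrings, it only keeps rows whose image name is exactly equal to a list entry, so it will not help if the paths differ between the two files.

```shell
# filter_star LIST STAR COL
# LIST: one particle image name per line (e.g. particles.list)
# STAR: the original input star file
# COL:  1-based column number of _rlnImageName in STAR (read it from the header)
filter_star() {
    awk -v col="$3" '
        # First file: remember every non-empty name from the list
        FILENAME == ARGV[1] { if (NF) keep[$1] = 1; next }
        # Second file: blank lines and header lines pass through unchanged
        /^[[:space:]]*$/ || /^(data_|loop_|_|#)/ { print; next }
        # Data rows: print only when the image name is in the list
        ($col in keep)
    ' "$1" "$2"
}
```

For example, filter_star particles.list all_particles.star 2 > csparc_relion.star if _rlnImageName happens to be column 2 of the input star file. The header is carried over, so it does not need to be added back by hand.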

Can we keep the _rlnCoordinateX and _rlnCoordinateY columns in the CSV file, please?

Peter

Do not use the Mac OS X grep, which is extremely slow with the -f switch.
Use GNU grep on a Linux system instead, which is 1000x faster.

(apparently it is a known issue:
https://discussions.apple.com/thread/4412947?tstart=0)

You can also use csparc2star.py and star.py with --copy-micrograph-coordinates from my github.

BTW GNU grep is easily available for OS X if you use Homebrew.

awesome, thank you very much!

@DanielAsarnow hmm - I updated from your github but I can’t see it in the help text for csparc2star.py - am I doing something wrong?

How does it work, do you need to provide a reference star? I didn’t think that info was recorded in the csv at all?

Cheers
Oli

You would need to convert the cryoSPARC CSV to a .star file first, and then use star.py --copy-micrograph-coordinates original.star fromcsparc.star withcoords.star (with correct file names of course).

If you had a single particle stack, so that the particle stack files are also missing from the CSV, then you would also need to run csparc2star.py with --data-path /path/to/stack.mrcs in order to get the rlnImageName parameter in your .star files.

Ah got it, thanks @DanielAsarnow!!