Csparc2star woes

Hi there,

We have a workflow that involves extraction in relion, import to cryosparc, classification and then re-import to relion. In one project this has been complicated by 3 separate Extract jobs being imported to cryosparc, joined together and classified. How can the resulting classes of particles be re-exported from cryosparc to relion while retaining the coordinates?

Even in single Extract job cases I am having difficulty with the copy-micrograph-coordinates aspect.

A couple of months ago, csparc2star.py --copy-micrograph-coordinates imported.star imported.cs classified_passthroughparticles.cs classified_particles.cs output.star

would work, but now it doesn’t seem to. I’m not familiar with the sed command, and I’m not having any joy following the instructions on GitHub. When I use:

sed 's/J25\/imported/Extract\/job065/g' output.star

the terminal window shows the rlnImageName strings replaced correctly, but after the following command:

star.py --copy-micrograph-coordinates P2/J25/particles.star output.star output2.star

the output2.star and output.star files are identical - I must be missing something, so any help with these two issues would be very much appreciated!

Cheers

Hannah

Edit: I can use a text editor to replace the paths instead of sed, and if the final command begins with csparc2star.py rather than star.py it gives a correct-looking output for the simple (single Extract job) case.

Hi Hannah. You can definitely get this to work, it sounds like you are very close!

The merging code is almost the same in the two programs. The coordinates are merged by testing rlnImageName, the basename of rlnImageName, and the combination of rlnMicrographName and the coordinates, so one of these potential keys must match exactly between the files. csparc2star.py differs in that it will also try removing the cryoSPARC UID prefix automatically. With star.py, you have to give --strip-uid yourself if that’s what you want. Note also that the rlnImageNames are not themselves updated. Only rlnMicrographName, rlnCoordinateX, and rlnCoordinateY are merged; these are all you need to extract. If you need a micrographs .star file, it can be prepared with star.py --to-micrographs.
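To make the "basename plus stripped UID" idea concrete, here is a rough shell sketch. It is not pyem's actual code: the image names are invented, and the UID prefix is assumed to be a run of digits followed by an underscore at the start of the file name. The helper `key` is hypothetical.

```shell
# Hypothetical rlnImageName values. The long digit run stands in for a
# cryoSPARC UID prefix (assumed format: digits followed by "_").
relion_name='000001@Extract/job065/Micrographs/mic_001.mrcs'
csparc_name='000001@J25/imported/012345678901234567890_mic_001.mrcs'

# Hypothetical helper: keep the particle index, drop the directory part.
key() { printf '%s\n' "$1" | sed -E 's|@.*/|@|'; }

relion_key=$(key "$relion_name")

# For the cryoSPARC name, additionally strip the leading "digits_" prefix.
csparc_key=$(key "$csparc_name" | sed -E 's/@[0-9]+_/@/')

echo "$relion_key"
echo "$csparc_key"
```

If the two printed keys are the same, the index and basename agree once the prefix is gone, which is the kind of match described above.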

Since you have multiple sets of particles, it might help that both programs can actually accept a quoted file glob for --copy-micrograph-coordinates such as "Extract/job00[034]/particles.star" or "*.star". It has to be quoted so it doesn’t get expanded by the shell. Alternatively, you can use star.py to join any number of star files first, if this is easier for you.
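The effect of quoting can be checked in the shell itself. The sketch below uses a temporary directory with made-up job folders and simply counts how many arguments a command would receive in each case.

```shell
# Throwaway directory tree with two made-up Extract jobs.
tmp=$(mktemp -d)
mkdir -p "$tmp/Extract/job003" "$tmp/Extract/job004"
touch "$tmp/Extract/job003/particles.star" "$tmp/Extract/job004/particles.star"

# Unquoted pattern: the shell expands it into one argument per matching file.
set -- "$tmp"/Extract/job00[034]/particles.star
unquoted_count=$#

# Quoted pattern: it is passed on as a single literal string,
# leaving the program free to expand it itself.
set -- "$tmp/Extract/job00[034]/particles.star"
quoted_count=$#

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
```

With the pattern unquoted, the program would see several separate file names where it expects one value for the option.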

When you run sed, are you using -i (in-place) or redirecting the output to a new file with >? Otherwise it’s just dumped to the console. If the particle indices and rlnImageName (or just the basename) match, it will work, as long as most of the particles can be matched. Don’t forget to check the file extensions as well; you may have changed them between .mrc and .mrcs at some point.
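A minimal sketch of the difference, using a throwaway file and a made-up image name rather than a real star file. The replacement text JOB is only a placeholder, and -i.bak is used because it keeps a backup copy of the original.

```shell
# Throwaway demo file; the image name is made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' '000001@J25/imported/mic_001.mrcs' > "$tmp/output.star"

# Without -i or a redirect, the edited text only goes to the terminal
# and the file on disk is unchanged.
sed 's/J25/JOB/' "$tmp/output.star"

# Redirect to a new file (never to the input file itself) ...
sed 's/J25/JOB/' "$tmp/output.star" > "$tmp/edited.star"

# ... or edit in place, keeping the original as output.star.bak.
sed -i.bak 's/J25/JOB/' "$tmp/output.star"
```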

Hi Daniel,

Thanks so much for the quick and detailed reply! I now have the sed command working to exchange the rlnImageName (quite right, I forgot to pipe to a new file!), but it won’t let me change the path to Extract/jobxxx/Micrographs/, which is where the .mrcs files are. It will only accept Extract/jobxxx/, so I guess it doesn’t allow adding another directory layer. It’s not a deal-breaker because, as I mentioned in my edit, I can exchange the paths using bbedit or gedit anyway.

As this works by comparing rlnImageName, I think it is simpler to make a joined star file for the three extracted jobs and use this for the final star.py command.

Out of curiosity, what could have changed between a couple of months ago and now that means the original command I wrote above doesn’t work any more? We did update csparc to v 3.3.2 and pyem was installed 21-04-2022.

For years it has seemed like if you take a nap, you have to learn a new way to export from csparc to relion!

Thanks again!

I’m not sure what you mean by not being able to add more to the replace string. You can add whatever you like, but since / is the delimiter, it must be escaped with \. (You also need two \'s to show a \ here).
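For example, both forms below add the extra Micrographs directory level. The image name is made up; the second form uses | as the delimiter, which sed also accepts, so the slashes need no escaping.

```shell
# Made-up image name for illustration.
line='000001@J25/imported/mic_001.mrcs'

# Escape each / in the pattern and the replacement ...
escaped=$(printf '%s\n' "$line" | sed 's/J25\/imported/Extract\/job065\/Micrographs/g')

# ... or pick another delimiter, such as |, so no escaping is needed.
piped=$(printf '%s\n' "$line" | sed 's|J25/imported|Extract/job065/Micrographs|g')

echo "$escaped"
echo "$piped"
```

Both commands print 000001@Extract/job065/Micrographs/mic_001.mrcs.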

The csparc2star.py interface has actually been stable for a few years now. Most likely one of three things has happened: different processing patterns are changing which extra .cs files you need (you can add more than just the one passthrough, e.g. one from the extract job is guaranteed to have coordinates), recentering coordinates on extraction is preventing the fallback micrograph/coordinates method from working, or you have mismatched UIDs.