Exporting coordinates of picked particles

@DanielAsarnow Just tried and it seems to work. Great job, thanks.
The conversion completed without error, and the star file looks correct.

/opt/pyem/csparc2star.py P11/J7/picked_particles.cs particles.star

@tarek Great, thanks for testing it. Unfortunately it’s not that easy to silence the FutureWarning about reading a view, but don’t worry about it, using a view is my intention there.

@DanielAsarnow I just found out that the resulting star file cannot easily be read by relion.
I tried to re-extract using the converted output as data_star.
The column “rlnDetectorPixelSize”, which is required for this to succeed, is missing.
Can you access this and the magnification (which is just an arbitrary number at the moment) from the database?
Otherwise the pixel size will not be estimated correctly, or we have to modify the output manually beforehand.

Best,
tarek
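For anyone patching the star file by hand in the meantime, here is a minimal sketch of how the two columns relate. It assumes the older (pre-3.1) RELION convention, where the pixel size in Å is 10000 × rlnDetectorPixelSize (in µm) / rlnMagnification. The function names and the example values are hypothetical and not part of pyem or relion.

```python
def angpix_from_star(detector_pixel_size_um, magnification):
    """Pixel size in Angstrom per pixel implied by the two STAR columns."""
    return 1e4 * detector_pixel_size_um / magnification


def detector_pixel_size_for(angpix, magnification):
    """Value of rlnDetectorPixelSize (um) that reproduces `angpix`
    for a given (possibly arbitrary) rlnMagnification."""
    return angpix * magnification / 1e4


# Hypothetical example: the star file carries an arbitrary magnification
# of 10000 and the true pixel size is 1.06 A/px.
dps = detector_pixel_size_for(1.06, 10000)
print("rlnDetectorPixelSize to write:", dps)
print("pixel size relion would derive:", angpix_from_star(dps, 10000))
```

Because only the ratio matters, an arbitrary magnification is harmless as long as the detector pixel size written next to it is chosen to match.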

@tarek It’s in the micrograph passthrough file. I just added support for micrograph passthrough files (it will detect particle vs. micrograph passthrough).

Is it working for you? (Use picked_micrographs.cs as the passthrough).

I just pulled the latest version from git:

$ /opt/pyem/csparc2star.py P11/J98/extracted_particles.cs J98.star --passthrough P11/J98/picked_micrographs.cs
rlnMicrographName
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)

Please run with --loglevel debug. Seems like there can be a fair amount of heterogeneity in the available fields, so my test jobs might not always be diagnostic. Got to find all the edge cases!

Also, on rare occasions an older cached version of a Python script can run, so you should always try a few times after an update just in case. (This shouldn’t affect csparc2star.py; really only the programs using Numba have this issue, but still.)

$ /opt/pyem/csparc2star.py J98/extracted_particles.cs J98.star --passthrough J98/picked_micrographs.cs --loglevel debug

Detected CryoSPARC 2+ .cs file
Reading passthrough file
Micrograph passthrough detected
Creating micrograph DataFrame from recarray
Creating particle DataFrame from recarray
Merging micrograph fields: rlnPhaseShift, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnDefocusAngle, rlnMicrographName, rlnDefocusV, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnAmplitudeContrast
rlnMicrographName
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)
‘rlnMicrographName’
Traceback (most recent call last):
File “/opt/pyem/csparc2star.py”, line 42, in main
df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
File “/opt/pyem/pyem/metadata.py”, line 281, in parse_cryosparc_2_cs
df = star.smart_merge(df, pt, fields=fields, key=key)
File “/opt/pyem/pyem/star.py”, line 254, in smart_merge
s1 = s1.merge(s2[s2.columns.intersection(fields)], left_on=key, right_index=True, suffixes=["_x", “”])
File “/home/user/.local/lib/python2.7/site-packages/pandas/core/frame.py”, line 6389, in merge
copy=copy, indicator=indicator, validate=validate)
File “/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py”, line 61, in merge
validate=validate)
File “/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py”, line 551, in __init__
self.join_names) = self._get_merge_keys()
File “/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py”, line 884, in _get_merge_keys
k, stacklevel=stacklevel))
File “/home/user/.local/lib/python2.7/site-packages/pandas/core/generic.py”, line 1382, in _get_label_or_level_values
raise KeyError(key)
KeyError: ‘rlnMicrographName’

For extraction, you must use the particle passthrough file, which contains both the micrograph names and the particle coordinates from the picking step.

Sorry for deleting several replies here. This is the correct answer - use the particle passthrough for particle extraction jobs, and the micrograph passthroughs only for picking jobs where no particle passthrough exists.

Does anyone know whether the file format changed in the latest cryosparc update (v2.3.2)?
Reading the passthrough file after refinement fails.

$ /home/user/pyem/csparc2star.py P6/J62/cryosparc_P6_J62_011_particles.cs --passthrough P6/J62/passthrough_particles.cs csparc_J62_pass.star --loglevel debug
Detected CryoSPARC 2+ .cs file
Reading passthrough file
Particle passthrough detected
Concatenating passthrough fields: alignments2D/split, alignments2D/shift, alignments2D/pose, alignments2D/psize_A, alignments2D/error, alignments2D/error_min, alignments2D/resid_pow, alignments2D/slice_pow, alignments2D/image_pow, alignments2D/cross_cor, alignments2D/alpha, alignments2D/weight, alignments2D/pose_ess, alignments2D/shift_ess, alignments2D/class_posterior, alignments2D/class, alignments2D/class_ess, location/micrograph_uid, location/micrograph_path, location/micrograph_shape, location/center_x_frac, location/center_y_frac, pick_stats/ncc_score, pick_stats/power, pick_stats/template_idx, pick_stats/angle_rad
/home/user/pyem/pyem/util/util.py:74: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.
This code may break in numpy 1.16 because this will return a view instead of a copy – see release notes for details.
joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
Creating particle DataFrame from recarray
Directly copied fields: rlnDefocusAngle, rlnDetectorPixelSize, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnAmplitudeContrast, rlnMicrographName, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnPhaseShift, rlnDefocusV, rlnImageName, rlnMagnification
Converting normalized particle coordinates to absolute
Converted particle coordinates from normalized to absolute with subpixel origin
Converting DEFOCUSANGLE from degrees to radians
Converting PHASESHIFT from degrees to radians
Collecting particle parameters from most likely classes
Columns must be same length as key
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)
Columns must be same length as key
Traceback (most recent call last):
File “/home/user/pyem/csparc2star.py”, line 42, in main
df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
File “/home/user/pyem/pyem/metadata.py”, line 326, in parse_cryosparc_2_cs
[cs[names[c]][i] for i, c in enumerate(cls)]))
File “/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py”, line 3116, in __setitem__
self._setitem_array(key, value)
File “/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py”, line 3138, in _setitem_array
raise ValueError(‘Columns must be same length as key’)
ValueError: Columns must be same length as key

Why do you need the passthrough file here? Is it a heterogeneous refinement?

There are different numbers of particles in the particles.cs and passthrough.cs files (due to probability cutoff), which is causing the error. I have a really fast function that sticks structured arrays together using pointers, but converting both to DataFrames first and then merging would be more robust because of this issue.

I’m sort of reluctant to fix it now, though, because they’re going to have a built-in export job in the near future it would seem.
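To illustrate the more robust approach mentioned above, here is a minimal sketch of a key-based merge that tolerates different particle counts in the two files. It is not the pyem implementation. It assumes each record carries a unique identifier field (called `uid` here), and the toy records and the `score` field are made up for the example.

```python
def merge_by_uid(particles, passthrough, key="uid"):
    """Inner-join two lists of records (dicts) on `key`.

    Rows are matched by identifier rather than by position, so the two
    inputs may differ in length and in order.
    """
    lookup = {row[key]: row for row in passthrough}
    merged = []
    for row in particles:
        extra = lookup.get(row[key])
        if extra is None:
            continue  # no passthrough entry for this particle; skip it
        combined = dict(extra)
        combined.update(row)  # values from the particle file take precedence
        merged.append(combined)
    return merged


# Toy data: the passthrough holds three particles, the refinement output
# only two (for example after a probability cutoff), in a different order.
passthrough = [
    {"uid": 11, "location/center_x_frac": 0.25},
    {"uid": 12, "location/center_x_frac": 0.50},
    {"uid": 13, "location/center_x_frac": 0.75},
]
particles = [
    {"uid": 13, "score": 0.9},
    {"uid": 11, "score": 0.7},
]

merged = merge_by_uid(particles, passthrough)
for row in merged:
    print(row)
```

With DataFrames the same idea would be an inner `merge` on the identifier column, which is slower than gluing the arrays together but does not depend on both files having the same rows.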

It is after some iterations of heterogeneous and homogeneous refinement, yes. Therefore being able to track back the original coordinates would be helpful.
However, I can fully understand your limited motivation if Structura is going to implement an export job soon.
Thanks for your efforts so far!

This does work fine with the latest version of cryoSPARC for me. I think you are either passing the wrong passthrough file or giving the arguments out of order. The option arguments with -- should either be all after, or all before the positional input and output arguments. Also, you should use the passthrough_particles_all_classes.cs file with heterogeneous refinement jobs.
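If you call the script from your own wrapper, one way to keep the argument order consistent is to assemble the command in a small helper. This is only a sketch: `build_csparc2star_cmd` is a hypothetical name, and it uses only the --passthrough and --loglevel options that appear in this thread, placing all options before the positional input and output.

```python
def build_csparc2star_cmd(script, cs_file, star_file,
                          passthrough=None, loglevel=None):
    """Return an argument list with every option ahead of the positionals."""
    cmd = [script]
    if passthrough is not None:
        cmd += ["--passthrough", passthrough]
    if loglevel is not None:
        cmd += ["--loglevel", loglevel]
    # Positional arguments (input .cs, then output .star) always come last.
    cmd += [cs_file, star_file]
    return cmd


cmd = build_csparc2star_cmd(
    "/home/user/pyem/csparc2star.py",
    "P6/J62/cryosparc_P6_J62_011_particles.cs",
    "csparc_J62_pass.star",
    passthrough="P6/J62/passthrough_particles.cs",
    loglevel="debug",
)
print(" ".join(cmd))
# The list can be handed to subprocess.run(cmd, check=True) to run it.
```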

Since it seems to work for you, here is a recent example of mine, with cryosparc 2.4.0 and pyem v0.3 (cloned on 2018/10/08).
I have combined in total 4 different “select 2D classes” jobs into one homogeneous refinement job.
Each tree was created by going down the road from importing micrographs, picking, extracting… to 2D classification.

user@machine:/data/scratch/user/2018-10-26$ /home/user/pyem/csparc2star.py --loglevel debug --passthrough P10/J65/passthrough_particles.cs P10/J65/cryosparc_P10_J65_005_particles.cs csparc_P10_J65.star
Detected CryoSPARC 2+ .cs file
Reading passthrough file
Particle passthrough detected
Concatenating passthrough fields: alignments2D/split, alignments2D/shift, alignments2D/pose, alignments2D/psize_A, alignments2D/error, alignments2D/error_min, alignments2D/resid_pow, alignments2D/slice_pow, alignments2D/image_pow, alignments2D/cross_cor, alignments2D/alpha, alignments2D/weight, alignments2D/pose_ess, alignments2D/shift_ess, alignments2D/class_posterior, alignments2D/class, alignments2D/class_ess, location/micrograph_uid, location/micrograph_path, location/micrograph_shape, location/center_x_frac, location/center_y_frac, pick_stats/ncc_score, pick_stats/power, pick_stats/template_idx, pick_stats/angle_rad
/home/user/pyem/pyem/util/util.py:74: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.
This code may break in numpy 1.16 because this will return a view instead of a copy – see release notes for details.
joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
Creating particle DataFrame from recarray
Directly copied fields: rlnDefocusAngle, rlnDetectorPixelSize, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnAmplitudeContrast, rlnMicrographName, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnPhaseShift, rlnDefocusV, rlnImageName, rlnMagnification
Converting normalized particle coordinates to absolute
Converted particle coordinates from normalized to absolute with subpixel origin
Converting DEFOCUSANGLE from degrees to radians
Converting PHASESHIFT from degrees to radians
Collecting particle parameters from most likely classes
Columns must be same length as key
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)
Columns must be same length as key
Traceback (most recent call last):
File “/home/user/pyem/csparc2star.py”, line 42, in main
df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
File “/home/user/pyem/pyem/metadata.py”, line 326, in parse_cryosparc_2_cs
[cs[names[c]][i] for i, c in enumerate(cls)]))
File “/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py”, line 3116, in __setitem__
self._setitem_array(key, value)
File “/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py”, line 3138, in _setitem_array
raise ValueError(‘Columns must be same length as key’)
ValueError: Columns must be same length as key

Hi,

I wonder if you ever solved the problem. I am getting the same problem.
Best,
Chiara

Unfortunately not. My guess is that if the particles originate from multiple individual import jobs the traceback gets lost during refinement.
My workaround is to immediately export particle coordinates after select 2D, re-extract with relion and reimport the combined stack into cryosparc.
This can be used for homogeneous refinement with appropriate export of alignment parameters.
However, in some cases I observe considerable amounts of misaligned particles during further (external) classifications. This seems to be more pronounced in cryosparc than in other suites, therefore double check your results.

I am using a simple import so theoretically it should work. I was wondering if it is just a cryosparc version problem. I am also using 2.4. I have tried both the heterogeneous refinement and the particle selection outputs. The particles extract well with relion but when I try to do a refinement it tells me that I am using helical particles.
I find it very strange.

Hi,
Has anybody found that the particle coordinates in a star file converted from passthrough.cs (from the refine jobs) come out wrong? When I used relion to re-extract the particles, it complained: extractParticlesFromOneFrame ERROR: particles lies completely outside micrograph. And it’s true, the coordinates are larger than the micrograph dimensions.
I could instead re-extract the particles unbinned in cryoSPARC and then use the extracted particles for refinement or Class3D in relion. But in my case re-extraction in relion is faster, and, perhaps because there is not enough memory, cryoSPARC extraction with a bigger box (>500) usually stops without any error message, so I have to switch to a smaller box size for it to run to the end.

Use --swapxy. Also use --boxsize if your refinement volume is a different size from the particles.

Cheers.
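A small sketch of why an axis mix-up shows up as particles outside the micrograph: when fractional coordinates are scaled by the wrong axis length on a non-square micrograph, some of them exceed the image size. This is an illustration only and not necessarily what pyem does internally. The function names and the micrograph dimensions are hypothetical.

```python
def frac_to_pixels(x_frac, y_frac, width, height):
    """Scale fractional (0..1) particle centres to pixel coordinates."""
    return x_frac * width, y_frac * height


def inside(x, y, width, height):
    """True if the pixel coordinate lies within the micrograph."""
    return 0 <= x < width and 0 <= y < height


# Hypothetical non-square micrograph and a particle near its lower edge.
width, height = 5760, 4092
x_frac, y_frac = 0.10, 0.95

# Correct axes: x scaled by the width, y scaled by the height.
x, y = frac_to_pixels(x_frac, y_frac, width, height)

# Axes mixed up: x scaled by the height, y scaled by the width.
xs, ys = frac_to_pixels(x_frac, y_frac, height, width)

print("correct:", (x, y), "inside:", inside(x, y, width, height))
print("swapped:", (xs, ys), "inside:", inside(xs, ys, width, height))
```

Running a check like `inside` over the converted star file before re-extraction is a quick way to tell whether the axis order needs changing.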

Sounds good, I will try it. Thanks!

If this is still a lingering issue, here is a script to export only the particle coordinates from cryosparc to star files that can be imported in relion. https://github.com/tribell4310/reliosparc