Exporting coordinates of picked particles

Is it possible / can it be made possible to export just the particle coordinates after picking? As of now, I feel discouraged about using the picker in the software as I’m unable to use the coordinates for other purposes as I am able to do when I pick using other software which generates .star or .box files. Would it be possible to add an export function to any common format?

I can see the .cs file generated by the job, and in principle it contains the information needed to get the coordinates (micrograph name, x y coordinates), but it would take some effort to figure out the exact formats you use. If this is not a feature you could implement soonish, could you perhaps give me some hints about the format in which your coordinates are stored?

Many thanks,

Juraj

+1 this would be great - @DanielAsarnow this isn’t possible at present using pyem by any chance, is it?

It does indeed work. My latest updates have made csparc2star.py pretty robust to missing fields. However, because it no longer crashes, you may not notice missing fields or a passthrough requirement (it will try to detect when a passthrough is required, but this cannot be guaranteed to work). You can run with --loglevel info to get notifications about every field which is not found.

PS --loglevel debug will give you more information than you could ever want (too much).

Actually, it DOESN’T work. The coordinates are not in the extracted particle file, so it converts without errors but you only get the image name and pixel size information.

I am going to leave the program as is for now, where basically every file will convert without errors but there may be a lot of missing info. This is OK for every case other than the particle picking (I think). I will have to figure out the picking info later.

Hi Daniel,

In case it helps:

The XY data from particle picking is stored in extracted_particles.cs in the job folder of the ManualPick job.

It’s stored as the last 4 values in each record in the .cs file, namely MicrographXDim(UINT32), MicrographYDim(UINT32), fractionX(FLOAT32), fractionY(FLOAT32), and the final coordinates are

rlnCoordinateX = MicrographXDim * fractionX
rlnCoordinateY = MicrographYDim * fractionY

XDim and YDim are technically defined as an array (('location/micrograph_shape', '<u4', (2,)))
fracX and fracY are defined as ('location/center_x_frac', '<f4') / ('location/center_y_frac', '<f4')
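As a rough illustration of the arithmetic above, here is a minimal Python sketch. It builds a small made-up record array with only the three location fields named here (a real .cs file has many more fields and, as far as I know, can usually be read with numpy.load). The helper name absolute_coords and its x_index / y_index parameters are hypothetical, and the default axis order simply follows the description above; it is worth checking against a non-square micrograph.

```python
import numpy as np

# Hypothetical stand-in for a picking-job .cs record array, using only the
# three location fields named above (real files contain many more fields).
dtype = [
    ("location/micrograph_shape", "<u4", (2,)),
    ("location/center_x_frac", "<f4"),
    ("location/center_y_frac", "<f4"),
]
cs = np.array(
    [((4096, 4096), 0.25, 0.5), ((4096, 4096), 0.75, 0.125)],
    dtype=dtype,
)


def absolute_coords(cs, x_index=0, y_index=1):
    """Return (x, y) pixel coordinates computed from fractional centers.

    x_index / y_index select which entry of micrograph_shape is treated as
    the X and Y dimension. The default order is an assumption.
    """
    shape = cs["location/micrograph_shape"].astype(np.float64)
    x = shape[:, x_index] * cs["location/center_x_frac"]
    y = shape[:, y_index] * cs["location/center_y_frac"]
    return x, y


x, y = absolute_coords(cs)
```

If the converted picks look transposed on a rectangular micrograph, swapping x_index and y_index is the first thing to try.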

I’d add it myself and push it to your GitHub repo, but I’m afraid my Python is very limited. :)

Best,

Juraj

@xeniorn Thanks, I’ve already put this in. Due to the use of normalized coordinates, it is also necessary to add a shift (rlnOriginX/Y) for the decimal part of the coordinate. It would be great if you could test this version and make sure the converted coordinates are right (in case they are actually a box corner and not the center).
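To make the "decimal part" concrete, the following sketch splits an absolute coordinate into a whole-pixel part and a fractional remainder. The function name split_subpixel is hypothetical, and the use of floor (rather than rounding) is an assumption. How the remainder maps onto rlnOriginX/Y, including its sign, is not spelled out in this thread, so the sketch stops at the split itself.

```python
import numpy as np


def split_subpixel(coord):
    """Split absolute coordinates into whole pixels and a fractional remainder.

    The two parts always add back up to the input; the remainder lies in [0, 1).
    """
    coord = np.asarray(coord, dtype=np.float64)
    whole = np.floor(coord)
    remainder = coord - whole
    return whole.astype(np.int64), remainder


whole_x, rem_x = split_subpixel([1024.0, 3071.6, 10.25])
```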

Hi @DanielAsarnow, how do you run this? I tried using csparc2star.py (latest update) on a picked_particles.cs file and it said it needed a passthrough file. After providing the passthrough file I got the attached error:

Just jumping in to support this idea. I found the template picking in cryosparc to be very efficient for some projects. It would be really great to export the coordinates for extraction e.g. with relion.

Nevermind, already great work on pyem Daniel!

@olibclarke Just fixed it. I had a check before converting that just looked for blob/path since I originally thought only 3D classification jobs would need the passthrough (“passthrough is required” message). I since added exception handling which will suggest a passthrough if there is a KeyError (missing field, “passthrough may be required” message). The latter actually takes care of the former, and the assumption about blob/path was wrong anyway.
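The pattern described here (catch the KeyError for a missing field and suggest a passthrough, instead of pre-checking one specific field) might look roughly like the sketch below. This is not the pyem code: convert, convert_with_hint and the list of required fields are all hypothetical names made up for illustration.

```python
import logging

log = logging.getLogger("convert_sketch")


def convert(record, required=("rlnMicrographName", "rlnCoordinateX")):
    """Hypothetical converter: copy the required fields from a record.

    Raises KeyError if any required field is absent.
    """
    return {name: record[name] for name in required}


def convert_with_hint(record):
    """Run convert(); on a missing field, log a passthrough hint and return None."""
    try:
        return convert(record)
    except KeyError as err:
        # Any missing field triggers the hint, not just one hard-coded name.
        log.error("Missing field %s", err)
        log.error("A passthrough file may be required")
        return None


ok = convert_with_hint({"rlnMicrographName": "mic1.mrc", "rlnCoordinateX": 10.0})
missing = convert_with_hint({"rlnCoordinateX": 10.0})
```

The design point is that the generic handler covers every missing field, which makes a separate up-front check for a single field unnecessary.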

@DanielAsarnow Just tried it and it seems to work. Great job, thanks.
It completed without error and the star file looks correct.

/opt/pyem/csparc2star.py P11/J7/picked_particles.cs particles.star

@tarek Great, thanks for testing it. Unfortunately it’s not that easy to silence the FutureWarning about reading a view, but don’t worry about it, using a view is my intention there.

@DanielAsarnow I just found out that the resulting star file cannot be read easily by RELION.
I tried to re-extract using the output from the conversion as data_star.
The column “rlnDetectorPixelSize” is missing, and it is needed for the process to succeed.
Can you access this and the magnification (which is just an arbitrary number at the moment) from the database?
Otherwise, the pixel size will not be estimated correctly, or we have to modify the output manually beforehand.

Best,
tarek
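For context on why both columns matter: in the STAR convention RELION used at the time, the pixel size in Å is derived as 10^4 × rlnDetectorPixelSize (in µm) / rlnMagnification. Because the magnification can be an arbitrary number, only the ratio has to be right. A small sketch of that relation follows; the function names and the default magnification are made up for illustration.

```python
def detector_pixel_size_um(angpix, magnification=10000.0):
    """Detector pixel size (µm) that reproduces `angpix` (Å/pixel)
    for a chosen, possibly arbitrary, magnification."""
    return angpix * magnification / 1e4


def angpix_from_star(detector_pixel_size, magnification):
    """Pixel size (Å/pixel) implied by the two STAR columns."""
    return 1e4 * detector_pixel_size / magnification


# Example: a 1.06 Å/pixel dataset with a nominal magnification of 50000.
dstep = detector_pixel_size_um(1.06, magnification=50000.0)
recovered = angpix_from_star(dstep, 50000.0)
```

When patching a star file by hand, any magnification works as long as the detector pixel size is scaled to match.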

@tarek It’s in the micrographs passthrough file. I just added support for micrograph passthrough files (it will detect particle vs. micrograph passthrough).

Is it working for you? (Use picked_micrographs.cs as the passthrough).

I just pulled the latest version from git:

$ /opt/pyem/csparc2star.py P11/J98/extracted_particles.cs J98.star --passthrough P11/J98/picked_micrographs.cs
rlnMicrographName
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)

Please run with --loglevel debug. Seems like there can be a fair amount of heterogeneity in the available fields, so my test jobs might not always be diagnostic. Got to find all the edge cases!

Also, on rare occasions an older cached version of a Python script can run after an update, so it is worth retrying a few times. (This shouldn’t affect csparc2star.py; only the programs using Numba really have this issue.)

$ /opt/pyem/csparc2star.py J98/extracted_particles.cs J98.star --passthrough J98/picked_micrographs.cs --loglevel debug

Detected CryoSPARC 2+ .cs file
Reading passthrough file
Micrograph passthrough detected
Creating micrograph DataFrame from recarray
Creating particle DataFrame from recarray
Merging micrograph fields: rlnPhaseShift, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnDefocusAngle, rlnMicrographName, rlnDefocusV, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnAmplitudeContrast
rlnMicrographName
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)
'rlnMicrographName'
Traceback (most recent call last):
  File "/opt/pyem/csparc2star.py", line 42, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
  File "/opt/pyem/pyem/metadata.py", line 281, in parse_cryosparc_2_cs
    df = star.smart_merge(df, pt, fields=fields, key=key)
  File "/opt/pyem/pyem/star.py", line 254, in smart_merge
    s1 = s1.merge(s2[s2.columns.intersection(fields)], left_on=key, right_index=True, suffixes=["_x", ""])
  File "/home/user/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 6389, in merge
    copy=copy, indicator=indicator, validate=validate)
  File "/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 551, in __init__
    self.join_names) = self._get_merge_keys()
  File "/home/user/.local/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 884, in _get_merge_keys
    k, stacklevel=stacklevel))
  File "/home/user/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 1382, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'rlnMicrographName'

For extraction, you must use the particle passthrough file, which contains both the micrograph names and the particle coordinates from the picking step.

Sorry for deleting several replies here. This is the correct answer - use the particle passthrough for particle extraction jobs, and the micrograph passthroughs only for picking jobs where no particle passthrough exists.

Does anyone know whether the file format changed in the latest cryoSPARC update (v2.3.2)?
Reading the passthrough file after refinement fails.

$ /home/user/pyem/csparc2star.py P6/J62/cryosparc_P6_J62_011_particles.cs --passthrough P6/J62/passthrough_particles.cs csparc_J62_pass.star --loglevel debug
Detected CryoSPARC 2+ .cs file
Reading passthrough file
Particle passthrough detected
Concatenating passthrough fields: alignments2D/split, alignments2D/shift, alignments2D/pose, alignments2D/psize_A, alignments2D/error, alignments2D/error_min, alignments2D/resid_pow, alignments2D/slice_pow, alignments2D/image_pow, alignments2D/cross_cor, alignments2D/alpha, alignments2D/weight, alignments2D/pose_ess, alignments2D/shift_ess, alignments2D/class_posterior, alignments2D/class, alignments2D/class_ess, location/micrograph_uid, location/micrograph_path, location/micrograph_shape, location/center_x_frac, location/center_y_frac, pick_stats/ncc_score, pick_stats/power, pick_stats/template_idx, pick_stats/angle_rad
/home/user/pyem/pyem/util/util.py:74: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.
This code may break in numpy 1.16 because this will return a view instead of a copy – see release notes for details.
joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
Creating particle DataFrame from recarray
Directly copied fields: rlnDefocusAngle, rlnDetectorPixelSize, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnAmplitudeContrast, rlnMicrographName, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnPhaseShift, rlnDefocusV, rlnImageName, rlnMagnification
Converting normalized particle coordinates to absolute
Converted particle coordinates from normalized to absolute with subpixel origin
Converting DEFOCUSANGLE from degrees to radians
Converting PHASESHIFT from degrees to radians
Collecting particle parameters from most likely classes
Columns must be same length as key
A passthrough file may be required (check inside the cryoSPARC 2+ job directory)
Columns must be same length as key
Traceback (most recent call last):
  File "/home/user/pyem/csparc2star.py", line 42, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
  File "/home/user/pyem/pyem/metadata.py", line 326, in parse_cryosparc_2_cs
    [cs[names[c]][i] for i, c in enumerate(cls)]))
  File "/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3116, in __setitem__
    self._setitem_array(key, value)
  File "/home/user/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 3138, in _setitem_array
    raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

Why do you need the passthrough file here? Is it a heterogeneous refinement?

There are different numbers of particles in the particles.cs and passthrough.cs files (due to probability cutoff), which is causing the error. I have a really fast function that sticks structured arrays together using pointers, but converting both to DataFrames first and then merging would be more robust because of this issue.
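A minimal sketch of the DataFrame-based alternative, assuming both files share a unique particle identifier (called uid here) and using made-up values: merging on that key tolerates a passthrough table with more rows than the particle table, whereas sticking the arrays together row by row does not.

```python
import pandas as pd

# Particle table after a probability cutoff: uid 13 was dropped.
particles = pd.DataFrame(
    {"uid": [11, 12, 14], "rlnAngleRot": [10.0, 20.0, 40.0]}
)

# Passthrough table still lists every original particle.
passthrough = pd.DataFrame(
    {
        "uid": [11, 12, 13, 14],
        "rlnCoordinateX": [100.0, 200.0, 300.0, 400.0],
        "rlnCoordinateY": [150.0, 250.0, 350.0, 450.0],
    }
)

# Keep exactly the particles that survived and pull in their coordinates
# by key; validate= raises if a uid is duplicated on either side.
merged = particles.merge(passthrough, on="uid", how="left", validate="one_to_one")
```

The trade-off is speed: a keyed merge is slower than a raw memory copy, but it does not depend on the two files having the same length or row order.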

I’m sort of reluctant to fix it now, though, because they’re going to have a built-in export job in the near future it would seem.

Yes, it is after some iterations of heterogeneous and homogeneous refinement, so being able to trace back the original coordinates would be helpful.
However, I can fully understand your limited motivation if Structura will implement an export job soon.
Thanks for your efforts so far!