Export data to Relion in v2

I am also encountering this problem. I am happy to provide the error messages if needed.

Cheers

Hello everybody,
Do local motion jobs have a passthrough file? Select jobs work if the passthrough file is provided, but my code notifying the user wasn’t checking for the specific missing fields.

I’ve now fixed this, so that whenever any key is missing the program will recommend trying with --passthrough. Select 2D is tested now, and I suspect local motion is OK since the error was the same. Please let me know if it’s working!

Best,
-da

FYI:

csparc2star.py will now ignore most missing fields. It will try to warn you if a passthrough file is required, but it can’t know for sure. You may get star files which are valid, and no errors from conversion, but which don’t have all the fields you expected. If you add --loglevel info you will be notified about what fields were copied and which fields were not found. Please continue to report any errors with the program, and also please let me know what you all feel the default behavior should be as far as these missing fields are concerned.

Hi Daniel,

I found that csparc2star.py is quite sensitive to numpy versions. 1.9.3 from EMAN2 and 1.5 both do not work. Which version of numpy is required?

Zhijie

Hi Daniel,
I can successfully generate a star file from a Select 2D cs file, but Local motion is not working for me (it has a passthrough file and I specified it):

/usr/local/software/pyem/pyem/util/util.py:74: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.

This code may break in numpy 1.16 because this will return a view instead of a copy – see release notes for details.
joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
Traceback (most recent call last):
  File "/usr/local/software/pyem/csparc2star.py", line 129, in <module>
    sys.exit(main(parser.parse_args()))
  File "/usr/local/software/pyem/csparc2star.py", line 75, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
  File "/usr/local/software/pyem/pyem/metadata.py", line 216, in parse_cryosparc_2_cs
    cs = util.join_struct_arrays([cs, pt[[n for n in pt.dtype.names if n != 'uid']]])
  File "/usr/local/software/pyem/pyem/util/util.py", line 76, in join_struct_arrays
    return joint.ravel().view(dtype)
ValueError: field 'location/micrograph_uid' occurs more than once
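
The ValueError above suggests that both the particle .cs file and the passthrough file carry a location/micrograph_uid field, so joining all fields of both arrays produces a name clash. Below is a minimal sketch of a join that skips passthrough fields already present in the main array. It is only an illustration, not pyem's actual fix: join_unique_fields, the toy arrays and their field names are hypothetical, and it assumes both arrays list the same particles in the same order.

```python
import numpy as np


def join_unique_fields(main, extra):
    """Join two structured arrays row by row, keeping `main` on name clashes."""
    if main.shape != extra.shape:
        raise ValueError("arrays must have the same number of rows")
    # Only take fields from `extra` that `main` does not already define.
    new_names = [n for n in extra.dtype.names if n not in main.dtype.names]
    dtype = [(n, main.dtype[n]) for n in main.dtype.names]
    dtype += [(n, extra.dtype[n]) for n in new_names]
    joined = np.empty(main.shape, dtype=dtype)
    for n in main.dtype.names:
        joined[n] = main[n]
    for n in new_names:
        joined[n] = extra[n]
    return joined


# Toy stand-ins for a particle .cs array and its passthrough array
# (field names chosen for illustration only).
cs = np.array([(1, 10), (2, 11)],
              dtype=[("uid", "u8"), ("location/micrograph_uid", "u8")])
pt = np.array([(1, 10, 0.5), (2, 11, 0.7)],
              dtype=[("uid", "u8"), ("location/micrograph_uid", "u8"),
                     ("ctf/amp_contrast", "f4")])

joined = join_unique_fields(cs, pt)
print(joined.dtype.names)
```

Whether the main file or the passthrough file should win on a clash is a design choice; this sketch keeps the main file's values.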

Cheers,
Simon

@ZhijieLi A current version (1.14+) is required. Both 1.5 and 1.9 are very old. You can upgrade numpy in EMAN2 without errors; however, installing into the EMAN2 python is no longer recommended. It used to be necessary because I had some EMAN2 library dependencies, which I have since eliminated, and because EMAN2 had to be at the beginning of the PATH.

Nowadays, you can just put EMAN2’s bin directory at the end of the PATH and all is well. In that case, another python (like system python or miniconda) can be earlier in the PATH and be used for normal python programs and libraries that don’t have many specialized native components like EMAN2.

@simonfromm Thanks for your error message, it’s helpful actually. We’re discussing this issue in this other thread: Exporting coordinates of picked particles

Hi Daniel,

Thanks for the information! I meant to say 1.15 when I said “1.5”. Sorry for the confusion.

I am now using a fresh installation of anaconda python as the first one in PATH, and the numpy version is 1.14.3. Yes, under the new numpy, csparc2star.py no longer gives the “unexpected keyword ‘signature’” error. After updating pyem to the current version, the “defocusV not found” error also goes away. Thanks for all the work!

Zhijie

@ZhijieLi OK, glad it works. I will test and make sure 1.15 is compatible. My goal is to track the release versions of numpy and pandas, both of which I use heavily.

@Everyone
Particle coordinates will now be converted. This means rlnMicrographName, rlnCoordinateX, and rlnCoordinateY, plus rlnOriginX and rlnOriginY because CryoSPARC coordinates have subpixel localization (decimal part saved in origins, integer part to coordinates). It also means --copy-micrograph-coordinates won’t always work anymore. The micrograph and particle stack paths will point to the CryoSPARC cache location. You can swap the path to the micrographs with the --micrograph-path option, which will give you a file that can be used directly with e.g. Relion particle extraction.
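
To illustrate the integer/decimal split described above, here is a minimal sketch. It is not pyem's code: split_coordinate and the example positions are hypothetical, taking the floor (rather than rounding) is an assumption, and the sign convention Relion applies to the origin fields is not shown.

```python
import math


def split_coordinate(position):
    """Split a sub-pixel position into (whole pixel, fractional remainder)."""
    whole = math.floor(position)
    return whole, position - whole


# Hypothetical particle centre in pixels.
x_whole, x_frac = split_coordinate(1023.75)
y_whole, y_frac = split_coordinate(87.25)
print(x_whole, x_frac, y_whole, y_frac)
```

The whole-pixel part would correspond to the coordinate columns and the remainder to the origin columns.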

I’m willing to accept suggestions with a clear rationale for extending this behavior. As for local motion information, I feel that it is outside of the scope of csparc2star.py to e.g. process trajectories from CryoSPARC and output aligned sums. Therefore I probably won’t do anything more for local motion except to maintain compatibility for converting the other fields, unless someone has a very clear idea for a useful behavior that only involves generating metadata for Relion.

[Edit 1]
Of course, feel free to share your thoughts on this issue or to make pull requests for anything you come up with on your own.

[Edit 2]
Please test the new changes! Thanks!

Hi Daniel,

We also have an issue running this. We have updated pyem, numpy, and scipy, and the output is below. Any suggestions are appreciated.

python ~/Downloads/pyem/csparc2star.py --loglevel 'debug' --passthrough passthrough_particles_class_3.cs cryosparc_P2_J17_class_03_00062_particles.cs j17_to_relion.star
Detected CryoSPARC 2+ .cs file
Reading passthrough file
Concatenating passthrough recarray fields
/home/cate/Downloads/pyem/pyem/util/util.py:74: FutureWarning: Numpy has detected that you may be viewing or writing to an array returned by selecting multiple fields in a structured array.

This code may break in numpy 1.16 because this will return a view instead of a copy – see release notes for details.
joint[:, offset:offset + size] = a.view(np.uint8).reshape(n, size)
Directly copied fields: rlnDefocusAngle, rlnDetectorPixelSize, rlnCtfFigureOfMerit, rlnSphericalAberration, rlnPhaseShift, rlnCtfMaxResolution, rlnVoltage, rlnDefocusU, rlnDefocusV, rlnImageName, rlnMagnification
Converting DEFOCUSANGLE from degrees to radians
Converting PHASESHIFT from degrees to radians
Assigning parameters from 2D classes
Changing RANDOMSUBSET to 1-based index
Changing CLASS to 1-based index
Converting Rodrigues coordinates to Euler angles
Traceback (most recent call last):
  File "/home/cate/Downloads/pyem/csparc2star.py", line 107, in <module>
    sys.exit(main(parser.parse_args()))
  File "/home/cate/Downloads/pyem/csparc2star.py", line 47, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic)
  File "/home/cate/Downloads/pyem/pyem/metadata.py", line 320, in parse_cryosparc_2_cs
    axis=1, raw=True, result_type='broadcast'))
  File "/home/cate/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 4847, in apply
    return self._apply_raw(f, axis)
  File "/home/cate/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 4876, in _apply_raw
    result = np.apply_along_axis(func, axis, self.values)
  File "/EM/miniconda2/lib/python2.7/site-packages/numpy/lib/shape_base.py", line 357, in apply_along_axis
    res = asanyarray(func1d(inarr_view[ind0], *args, **kwargs))
  File "/home/cate/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 4831, in f
    return func(x, *args, **kwds)
TypeError: <lambda>() got an unexpected keyword argument 'result_type'

@zlwatson That error comes from an old pandas. The requirements.txt specifies >=0.23 because that specific argument needs it. Maybe setup.py can specify versions too, so that pip install works better.

I guess I will try to make a conda package after all since installation is clearly a bottleneck for many people.
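
Until packaging improves, a quick way to rule out version problems is to compare the installed numpy and pandas against the minimums mentioned in this thread (numpy 1.14+, pandas >=0.23). The following is only a sketch: the helper names are made up, and it assumes Python 3.8 or later for importlib.metadata, whereas the tracebacks above come from Python 2.7.

```python
import re
from importlib import metadata

# Minimum versions mentioned in this thread.
MINIMUMS = {"numpy": "1.14", "pandas": "0.23"}


def version_tuple(version, width=3):
    """Turn '1.14.3' into (1, 14, 3); missing parts are padded with zeros."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:width]]
    return tuple(parts + [0] * (width - len(parts)))


def meets_minimum(installed, minimum):
    """Compare numerically, so that 1.9.3 counts as older than 1.14."""
    return version_tuple(installed) >= version_tuple(minimum)


for name, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name}: not installed (need >= {minimum})")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{name} {installed}: {status} (need >= {minimum})")
```

Comparing integer tuples rather than strings matters here, because as strings "1.9.3" would sort after "1.14".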

FYI to all following this thread: some (but not all) converted .star files may be missing the rlnAmplitudeContrast field, which will break direct reconstructions. The issue has been addressed in the current version on github.
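
To check whether a converted file is affected, you can list the column labels in its header. Here is a minimal sketch, assuming a simple star file with a single data block; star_labels and the sample text are made up for illustration, and this is not a full STAR parser.

```python
def star_labels(text):
    """Return the column labels (without the leading underscore) of a STAR file."""
    labels = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            # A label line looks like "_rlnVoltage #1"; keep only the name.
            labels.append(line.split()[0][1:])
    return labels


# Made-up header; in practice, read the text from your own .star file.
sample = """\
data_

loop_
_rlnVoltage #1
_rlnDefocusU #2
_rlnDefocusV #3
_rlnImageName #4
200.0 22046.68 22929.99 1@stack.mrcs
"""

labels = star_labels(sample)
wanted = ("rlnAmplitudeContrast", "rlnVoltage")
missing = [name for name in wanted if name not in labels]
print("missing:", missing)
```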

I just found out that the re-ordering of particles is buggy.

I imported an externally created stack (@box 100) to cryosparc for refinement.
After exporting the final .cs file to a star file, the particle numbers are no longer in consecutive order.

/opt/pyem/csparc2star.py P11/J127/cryosparc_P11_J127_004_particles.cs test.star --passthrough P11/J127/passthrough_particles.cs

Last entry in the file

tail -1 test.star
0.07 40.358974 -93.5596 166.89922 1 0.0 0.0 70.31 22046.68 22929.99 3.56 367947@J115/imported/20180912_all_DC4.mrcs 10000.0 -6.796875 0.234375 0.0 1 2.7 200.0

After sorting

sort -k12 -n test.star |tail -1
0.07 -101.85376 -81.67742 21.709524 1 0.0 0.0 70.31 22046.68 22929.99 3.56 367980@J115/imported/20180912_all_DC4.mrcs 10000.0 5.890625 -7.515625 0.0 1 2.7 200.0

Re-extraction of unbinned particles, e.g. with other software suites, would benefit greatly from having the original ordering preserved.
It would be great if you could look into this :blush:

Does this break direct reconstructions @DanielAsarnow? I’ve used relion_reconstruct (from Relion 3) on a star file from cryosparc with no AC field and the output looks fine (and works fine for signal subtraction)

@olibclarke It was the only difference for me the other night, and then when I read through relion_reconstruct.cpp it appeared that the default value is 0 if the field is missing. I’m running several reconstructions of the same particles with different or no AC values now to double check.

@tarek It’s not my code but cryoSPARC and Relion that both change the order of particles as they see fit. I can add a --sort option to star.py to sort by either (stack basename, particle index) or (micrograph name, coordx, coordy) if that would be useful for you.
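
For anyone who wants to reorder entries in the meantime, sorting by (stack basename, particle index) could look like the sketch below. This is an illustration, not the proposed --sort implementation: image_sort_key is a hypothetical helper, and the entry with index 2 is made up; the other two image names are taken from the output above.

```python
import os


def image_sort_key(image_name):
    """Key for sorting 'index@path/to/stack.mrcs' by (stack basename, index)."""
    index, stack = image_name.split("@", 1)
    return os.path.basename(stack), int(index)


names = [
    "367980@J115/imported/20180912_all_DC4.mrcs",
    "2@J115/imported/20180912_all_DC4.mrcs",  # made-up entry
    "367947@J115/imported/20180912_all_DC4.mrcs",
]
ordered = sorted(names, key=image_sort_key)
print(ordered)
```

Converting the index to an integer avoids the string ordering in which "2" would sort after "367947".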


I tested three reconstructions: 1) AC from refinement (0.1), 2) no AC field, and 3) a reasonable but incorrect AC value (0.07). I used images of a ribosome with a pixel size of 2.4312 Å, which refines exactly to the Nyquist frequency. (The unbinned resolution is ~3.0 Å.) Relion3 with relion_reconstruct_mpi --ctf was used for all the reconstructions.
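
For reference, the Nyquist limit corresponds to a spacing of two pixels, so at this pixel size the binned data cannot resolve anything finer than roughly 4.86 Å. A tiny sketch of the arithmetic (nyquist_resolution is just an illustrative helper name):

```python
def nyquist_resolution(pixel_size):
    """Finest resolvable spacing, in the same unit as pixel_size (two pixels per period)."""
    return 2.0 * pixel_size


print(nyquist_resolution(2.4312))
```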

Reconstruction 1 is normal, as expected. Reconstruction 3 is very similar, but has weaker intensities, and is of slightly lower quality.

Reconstruction 2 is extremely degraded, basically uninterpretable. You can see from the histogram, as well, that something isn’t right.

The image below has reconstructions 1 and 3 on the left, and two copies of reconstruction 2 at different isovalues on the right.


@tarek It’s not my code but cryoSPARC and Relion that both change the order of particles as they see fit. I can add a --sort option to star.py to sort by either (stack basename, particle index) or (micrograph name, coordx, coordy) if that would be useful for you.

That would be amazing. I’m sure that will be of use for others as well :wink:

Hmm that’s really strange… I have a reconstruction from this star file (header attached) which looks totally normal (high res details, side chains, etc). Granted the initial refinement only went to ~4 Å but it doesn’t seem to be degraded and I was able to use it for signal subtraction and obtain an improved reconstruction. It is a much smaller particle though (~120Å)… I’ll investigate further and compare with/without AC.

Hi @DanielAsarnow,

when trying to use csparc2star I get the following error:

df = metadata.parse_cryosparc_2_cs(cs, passthrough=args.passthrough, minphic=args.minphic, boxsize=args.boxsize, swapxy=args.swapxy)
TypeError: parse_cryosparc_2_cs() got an unexpected keyword argument 'boxsize'

Any ideas how this might be sorted?

Many thanks

Andrija

@asente Please update pyem and use the release branch. I recommend you install with:

git clone -b release https://github.com/asarnow/pyem.git
cd pyem
pip install -e .

The -e flag will install with a symlink such that future git pulls will update the installed pyem libraries as well as the command line tools.

Also note --boxsize is only required if the refinement box is different from the particle box.