PyEM-csparc2star.py

Hello,

I keep getting this error when trying to use csparc2star.py:

VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
df[model[k]] = pd.DataFrame(np.array(
Columns must be same length as key
Traceback (most recent call last):
  File "/home/xx/data/pyem/csparc2star.py", line 42, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthroughs=args.input[1:], minphic=args.minphic,
  File "/data/xx/pyem/pyem/metadata.py", line 403, in parse_cryosparc_2_cs
    df = cryosparc_2_cs_model_parameters(cs, df, minphic=minphic)
  File "/data/xx/pyem/pyem/metadata.py", line 334, in cryosparc_2_cs_model_parameters
    df[model[k]] = pd.DataFrame(np.array(
  File "/xx/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/core/frame.py", line 3160, in __setitem__
    self._setitem_array(key, value)
  File "/xx/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/core/frame.py", line 3189, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Required fields could not be mapped. Are you using the right input file(s)?

When I initially tested the script it worked, but with new jobs it suddenly stopped working.

I’m doing:

  1. Export job in cryoSPARC
  2. Use the full path as input for csparc2star.py

Thanks in advance.

Hi LTP,
You need to make sure that there are no unexpected passthrough items. You can check this in the outputs tab of the cryoSPARC job you are using as input for pyem. I had this problem following multi-class ab-initio, because the alignments for all classes were being passed through for the individual classes. Running the script without the passthrough information should work, but that may not produce all the information you need in the star file, depending on what your next step is.
I ended up re-running cryoSPARC jobs to get rid of the extra passthrough entries (not ideal), but I suspect that if you are comfortable with the Python package NumPy you could manually delete the offending columns.
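
For anyone who wants to try the NumPy route, something along these lines should work (an untested sketch; the filename and the 'alignments_class_' prefix are only placeholders for whatever extra field group shows up in your file):

import numpy as np
from numpy.lib import recfunctions as rfn

cs = np.load("particles_exported.cs")   # .cs files are plain NumPy structured arrays
print(cs.dtype.names)                   # inspect which field groups are present

# drop every column belonging to the offending passthrough group
bad = [n for n in cs.dtype.names if n.startswith("alignments_class_")]
trimmed = rfn.drop_fields(cs, bad, usemask=False)

with open("particles_trimmed.cs", "wb") as f:   # write back in the same NPY format
    np.save(f, trimmed)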

Hi @LTP, @Ablakely and @DanielAsarnow

I am suddenly experiencing the same error message with one of my particle stacks and really want to try RELION focused Class3D on it. The only difference between this stack and the previous one, which ran through csparc2star.py successfully, is that I have a passthrough alignments2D entry in the output tab of NU refinement. Is there any way to remove it? I thought that running a homogeneous reconstruction with the previous particle stack as input for the right structure might work by overriding the low-level results, but it could not override the location information since that is a passthrough.
I am not familiar with NumPy but can try that if someone can provide guidance.

This is legitimately a bug in pyem; I should be able to fix it today.

Are you giving the passthrough first on the command line? The order of the input files determines the priority for adding or discarding information from each file. You should be giving the refinement output first and the passthrough second. Can you confirm that's not the source of the problem?
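
For example (project/job numbers and filenames here are just placeholders), the refinement output goes first, the passthrough second, and the output .star file last:

csparc2star.py cryosparc_P1_J10_005_particles.cs P1_J10_passthrough_particles.cs particles.star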

Second question: are there 2D jobs with the reverse problem? Currently my code checks for multiple 3D classes, and otherwise converts 2D classes or a 3D class based on the presence of alignments2D. If I make it do 2D only when alignments3D is not present, then this reverse case will become an issue.
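
If you want to check which alignment blocks your own file carries, you can inspect the field-name prefixes of the .cs file, something like this (untested sketch; the filename is a placeholder):

import numpy as np
cs = np.load("cryosparc_P1_J10_005_particles.cs")
print(sorted({n.split("/")[0] for n in cs.dtype.names}))  # look for 'alignments2D' and/or 'alignments3D'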

I take it back: I don't know what the issue is, and it isn't strictly related to the alignments2D thing. If you send me your .cs files, I can try to figure it out.

Thanks for the replies @DanielAsarnow,
I saw this on the GitHub page:

If you have trouble getting this to work, you can first export the job from within cryoSPARC. The .cs file in the exported job directory will already have all the particle parameters merged, and you can then convert that single .cs file.

So I've always just exported the particles from the cryoSPARC output tab of NU refine and run csparc2star.py on that Px_Jx_particles_exported.cs, only specifying --boxsize.

When you say refinement output first, would that be the particles.cs file from the last iteration of refine? Does that contain alignments3D and ctf while passthrough_particles.cs contains alignments2D, blob and particles.location?

How can I send the files to you?

FYI, --boxsize is only needed when the reconstruction volume has a different box size than the particles.

I've never actually used the export feature. By the refinement output I do mean the particles.cs of the last iteration. Swapping their order doesn't cause an error, but typically one would want the refinement output to take priority over the passthrough where they overlap (e.g. refined CTF values).

A Dropbox, Box, or Drive link is the easiest way to send the files, I think.

Hi @DanielAsarnow

I hope you received the shared files. Do you think it might work to re-extract the particles in cryoSPARC to get rid of whatever caused the issues with csparc2star.py?

I got your links; I'll take a look this week. Using the non-exported job or running a new one will probably be the fastest route, since it will likely take me a few more days.

Sorry, I moved my files today when I tried your script again, so the links won't work anymore. I found that it works with the individual files (with or without adding the passthrough to the command) but fails when I use the exported ones. I'm not sure why exporting the particles caused this error. Do you still want to look at the files in case someone else sees this in the future?

Hello,

We have the same error on one project and we don't understand what the problem is. Testing on the RELION tutorial dataset run through cryoSPARC works: extracted_particles.cs converts to particles.star without a problem.
On our project of interest we get:

…/pyem/pyem/metadata.py:334: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
df[model[k]] = pd.DataFrame(np.array(
Columns must be same length as key
Traceback (most recent call last):
  File "/home/CAMPUS/Software/64bits/pyem/csparc2star.py", line 42, in main
    df = metadata.parse_cryosparc_2_cs(cs, passthroughs=args.input[1:], minphic=args.minphic,
  File "/home/CAMPUS/Software/64bits/pyem/pyem/metadata.py", line 403, in parse_cryosparc_2_cs
    df = cryosparc_2_cs_model_parameters(cs, df, minphic=minphic)
  File "/home/CAMPUS/Software/64bits/pyem/pyem/metadata.py", line 334, in cryosparc_2_cs_model_parameters
    df[model[k]] = pd.DataFrame(np.array(
  File "/home/CAMPUS/Software/64bits/conda/envs/pyem/lib/python3.8/site-packages/pandas/core/frame.py", line 3160, in __setitem__
    self._setitem_array(key, value)
  File "/home/CAMPUS/Software/64bits/conda/envs/pyem/lib/python3.8/site-packages/pandas/core/frame.py", line 3189, in _setitem_array
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
Required fields could not be mapped. Are you using the right input file(s)?

Has anyone else got this error and solved it?

Cheers,
Arnaud

@abasle
Hi Arnaud,
I got the same error as you did. Have you solved the problem? I also tried to convert the .cs from extracted particle files.

Liz

Hi Liz,

Not quite yet, but we managed somehow by trial and error. My first piece of advice would be to try on a small subset of particles to get things sorted (there's a quick way to make one, sketched below). We also found it useful to add the passthrough_particles.cs, as it contains some parameters that are missing from extracted_particles.cs. I am still trying to document the proper way of transferring between cryoSPARC and RELION. Currently I'm looking into transferring the particle coordinates and re-extracting in RELION.
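
If it helps, a quick way to make a test subset is to slice the structured array directly with NumPy (an untested sketch; filenames are placeholders):

import numpy as np

cs = np.load("extracted_particles.cs")            # the exported/extracted particle table
with open("extracted_particles_subset.cs", "wb") as f:
    np.save(f, cs[:1000])                         # keep only the first 1000 particles for testing

Then run csparc2star.py on the subset first, before committing to the full stack.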
Cheers,
Arnaud

Hi guys,
I have an issue with importing particles from cryoSPARC back into RELION. The particles were first extracted in RELION and imported into cryoSPARC. After 2D classification and sorting in cryoSPARC, I now want to import the selected particles back into RELION, so I ran the command:
csparc2star.py --copy-micrograph-coordinates Extract/job014/particles.star P132/J4/cryosparc_P132_J4_020_particles.cs particles_cryosparc_with_coords.star

And I got this error:
Traceback (most recent call last):
  File "/mpcdf/soft/local/pyem_git/pyem/csparc2star.py", line 110, in <module>
    sys.exit(main(parser.parse_args()))
  File "/mpcdf/soft/local/pyem_git/pyem/csparc2star.py", line 60, in main
    coord_star = pd.concat(
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 271, in concat
    op = _Concatenator(
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 326, in __init__
    objs = list(objs)
  File "/mpcdf/soft/local/pyem_git/pyem/csparc2star.py", line 61, in <genexpr>
    (star.parse_star(inp, keep_index=False, augment=True) for inp in
  File "/home/system/soft.2018-08-22/local/pyem_git/pyem/pyem/star.py", line 293, in parse_star
    df = pd.read_csv(starfile, skiprows=ln, delimiter='\s+', header=None, nrows=nrows)
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/io/parsers.py", line 454, in _read
    data = parser.read(nrows)
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/io/parsers.py", line 1133, in read
    ret = self._engine.read(nrows)
  File "/mpcdf/soft/local/anaconda3/envs/pyem/lib/python3.8/site-packages/pandas/io/parsers.py", line 2037, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 859, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 874, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 928, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 915, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2070, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 51, saw 27

Could anyone share how to troubleshoot? Thanks a lot!

Best
Wen

@Cito can you confirm you have the latest version of pyem with all the dependencies installed? If you would like to send me your input files via google drive or box etc. I can take a look.

@DanielAsarnow Hi, I wanted to ask if you are considering building a web-based application where one would simply upload the cryoSPARC files, select some options, and get the .star file as output. That way people wouldn't need to install or update pyem, and the output would always match the current RELION version. Thanks for your work!
