Error in importing particle coordinates from relion to cryosparc

I have seen this topic discussed on this forum, and I really appreciate the solution orangeboomerang posted previously. I tried to follow it but could not get it to work.
Here is my situation: I imported the movies into cryoSPARC, ran patch motion correction and patch CTF estimation, and now want to re-extract the same set of particles that I picked in RELION a while ago so I can reprocess them.

In Import Particles, I connected the exposures from patch CTF, specified the modified .star file (which contains _rlnMicrographName #1, _rlnCoordinateX #2, _rlnCoordinateY #3), chose to ignore raw data, and set the suffix to cut. I believe I cut the right number of characters from the end of each image filename, but I still get the following error.
I am not sure whether it is because the source micrograph filename starts with this extra ‘6521773161666237731_’. Can anyone give me some idea of how to solve the problem?

[CPU: 192.9 MB]    Example source micrograph filename:
6521773161666237731_20191025_Jing_234_00000_Oct25_22.02.00

[CPU: 193.3 MB]    Example query micrograph filename:
20191025_Jing_234_00009_Oct25_22.04.51
[CPU: 193.3 MB]  Traceback (most recent call last):
 File "cryosparc2_master/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
 File "cryosparc2_compute/jobs/imports/run.py", line 209, in run_import_particles
   assert qname in inv_index_source, "Could not find match for %s" % qname
AssertionError: Could not find match for 20191025_Jing_234_00009_Oct25_22.04.51

Hi @jliu,

This is caused by a change we introduced in v2.15.0: we added a unique identifier to the beginning of each micrograph name so that micrographs with similar names from different folders wouldn’t get overwritten. As you pointed out, this caused a regression in the Import Particles job, so it is technically a bug. I’ll add a “prefix cut” parameter that will allow you to remove that UID from the micrograph path, and you should be able to test it via a patch. I’ll keep this thread updated.

Hi @jliu,

I’ve created some patch files for the Import Particles job for you to test. The changes let you specify the number of characters to cut from the beginning of the base micrograph filename in the input dataset, which will allow you to remove the extra 6521773161666237731_ (e.g. enter “20” for the “Length of input micrograph path prefix to cut” parameter).
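
For reference, the value "20" can be checked with a few lines of Python. This is only an illustrative sketch of what cutting a fixed-length prefix does to the example source name from the log above; it is not the code used inside the Import Particles job, and the variable names are made up for the example.

```python
# Source name copied from the job log earlier in this thread.
source_name = "6521773161666237731_20191025_Jing_234_00000_Oct25_22.02.00"
uid_prefix = "6521773161666237731_"

# The number to enter as the prefix cut: the digits plus the underscore.
prefix_cut = len(uid_prefix)

# Cutting that many characters from the front leaves the original name.
trimmed = source_name[prefix_cut:]

print(prefix_cut)  # 20
print(trimmed)     # 20191025_Jing_234_00000_Oct25_22.02.00
```

The trimmed name now has the same form as the query name taken from the .star file (20191025_Jing_234_00009_Oct25_22.04.51).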

To implement this patch, download the run.py and build.py files from the following links:
(either download the files directly by clicking on them or use wget)
https://structura-assets.s3.amazonaws.com/import_particles_v2.15_locations_bugfix/build.py
https://structura-assets.s3.amazonaws.com/import_particles_v2.15_locations_bugfix/run.py

Once downloaded, on your master node, replace the following two files with these files.
cryosparc2_master/cryosparc2_compute/jobs/imports/build.py
cryosparc2_master/cryosparc2_compute/jobs/imports/run.py

Once replaced, restart cryoSPARC:
cryosparcm restart

After that, you should be able to create a brand new Import Particles job to see the new parameters, and you’ll be able to test the changes I just made. Please let me know how it goes!

Thanks, Stephan. The patch you wrote worked in removing the extra 6521773161666237731_ from the beginning of the micrograph path. However, the job still cannot find a match for the micrograph name, even though I am pretty sure the micrograph is there. Here is the error message I got:

[CPU: 192.7 MB]    Example source micrograph filename:
 20191025_Jing_234_00000_Oct25_22.02.00

[CPU: 193.1 MB]    Example query micrograph filename:
 20191025_Jing_234_00009_Oct25_22.04.51
[CPU: 193.1 MB]  Traceback (most recent call last):
  File "cryosparc2_master/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/imports/run.py", line 204, in run_import_particles
    assert qname in inv_index_source, "Could not find match for %s" % qname
AssertionError: Could not find match for 20191025_Jing_234_00009_Oct25_22.04.51

Thank you and looking forward to a reply.

Hi @jliu,

Could you send me your .star file (maybe just the first 10 entries)? I’ll need to try to reproduce this issue on my system. Please send it to sarulthasan@structura.bio if you’re able to!

Hi, jliu
I have the same issue as above:
[CPU: 671.0 MB] Traceback (most recent call last):
  File "cryosparc2_master/cryosparc2_compute/run.py", line 85, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/imports/run.py", line 204, in run_import_particles
    assert qname in inv_index_source, "Could not find match for %s" % qname
AssertionError: Could not find match for XXX_20200723_131503_fractions

I have sent you two entries, thanks.

I followed the same fixes up to this point and am running into the same errors. Are there any updates, and would it help to send more example files?

Hi @drichman,

Yes please do. I will be revisiting this issue for the next release, and having sample data will be very helpful!


Hi @drichman,

I’m investigating the files you sent me now, but I tried a few examples myself and didn’t come across any issues:

For example, my micrographs have the following path (this is a relative path, but that doesn’t matter- only the actual filename matters):
J173/imported/10804479902352024517_HYZ_20200724_030153_fractions_patch_aligned_ctf_diag_2D.mrc


And in my .star file, I have the following as the rlnImageName: 000031@Extract/job015/movies1/HYZ_20200724_073602_fractions.mrcs

and the following as the rlnMicrographName : MotionCorr/job004/movies1/HYZ_20200724_073602_fractions.mrc

Therefore, in my Import Particles job, I connected the micrographs as inputs,
then specified the following for the parameters:

I cut the first 21 characters from the input micrograph path (which is the long unique identifier we recently added to all micrographs), then cut the last 30 characters, which I had to do to match the rlnMicrographName (notice this includes the extension). I then cut the extension from the rlnMicrographName in order to match the input micrograph name.

And the job worked.

I hope this helps.
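
The cuts described above can be reproduced outside cryoSPARC to check the numbers before running the job. The snippet below is a minimal sketch, not the Import Particles implementation: `normalize` is a hypothetical helper that keeps only the filename and then drops a given number of characters from each end, using the two paths quoted in this post.

```python
import os

def normalize(path, prefix_cut=0, suffix_cut=0):
    """Keep only the filename, then drop characters from both ends."""
    name = os.path.basename(path)
    end = len(name) - suffix_cut
    return name[prefix_cut:end]

# Input micrograph path (source) and rlnMicrographName (query) from this post.
source = "J173/imported/10804479902352024517_HYZ_20200724_030153_fractions_patch_aligned_ctf_diag_2D.mrc"
query = "MotionCorr/job004/movies1/HYZ_20200724_073602_fractions.mrc"

# 21 = 20-digit UID plus underscore; 30 = "_patch_aligned_ctf_diag_2D.mrc".
source_key = normalize(source, prefix_cut=21, suffix_cut=30)
# 4 = ".mrc", so the query loses only its extension.
query_key = normalize(query, suffix_cut=4)

print(source_key)  # HYZ_20200724_030153_fractions
print(query_key)   # HYZ_20200724_073602_fractions
```

The two keys belong to different micrographs, so they are not equal, but they now have the same form, which is what the matching needs.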


Hi stephan,
I followed the solution and ran into a new situation: the micrograph prefixes have different lengths. Some are 20 characters, some are 21.
17414511017527774334_May08_07.13.35.bin_patch_aligned_ctf_spline.npy
1750936384069257236_May08_03.05.02.bin_patch_aligned_ctf_diag_2D.mrc
How should I deal with this? Thank you and looking forward to a reply.
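
To illustrate why a fixed character count cannot work here, the sketch below strips the prefix by pattern (leading digits followed by one underscore) instead of by length. It is only a way to inspect or compare names offline, assuming the prefix is always of that form; it is not a cryoSPARC parameter or the fix used in the software, and `strip_uid` is a made-up helper name.

```python
import re

# Leading run of digits followed by a single underscore.
UID_PREFIX = re.compile(r"^\d+_")

def strip_uid(name):
    """Remove one leading 'digits_' prefix, whatever its length."""
    return UID_PREFIX.sub("", name, count=1)

# The two filenames quoted above, with 21- and 20-character prefixes.
names = [
    "17414511017527774334_May08_07.13.35.bin_patch_aligned_ctf_spline.npy",
    "1750936384069257236_May08_03.05.02.bin_patch_aligned_ctf_diag_2D.mrc",
]

stripped = [strip_uid(n) for n in names]
for s in stripped:
    print(s)
# May08_07.13.35.bin_patch_aligned_ctf_spline.npy
# May08_03.05.02.bin_patch_aligned_ctf_diag_2D.mrc
```

One caveat: a name that has no UID but itself begins with digits and an underscore (for example a date such as 20191025_) would lose that part too, so this should only be applied to names known to carry the prefix.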


Hi @dzy,

I noticed this issue when I tried importing more than two files. My mistake for not trying it on a bigger dataset! A fix is coming in the next release of cryoSPARC, which will be out soon.


Hi @dzy, @drichman, @ajian, @jliu

Thank you all for your help in getting to the bottom of this bug. This is now fixed in cryoSPARC v3.0, which you can update to now!


Hi @stephan,

I have a similar (but slightly different) problem connecting my particles to micrographs during import. I’m using v3.0.1 and getting the same error as above.

assert qname in inv_index_source, "Could not find match for %s" % qname

AssertionError: Could not find match for IBS_Movie_01736_

I matched the source name and the query name by cutting characters as you kindly described above, but I still get the same error. One thing I have noticed is that the example source name and the example query name shown in the log are different.

[CPU: 562.2 MB] Compiling particle location information…

[CPU: 562.2 MB] Attempting to find corresponding filenames in rlnMicrographName and connected input micrographs…

[CPU: 570.2 MB] Example source micrograph filename:

[CPU: 570.2 MB] IBS_Movie_02830_

[CPU: 572.0 MB] Example query micrograph filename:
[CPU: 572.0 MB] IBS_Movie_01736_

I thought this might be causing the problem, but when I checked my star file and the micrograph folder, I could see that they contain exactly the same list of micrographs.

Do you have any idea on how to solve these problems?
Best,
JS
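
The example source and query names printed in the log appear to be one sample from each list, so they need not be identical (the logs earlier in this thread show different names as well). A more direct check is to apply the same cuts to every name on both sides and list the query names with no counterpart. The sketch below does this under stated assumptions: it uses a very small STAR reader that only handles plain whitespace-separated loops, the helper names are hypothetical, and the sample .star text and source paths are made up for illustration rather than taken from a real dataset.

```python
import os

def star_micrograph_names(star_text):
    """Collect the _rlnMicrographName column from simple STAR loops."""
    column, n_labels, names = None, 0, []
    for line in star_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(("data_", "loop_")):
            column, n_labels = None, 0   # a new block starts: reset labels
            continue
        if line.startswith("_"):
            if line.split()[0] == "_rlnMicrographName":
                column = n_labels        # zero-based position of the column
            n_labels += 1
        elif column is not None:
            fields = line.split()
            if len(fields) > column:
                names.append(fields[column])
    return names

def key(path, prefix_cut=0, suffix_cut=0):
    """Filename with characters removed from the front and the back."""
    name = os.path.basename(path)
    return name[prefix_cut:len(name) - suffix_cut]

def unmatched(query_paths, source_paths, q_cuts=(0, 0), s_cuts=(0, 0)):
    """Query keys that have no equal source key after cutting."""
    source_keys = {key(p, *s_cuts) for p in source_paths}
    return sorted({key(p, *q_cuts) for p in query_paths} - source_keys)

# Made-up sample data, for illustration only.
star_text = """
data_

loop_
_rlnMicrographName #1
_rlnCoordinateX #2
_rlnCoordinateY #3
Micrographs/IBS_Movie_01736.mrc 1024.0 2048.0
Micrographs/IBS_Movie_02830.mrc 512.0 256.0
Micrographs/IBS_Movie_09999.mrc 100.0 200.0
"""
sources = [
    "J1/imported/12345678901234567890_IBS_Movie_01736.mrc",
    "J1/imported/12345678901234567890_IBS_Movie_02830.mrc",
]

missing = unmatched(star_micrograph_names(star_text), sources,
                    q_cuts=(0, 4), s_cuts=(21, 4))
print(missing)  # ['IBS_Movie_09999']
```

If a list like this comes back empty for the real data with the same cut values entered in the job, the names themselves are probably not the cause and the problem lies elsewhere.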