Small beta-barrel protein alignment

Hmm… you probably could do this with an intermediate step of aligning your two “half maps” to ensure they’re in register (this alignment is handled in normal refinements by the GSFSC split resolution parameter). I’d split the particles into two half sets using the Particle Sets Tool, perform the two Ab initio reconstructions, align one volume and particle stack to the other, then set each particle’s alignments3D/split field to 0 or 1, respectively.

I wonder, though, if you’ve tried changing the Initial lowpass resolution parameter in the Homogeneous Refinement job to a higher resolution (lower numerical value). If you’re using the default lowpass of 20 Å, it’s probably just destroying all of your beta strands so there’s nothing to align to. You could try a couple of values (5, 7, 10 Å?) and see if that helps?
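If you end up screening several values, here is a rough cryosparc-tools sketch of how you might queue the sweep. Note this is only a sketch: the homo_refine_new job type string, the refine_res_init parameter key, and all UIDs below are my assumptions here, so check the job builder in your instance for the exact names:

from cryosparc.tools import CryoSPARC

# connect to the instance (fill in your own credentials)
cs = CryoSPARC(license="redacted", host="localhost", base_port=39000,
               email="redacted", password="redacted")
project = cs.find_project("P337")  # hypothetical project UID

# queue one Homogeneous Refinement per initial lowpass resolution
for lowpass in (5, 7, 10):
    job = project.create_job(
        "W14",              # hypothetical workspace UID
        "homo_refine_new",  # assumed job type string for Homogeneous Refinement
        connections={
            "particles": ("J119", "particles_class_0"),  # hypothetical ab initio outputs
            "volume": ("J119", "volume_class_0"),
        },
        params={"refine_res_init": lowpass},  # assumed key for "Initial lowpass resolution (A)"
    )
    job.queue()  # queue to the default lane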


Thank you very much @olibclarke @rposert , I will try the procedures you mentioned.

Cheers
Qi

I hope they help! Please let us know how it goes!

Yes agreed, if you haven’t tried altering the initial lowpass in homogeneous or NU-refine, I would definitely try that first! Particularly as you have an ab initio volume with high-resolution features to start from.

@rposert, I have not tried Homogeneous Refinement, but I tried Non-uniform Refinement with a 6 Å initial lowpass and it did not work. Local Refinement worked well with 6 Å and the same particle set, input volume, and mask.

@rposert Could you please explain this a bit more? I thought I understood, but more information would be helpful. Specifically, I can align the volumes/particles using Volume Alignment Tools. Where and how do I set alignments3D/split to 0 or 1?

Yes of course, apologies. You’ll have to use cryosparc-tools to do this. If you haven’t used it before, you can find some guidance on installing and running it here and also here.
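For reference, the instance-info.json used in the script below is just a JSON file holding the keyword arguments for the CryoSPARC() constructor. A minimal example with placeholder values (adjust to your instance):

{
  "license": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "host": "localhost",
  "base_port": 39000,
  "email": "user@example.com",
  "password": "password"
}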

Once you have cryosparc-tools working, you can change the value in the fields relatively easily. Here is an example script that should work (using instance-info.json, described in the second link above):

from cryosparc.tools import CryoSPARC
import json
from pathlib import Path

# connect using the credentials stored in instance-info.json
with open(Path('~/instance-info.json').expanduser(), 'r') as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

project_number = "P337"
workspace_number = "W14"

project = cs.find_project(project_number)

# load the particles from the first Ab initio Reconstruction job
abinit_1_uid = "J119"
abinit_1_job = project.find_job(abinit_1_uid)
abinit_1_particles = abinit_1_job.load_output("particles_class_0")

# load the particles from the Align 3D Maps job that aligned the second
# ab initio map (and its particles) to the first
aligned_abinit_2_uid = "J121"
aligned_abinit_2_job = project.find_job(aligned_abinit_2_uid)
aligned_abinit_2_particles = aligned_abinit_2_job.load_output("particles_aligned_0")

# assign each particle stack to one GSFSC half-set
abinit_1_particles["alignments3D/split"] = 0
aligned_abinit_2_particles["alignments3D/split"] = 1

# concatenate the two stacks into a single dataset
combined_particles = abinit_1_particles.append(aligned_abinit_2_particles)

# save the combined dataset back to the project as an External job output
external_job_uid = project.save_external_result(
    workspace_uid = workspace_number,
    dataset = combined_particles,
    type = "particle",
    name = "particles",
    title = "Combined particles"
)
project.find_external_job(external_job_uid).log(f"Combined particles from {abinit_1_uid} and {aligned_abinit_2_uid} as half-sets.")

In this case,

  • abinit_1_uid should be the UID (like “J119”) of one of your ab initio jobs
  • aligned_abinit_2_uid should be the UID of the Align 3D Maps job used to align the other ab initio map to the first.

This will produce a new job which has the two particle sets, split correctly, as a single “particles” output:

[screenshot: the external job with the combined “particles” output]

You should be able to use these particles in a Local Refinement with one of your maps as the input volume. Be sure that Force re-do GS split is turned off!
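As a quick sanity check (a minimal sketch; run it right after building combined_particles in the script above), you can confirm that each half-set got the expected label and particle count:

import numpy as np

# count how many particles carry each half-set label (expect 0 and 1)
labels, counts = np.unique(combined_particles["alignments3D/split"], return_counts=True)
print(dict(zip(labels, counts)))  # the two counts should match your two input stacks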

I’ll be interested to hear how this technique works out for you!


Also, do you think you could share images of the ab initio results and the homogeneous refinement with an initial lowpass filter of 6 Å? I’m curious if I’ll be able to see anything that might indicate to me why the homogeneous refinement is not able to align the particles.

@rposert Sorry for the late response, and thank you so much for the invaluable, detailed suggestion. I will give cryosparc-tools a try and let you know.


@rposert Sorry for the long delay. I finally managed to install cryosparc-tools and run the script, though it reported some errors.

Connection succeeded to CryoSPARC command_core at http://rohpc02:39002
Connection succeeded to CryoSPARC command_vis at http://rohpc02:39003
Connection succeeded to CryoSPARC command_rtp at http://rohpc02:39005
/usr/local/biotools/python/3.11.5/lib/python3.11/contextlib.py:137: UserWarning: *** CommandClient: (http://rohpc02:39003/external/projects/P17/jobs/J210/outputs/particles/dataset) URL Error [Errno 104] Connection reset by peer, attempt 1 of 3. Retrying in 30 seconds
  return next(self.gen)
/usr/local/biotools/python/3.11.5/lib/python3.11/contextlib.py:137: UserWarning: *** CommandClient: (http://rohpc02:39003/external/projects/P17/jobs/J210/outputs/particles/dataset) URL Error [Errno 104] Connection reset by peer, attempt 2 of 3. Retrying in 30 seconds
  return next(self.gen)
Traceback (most recent call last):
  File "/rodata/cryoem/m060907/GM146-1/CS-gm146-1/combineParticles.py", line 29, in <module>
    external_job_uid = project.save_external_result(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/m060907/.local/lib/python3.11/site-packages/cryosparc/project.py", line 278, in save_external_result
    return self.cs.save_external_result(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/m060907/.local/lib/python3.11/site-packages/cryosparc/tools.py", line 584, in save_external_result
    job.save_output(output, dataset)
  File "/home/m060907/.local/lib/python3.11/site-packages/cryosparc/job.py", line 1520, in save_output
    with make_request(self.cs.vis, url=url, data=dataset.stream()) as res:
  File "/usr/local/biotools/python/3.11.5/lib/python3.11/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/home/m060907/.local/lib/python3.11/site-packages/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc.errors.CommandError: *** (http://rohpc02:39003/external/projects/P17/jobs/J210/outputs/particles/dataset, code 500) URL Error [Errno 104] Connection reset by peer

Could you suggest a solution to this error? Many thanks again!

Hi @huqi!

Is the version of cryosparc-tools the same as the version of CryoSPARC you’re running? E.g., the first two numbers in the top-left of your CryoSPARC home page:

[screenshot: version number in the top-left of the CryoSPARC home page]

should be the same as the result of pip freeze | grep cryosparc

$ pip freeze | grep cryosparc
cryosparc-tools==4.5.0

(note that CryoSPARC is 4.5.3 while tools is 4.5.0; that’s okay, only the first two numbers need to match).

If they don’t match, try running pip install --upgrade cryosparc-tools~={version}, replacing {version} with your CryoSPARC version number with its last number set to zero. For example, in this case:

[screenshot: CryoSPARC home page showing version v4.4.x]

I’d run pip install --upgrade cryosparc-tools~=4.4.0.

Thank you so much @rposert. It looks like the versions match.

[m060907@rohpc02 CS-gm146-1]$ pip freeze | grep cryosparc
cryosparc-tools==4.5.0

[screenshot: CryoSPARC home page showing version v4.5.x]

I suspect it is the save_external_result call that is failing.

external_job_uid = project.save_external_result(
    workspace_uid = workspace_number,
    dataset = combined_particles,
    type = "particle",
    name = "particles",
    title = "Combined particles"
)

I may be wrong, but is there any chance this is related to network or firewall settings, or is it more likely a problem with my Python setup? Any clue is very much appreciated!

Regards,
Qi

@rposert I was able to install cryosparc-tools on another of our workstations and run the script you previously shared (thank you so much for this!). However, I got a different error than the previous one when running on the HPC. Please see the details below. Could you please suggest how I can fix it? Thank you for any comments/suggestions.

[mxxxxx@cxxxxx CS-test]$ python reCenter.py
Connection succeeded to CryoSPARC command_core at http://cxxxxx:61002
Connection succeeded to CryoSPARC command_vis at http://cxxxxx:61003
Connection succeeded to CryoSPARC command_rtp at http://cxxxxx:61005
/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/contextlib.py:113: UserWarning: *** CommandClient: (http://cxxxxx:61003/external/projects/P13/jobs/J160/outputs/recentered_particles/dataset) HTTP Error 422 UNPROCESSABLE ENTITY; please check cryosparcm log command_vis for additional information.
Response from server: b"Invalid dataset; missing the following required fields: {('location/micrograph_psize_A', '<f4')}"
  return next(self.gen)
Traceback (most recent call last):
  File "reCenter.py", line 93, in <module>
    project.save_external_result(
  File "/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/site-packages/cryosparc/project.py", line 278, in save_external_result
    return self.cs.save_external_result(
  File "/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/site-packages/cryosparc/tools.py", line 584, in save_external_result
    job.save_output(output, dataset)
  File "/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/site-packages/cryosparc/job.py", line 1520, in save_output
    with make_request(self.cs.vis, url=url, data=dataset.stream()) as res:
  File "/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/mxxxxx/miniconda3/envs/Geeks/lib/python3.8/site-packages/cryosparc/command.py", line 226, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc.errors.CommandError: *** (http://cxxxxx:61003/external/projects/P13/jobs/J160/outputs/recentered_particles/dataset, code 422) HTTP Error 422 UNPROCESSABLE ENTITY; please check cryosparcm log command_vis for additional information.
Response from server: b"Invalid dataset; missing the following required fields: {('location/micrograph_psize_A', '<f4')}"

log of command_vis:

2024-08-06 13:57:01,551 upload_external_job_output INFO | Received external job output P13.J160.recentered_particles
2024-08-06 13:57:01,681 upload_external_job_output ERROR | Invalid dataset; missing the following required fields: {('location/micrograph_psize_A', '<f4')}
3 error ERROR | Can't connect to ('0.0.0.0', 61003)
2024-08-06 12:20:34 info INFO | Starting gunicorn 20.1.0
2024-08-06 12:20:34 error ERROR | Connection in use: ('0.0.0.0', 61003)
2024-08-06 12:20:34 error ERROR | Retrying in 1 second.
2024-08-06 12:20:35 error ERROR | Connection in use: ('0.0.0.0', 61003)
2024-08-06 12:20:35 error ERROR | Retrying in 1 second.
2024-08-06 12:20:36 error ERROR | Connection in use: ('0.0.0.0', 61003)
2024-08-06 12:20:36 error ERROR | Retrying in 1 second.
2024-08-06 12:20:37 error ERROR | Connection in use: ('0.0.0.0', 61003)
2024-08-06 12:20:37 error ERROR | Retrying in 1 second.
2024-08-06 12:20:38 error ERROR | Connection in use: ('0.0.0.0', 61003)
2024-08-06 12:20:38 error ERROR | Retrying in 1 second.
2024-08-06 12:20:39 error ERROR | Can't connect to ('0.0.0.0', 61003)

Some more updates: the error persists when running different scripts from the cryosparc-tools examples as well as the script above. However, example script #5 (generate high-res 2D classes) worked on both the workstation and the HPC without any errors.

More updates. I was able to figure out that the errors are not related to Python or the network. By modifying the save_external_result portion of the script, it ran to the end without errors. I inserted

slots = ['blob']

to the

project.save_external_result()

to give

external_job_uid = project.save_external_result(
    workspace_uid = workspace_number,
    dataset = combined_particles,
    type = "particle",
    name = "particles",
    slots = ['blob'],
    title = "Combined particles"
)

Though I seriously doubt this is the correct parameter to use. @rposert Could you or anyone else please suggest what changes I should make to generate the correct particle stack? Much appreciated! :blush:

Hi @huqi, sorry about the delay. Adding slots=['blob'] should be fine. Glad you got it working!

Hi @rposert, thank you for your response. I still got errors with the generated particle stack when running Local Refinement. Specifically, how can I keep the alignments3D from the ab initio jobs?

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 73, in cryosparc_master.cryosparc_compute.run.main
  File "/biotools8/biotools/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1213, in check_default_inputs
    assert False, 'Non-optional inputs from the following input groups and their slots are not connected: ' + missing_inputs + '. Please connect all required inputs.'
AssertionError: Non-optional inputs from the following input groups and their slots are not connected: particles.ctf, particles.alignments3D. Please connect all required inputs.

Hi @huqi, apologies for the delay.

This usually happens because Ab Initio Reconstruction does not use all particles when producing a single class (for reasons described here). So you could

  • increase the Num particles to use parameter such that it is larger than your number of particles, or
  • use multiple classes (this might be better if you think there is still junk in your input particle stack)
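Since the error names particles.ctf and particles.alignments3D specifically, one more thing worth trying (a sketch, assuming the combined dataset still carries the ctf and alignments3D fields from the two ab initio outputs) is to expose those slots explicitly when saving the external result, instead of only blob:

external_job_uid = project.save_external_result(
    workspace_uid = workspace_number,
    dataset = combined_particles,
    type = "particle",
    name = "particles",
    slots = ["blob", "ctf", "alignments3D"],  # expose these dataset fields as output slots
    title = "Combined particles"
)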

I hope that helps!

Hi @rposert,

I ran into the same issue when using cryoSPARC tools in a Jupyter notebook. Could you help me diagnose what might be happening?

Initially, I noticed that the version number in the upper-left corner in the cryoSPARC home page showed “null”, and a response by wtempel in this thread suggested that restarting cryoSPARC would help. While this did fix the version number, it didn’t have any effect on the connection reset by peer error.

Based on your comment, I also checked to make sure my versions of cryoSPARC and cryosparc-tools match:

[screenshot: matching CryoSPARC and cryosparc-tools version numbers]

This is a little strange, because the notebook appears to be connecting successfully:

cs = CryoSPARC(
    license="redacted",
    host="localhost",
    base_port=39000,
    email="redacted",
    password="redacted"
)

assert cs.test_connection()

Connection succeeded to CryoSPARC command_core at http://localhost:39002
Connection succeeded to CryoSPARC command_vis at http://localhost:39003
Connection succeeded to CryoSPARC command_rtp at http://localhost:39005

Moreover, I can successfully import the results of a job into the notebook. The error only occurs when I try to save a results group.

for cluster_id, inds in clusters.items():
    cluster_cs = J380_particles.take(inds)
    cluster_cs_name = f'cluster_{cluster_id}/{num_clusters}'
    print(f'{cluster_cs_name}: len({cluster_cs.rows()}) particles')
    
    print(cluster_cs_name)
    cs.save_external_result('P10',
                            'W17',
                            cluster_cs,
                            type="particle",
                            name=cluster_cs_name,
                            slots=["blob"],
                            passthrough=('J380', "particles"),
                            title=cluster_cs_name    
    )

cluster_0/10: len(Spool object with 149935 items.) particles
cluster_0/10

/programs/x86_64-linux/anaconda/2022.10/envs/cs-tools/lib/python3.8/contextlib.py:113: UserWarning: *** CommandClient: (http://localhost:39003/external/projects/P10/jobs/J700/outputs/cluster_0/10/dataset) URL Error [Errno 104] Connection reset by peer, attempt 1 of 3. Retrying in 30 seconds
  return next(self.gen)
/programs/x86_64-linux/anaconda/2022.10/envs/cs-tools/lib/python3.8/contextlib.py:113: UserWarning: *** CommandClient: (http://localhost:39003/external/projects/P10/jobs/J700/outputs/cluster_0/10/dataset) URL Error [Errno 104] Connection reset by peer, attempt 2 of 3. Retrying in 30 seconds
  return next(self.gen)

Inside the project, a results group is generated. But although the job appears to have stopped with an error, there isn’t an obvious error message in the log:
[screenshot: the external job shown with an error status]

[CPU:  268.3 MB  Avail: 184.32 GB]
Adding input slot particles (type particle)...

[CPU:  268.3 MB  Avail: 184.31 GB]
Created input group: particles (type particle)

[CPU:  268.3 MB  Avail: 184.31 GB]
Adding output group cluster_0/10 (type particle)...

[CPU:  268.3 MB  Avail: 184.31 GB]
Created output group cluster_0/10 (type particle, passthrough particles)

[CPU:  268.3 MB  Avail: 184.32 GB]
Added output slot for group cluster_0/10: blob (type particle.blob)

[CPU:  268.3 MB  Avail: 184.32 GB]
Passed through from input particles to output cluster_0/10

[CPU:  268.3 MB  Avail: 184.32 GB]
Passed through J380.particles to J700.particles

License is valid.

Any insights would be greatly appreciated!

Best,
cbeck

Actually, it looks like the save_external_result method works if I execute it outside of the for loop:

cluster_cs = J380_particles.take(clusters[0])
cs.save_external_result('P10',
                        'W17',
                        cluster_cs,
                        type="particle",
                        name="test",
                        slots=["blob"],
                        passthrough=('J380', "particles"),
                        title="test" 
                       )

'J701'

Is there something wrong with my syntax? I was following the example on your GitHub page, where you used a for loop to split a particle stack by its symmetry group and export each group of particles back into the project: split_by_sym_exp.ipynb · GitHub

for symmetry_index in range(num_rotations + 1):
    sym_subset = expanded_particles.query({
        'sym_expand/idx': symmetry_index
    })
    
    cs.save_external_result(
        sym_exp_project_number,
        sym_exp_workspace_number,
        sym_subset,
        type = 'particle',
        slots = ['blob'],
        passthrough = (sym_exp_job_number, 'particles'),
        title = f"Symmetry Index {symmetry_index}"
    )

Do you have an idea of what might be happening? While it isn’t too difficult to save each external result manually, it would be useful to execute it inside a for loop to help automate my workflow.

Never mind, it was a syntax error: the name for the result can’t contain slashes, because the name becomes part of a URL and the slashes make that URL invalid.
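For anyone else who hits this, swapping the slash for a URL-safe separator fixes the loop, e.g.:

# a name like "cluster_0/10" becomes part of the request URL and breaks it;
# use an underscore-only name instead
cluster_cs_name = f'cluster_{cluster_id}_of_{num_clusters}'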

Cheers,
cbeck
