Incorrect particle positions after importing results from a broken cryoSPARC instance

Hi all,

We are looking to reprocess a portion of a dataset, which essentially means repeating a few small jobs. Unfortunately, one of our storage servers has failed, and it hosted our initial cryoSPARC instance. The database has been recovered and restored to a different computer, so we can now see jobs, but the original project folder no longer exists, so we can’t directly access the data. We have a backup of most of the data, but recovery is very slow, so we have requested the motion-corrected micrographs and a few refinement jobs from which we can resume processing, delivered to a different network share.

My goal was to import the motion-corrected micrographs, run CTF estimation, link the particles to these micrographs, then extract, reconstruct and complete the required workflow. I looked into the “Import Result Group” job for the particles, but this failed as we don’t have the directory of extracted particles.

I’m now using csparc2star.py to export the positions to a .star file, then importing the particles straight back into cryoSPARC. This looked to be working, but on inspection the particle positions are incorrect. Flipping micrographs in Y, X and XY didn’t correct the positions. I think the issue may be a scaling problem: the original movies are super-resolution, the motion-corrected micrographs are Fourier-cropped by 0.5, and the extracted particles are binned slightly from 448 to 416 pixels. Using the Inspect Picks job, I can see that some particles lie on the micrograph edge, while some micrographs show fewer (or no) visible particles, which I assume means the picks are out of bounds.

If anyone has any suggestions on working around this issue, it would be greatly appreciated!

Mat

Because the recovered database backup is no longer consistent with the availability and point-in-time state of the project directories, I would discourage continuing the projects in the recovered database. The recovered database may still be useful for reference. For actual processing, you may want to

  1. restrict use of the recovered CryoSPARC instance “for reference only”
  2. request a new license ID
  3. use the new license ID to install a “for processing” CryoSPARC instance

More than one CryoSPARC instance can be installed on the same server provided that:

  • the instances’ port ranges do not overlap,
  • the instances’ respective sets of project directories do not overlap,
  • each instance has a unique database directory,
  • each instance has a unique license ID, and
  • (recommended) each instance runs under a separate, dedicated Linux account.

@wtempel I’m not too concerned about the state of cryoSPARC here; when I say the server went down, it was just the RAID containing our data (for us, /home/net). Our instance could be safely stopped as the system disk was healthy, then imaged onto another SSD, which was mounted into another computer. The only things that have changed are the hostnames, which just required the worker nodes to be reconnected manually. The only remaining issue is that the projects on the failed RAID are inaccessible for processing until the RAID is repaired. We are reprocessing this dataset in a new project with the files saved to a different mountpoint (/home/net3).

We’re happily continuing to run cryoSPARC jobs on new projects, so my only (current) issue is trying to correctly import particle positions.

Hi @mathewmclaren! Is the problem that the file locations have changed from /home/net to /home/net3, so CryoSPARC can’t find the images anymore?

Hi @rwaldo, this is definitely a problem for the Import Result Group job when trying to recover files, and I didn’t pursue it too hard as it would have been difficult (for me) to resolve.

I’ve imported the motion-corrected micrographs from the recovered jobs into a new project to bypass directly importing the result group, then run CTF estimation and used csparc2star to convert my particle positions from my refinement .cs files into a .star file. I’ve then imported particles directly from this output. The particles are linked to the source exposures by trimming the filename prefixes of both sources so they match; from what I can see, the Import Particle Stack job succeeds by ignoring the raw data.

The particle extraction process runs, but no matter what I try, the particle positions seem to be incorrect. I would normally flip micrographs in the Y-axis before extracting, so I tried this first, followed by a flip in X, a flip in both X and Y, and no flip at all. Unfortunately, I noticed that the Inspect Particle Picks job doesn’t take the micrograph flip into account, so I’m examining the micrograph displayed at the start of the extraction log and reconstructing to see whether the positions look correct; they don’t.

This leads me to believe that the problem is due to scaling. The data were collected in super-resolution and Fourier-cropped (1/2) during motion correction, so I don’t know whether the .cs files record coordinates relative to the fully unbinned, pre-motion-correction frame or the cropped one. The original cryoSPARC particles were also slightly Fourier-cropped by a ratio of ~1.08 to speed up processing, so I’m not sure whether csparc2star is scaling the micrograph coordinates by a factor of 0.5 or 2, 1.08 or 1/1.08, or a combination of these.
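One way to narrow this down is to enumerate the candidate scale factors and apply each to a few known picks, then check which rescaled set lands on visible particles in Inspect Picks. This is a minimal sketch with hypothetical pick coordinates, assuming the .star coordinates are plain pixel values in some (unknown) frame:

```python
# Candidate scale factors implied by the processing history (assumptions
# from this thread): super-res movies Fourier-cropped by 0.5 during motion
# correction, and particles Fourier-cropped from a 448 to a 416 px box.
super_res_crop = 0.5
box_crop = 416 / 448  # ~0.93; its inverse is the ~1.08 ratio mentioned above

def rescale_coords(xs, ys, factor):
    """Scale picked coordinates by a single linear factor."""
    return [x * factor for x in xs], [y * factor for y in ys]

candidates = {
    "none": 1.0,
    "super-res -> cropped": super_res_crop,
    "cropped -> super-res": 1 / super_res_crop,
    "box crop": box_crop,
    "inverse box crop": 448 / 416,
}

# Hypothetical picks; substitute a few real coordinates from the .star file
xs, ys = [1000.0, 2048.0], [512.0, 4096.0]
for label, f in candidates.items():
    sx, sy = rescale_coords(xs, ys, f)
    print(f"{label:22s} factor={f:.3f} first pick -> ({sx[0]:.0f}, {sy[0]:.0f})")
```

Which factor (or product of factors) is correct depends on whether the coordinates are stored relative to the super-resolution frame or the cropped micrograph, so treating this as a small search rather than guessing once may save time.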

Thanks for asking questions, I appreciate this is quite an awkward situation and a little complicated to follow! I realised I can try to import into Relion so I’m going to see if there’s any chance of success that way as well.

If there’s a way to import particle positions directly from the existing cryoSPARC job, while ignoring the missing micrographs and extracted particles, then I imagine I can link them to the new micrographs with the Reassign Particles job.

Gotcha. You could try using CryoSPARC tools to:

  1. Load in the original particles dataset and the new patch CTF job
  2. Replace the original particles’ location fields
  3. Save as external result and proceed with extraction, etc.

Something like this might work:

from cryosparc.tools import CryoSPARC
from cryosparc.dataset import Dataset
import json
from pathlib import Path

# Connect using credentials stored outside the script
with open(Path("~/instance-info.json").expanduser(), "r") as f:
    instance_info = json.load(f)

cs = CryoSPARC(**instance_info)
assert cs.test_connection()

def strip_filename(filename: str) -> str:
    # Drop the UID prefix that CryoSPARC prepends to processed filenames
    # (assumed here to be 21 characters plus an underscore) so that old
    # and new micrograph names can be matched.
    return str(Path(filename).name)[22:]

# Map micrograph fields from the new Patch CTF job onto the corresponding
# particle location fields that extraction expects
mics_to_particles = {
    "micrograph_blob/path": "location/micrograph_path",
    "uid": "location/micrograph_uid",
    "ctf/exp_group_id": "location/exp_group_id",
    "micrograph_blob/shape": "location/micrograph_shape",
    "micrograph_blob/psize_A": "location/micrograph_psize_A"
}

# Load only the location/* fields from the original picks
old_dataset = Dataset.load("/path/to/your/old/picked_particles.cs", prefixes=["location"])
old_dataset.add_fields([("filename", "O")])
old_dataset["filename"] = [strip_filename(f) for f in old_dataset["location/micrograph_path"]]

project_uid = "P345"
project = cs.find_project(project_uid)
patch_ctf_juid = "J120"
patch_ctf_job = project.find_job(patch_ctf_juid)
mics_dataset = patch_ctf_job.load_output("exposures")
mics_dataset.filter_fields(list(mics_to_particles.keys()))
mics_dataset.add_fields([("filename", "O")])
mics_dataset["filename"] = [strip_filename(f) for f in mics_dataset["micrograph_blob/path"]]

# For each micrograph, overwrite the old particles' location fields with the
# values from the matching new micrograph (assumes every old micrograph has
# exactly one match in the new job)
old_dataset_by_mic = old_dataset.split_by("filename")
for filename, dataset in old_dataset_by_mic.items():
    mic_row = mics_dataset.query({"filename": filename})
    for m_fieldname, p_fieldname in mics_to_particles.items():
        dataset[p_fieldname] = mic_row[m_fieldname]

new_dataset = Dataset.append_many(*old_dataset_by_mic.values())
new_dataset.drop_fields(["filename"])
project.save_external_result(
    # requires CryoSPARC v5; on earlier versions this line will fail,
    # so enter your workspace UID manually, e.g. workspace_uid="W1"
    workspace_uid=patch_ctf_job.model.workspace_uids[0],
    dataset=new_dataset,
    type="particle",
    title="Re-assigned particles"
)