EMPIAR deposition

Does anyone have a good workflow for depositing in the EMPIAR database? Which metadata file would you upload along with a particle stack? Is it best to convert the .cs file to .star? Or can you export a .json file?

To give some more background, my colleague is trying to prepare an EMPIAR deposition. He wants to deposit the raw movies along with the particle stack from the final 3D refinement job. He tried to “Export” the particles from the “Output” tab in the 3D homogeneous job, but then he cannot import the particles again. The error is that CryoSPARC cannot find the motion-corrected micrographs. His workaround for now is to use pyem to convert the particles.cs file to a star file.

Should it not be possible to import an exported particle stack first and then connect the particles to exposures later?

@kstachowski (thanks again!) provided a guide for me to disconnect stacks when I was trying to share some data with Structura. The relevant parts for this question might be:

  • Create a “Particle sets tool” job
  • Connect the particle dataset to the particles A input
  • Access the low-level inputs by clicking on the drop-down underneath the particles A input, and remove the location field by clicking the “x” in the top right corner of the location slot (the location slot should then read empty). If there is no connected location slot, you can skip this step.
  • Set the Split batch size parameter to the total number of particles in the dataset
  • Run the job
  • Upon completion, navigate to the outputs tab and click on “Export” under the “Split 0”

Then tar up the resulting export directory…?

I have deposited to EMPIAR before, but never from a CryoSPARC project. From what I remember, the particle coordinates needed to be provided as a star file, so you might have to convert them either way.

I think the bare minimum for a deposition is movies, gain reference file (if not already applied to the movie frames), and particle coordinates from the final reconstructions. Providing particle stacks is nice and helpful to people who want to quickly use the particle images, but not strictly necessary since they can be regenerated from movies and coordinates. So, if this simplifies your export a lot, I would say it’s fair to only provide coordinates.

Please post an update here if you come up with a protocol that is easy to follow to export files for deposition. :pray:

Most EMPIAR depositions do not contain final particle locations. Some do, but most do not: just raw movies (and the gain reference if required, though even that isn’t guaranteed; I forget which dataset it was that is missing the gain but needs it…) and some basic info.

I know most entries only contain movies. Nothing is mandatory, other than certain metadata, and it is a good thing in a way because it lowers the difficulty to deposit.

I simply listed what I consider the bare minimum to ensure some degree of replicability. Not sharing the particle coordinates prevents others from recomputing the same reconstruction, which can be necessary depending on why people use the deposited data.

Apologies, I misunderstood what you were implying, i.e. that you thought the bare minimum permitted was XYZ, rather than in your opinion the minimum for reproducibility should be XYZ. :slight_smile:

No worries, I agree it could have been understood both ways, which is why I clarified in the next message.

Did you manage to get a coordinate format accepted by EMPIAR by following the procedure you describe? Or was it only to re-import in CryoSPARC?

While I’ve got a few datasets I’m in the process of uploading, I’ve not uploaded particle stacks to EMPIAR that way. I’ve had a lot of “fun” recently with lab members wanting to export particle stacks and struggling with exporting them consistently (including myself) so thought at least the working stack-output method from Kye would be a starting point. :slight_smile:

One dataset I want to upload included particle positions from picking, plus final picks and angles in both RELION and CryoSPARC, but I was having some other issues getting the dataset uploaded…

Hi @daniel.s.d.larsson @Guillaume @rbs_sci,

I messed around with this for a bit and came up with a workflow that appears to work in my testing.

  1. Use Restack Particles to create a clean, compact version of the particles.
  2. Make a folder to place the relevant particles and their metadata into for deposition.
  3. Copy the following files from the Restack Particles job directory to the deposition folder that was just created:
    • restack directory (this includes all the batches of particles in .mrc format)
    • J1234_particles.csg
    • J1234_passthrough_particles.cs
    • restacked_particles.cs
  4. Tar this folder and include with deposition.
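
Steps 2 to 4 can also be scripted. The following is a minimal Python sketch, not an official CryoSPARC tool: the function name `build_deposition_tarball`, the `job_uid` parameter, and the name of the output tarball are made up for illustration, and the file names are the ones listed in step 3 (with J1234 standing in for the real job number).

```python
import shutil
import tarfile
from pathlib import Path


def build_deposition_tarball(job_dir, deposit_dir, job_uid="J1234"):
    """Copy the restacked particles and their metadata, then tar the folder.

    job_dir     -- the Restack Particles job directory
    deposit_dir -- new folder that will hold the files for deposition
    job_uid     -- job number used as the file-name prefix (placeholder)
    """
    job_dir, deposit_dir = Path(job_dir), Path(deposit_dir)
    deposit_dir.mkdir(parents=True, exist_ok=True)

    # Directory holding the restacked .mrc batches.
    shutil.copytree(job_dir / "restack", deposit_dir / "restack",
                    dirs_exist_ok=True)

    # Metadata files listed in step 3.
    for name in (
        f"{job_uid}_particles.csg",
        f"{job_uid}_passthrough_particles.cs",
        "restacked_particles.cs",
    ):
        shutil.copy2(job_dir / name, deposit_dir / name)

    # Uncompressed tar placed next to the deposition folder.
    tar_path = deposit_dir.parent / (deposit_dir.name + ".tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(deposit_dir, arcname=deposit_dir.name)
    return tar_path
```

The archive is written without compression here because particle images usually compress poorly; switching the mode to "w:gz" would gzip it instead.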

For a user to download and import these, they would need to:

  1. Download this tarball and untar into their respective project’s import directory.
  2. Select the job type Import Result Group and select the .csg file located in the untarred folder.
  3. Additionally, if re-extraction of the particles is desired, connecting the pre-processed mics and this imported result group (particles) to a Reassign Particles job will associate the particles to the pre-processed mics.
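
For step 1 on the download side, a small Python sketch of the unpacking could look like this. The function name `unpack_into_import_dir` is hypothetical, and the only assumption about the archive is that it contains a .csg file somewhere inside; the returned path is what you would then select in the Import Result Group job.

```python
import tarfile
from pathlib import Path


def unpack_into_import_dir(tar_path, import_dir):
    """Extract tar_path under import_dir and return the .csg files found."""
    import_dir = Path(import_dir)
    import_dir.mkdir(parents=True, exist_ok=True)
    import_dir = import_dir.resolve()

    with tarfile.open(tar_path) as tar:
        # Refuse archive members that would be written outside import_dir.
        for member in tar.getmembers():
            target = (import_dir / member.name).resolve()
            if target != import_dir and import_dir not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(import_dir)

    # The .csg file is the one to point Import Result Group at.
    return sorted(import_dir.rglob("*.csg"))
```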

All subsequent processing can be performed, including re-extraction of the particles from the pre-processed mics and/or direct refinement of the imported particles. This workflow retains alignments3D, ctf, and location info. The location info includes many fields, but the most important are the particle’s location on the mic (location/center_(X/Y)_frac) and the mic the particle originated from (location/micrograph_path).
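
Since the particle location is stored as a fraction of the micrograph size, anyone who needs pixel coordinates (for a star file, say) has to scale it by the micrograph dimensions. Here is a minimal sketch of that arithmetic. It assumes the micrograph shape is available as (rows, columns), i.e. (height, width), which is how I understand the location/micrograph_shape field to be laid out; please check against your own file. The function name `frac_to_pixels` and the example dimensions are made up.

```python
def frac_to_pixels(center_x_frac, center_y_frac, micrograph_shape):
    """Convert fractional particle centers to pixel coordinates.

    micrograph_shape is assumed to be (rows, columns) = (height, width).
    """
    n_rows, n_cols = micrograph_shape
    x_pixels = center_x_frac * n_cols  # x runs along the width
    y_pixels = center_y_frac * n_rows  # y runs along the height
    return x_pixels, y_pixels


# A particle at the horizontal middle, a quarter of the way down,
# on a hypothetical 5760 x 4092 (width x height) micrograph.
print(frac_to_pixels(0.5, 0.25, (4092, 5760)))  # (2880.0, 1023.0)
```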

I have never deposited particles, so maybe the compression isn’t needed, but having everything in a single folder is easier and cleaner when copying it over to your project’s import folder.

Let me know if you run into any issues!

All the best,
Kye

PS: If you convert your particle.cs file to a .star file, the CTF info will be lost if you performed any sort of Global/Local CTF refinements.

PPS: I have not checked with EMPIAR requirements about file formats etc, but this will get you to a point where there is a particle stack that can be deposited.

For posterity, this workflow is a way for users to share a particle stack when the micrographs are not needed/available. This would likely not be useful in the case of deposition as the location and CTF info are nice to have.

Thanks @kstachowski I will forward the protocol to my colleague.

What if you have a consensus reconstruction that includes all the particles and have then also done classification? Is there an easy way to deposit a single particle stack with multiple metadata files pointing to the original particle stack?

Thanks!
I tried following the steps. Unfortunately, the same issue arises when importing the particles: the import fails because it cannot find the original motion-corrected micrographs, which are specified in the passthrough file.
Is there a way around this, or did I miss something? I always delete the motion-corrected micrographs once I am done with them. I also would not include them in the deposition, but I do want to include the particle stacks…

Thanks!

Hi @AdrianGL,

You are correct, an oversight on my part. I will provide updated instructions once we find a workable solution that alleviates this problem.

Best,
Kye

Thank you, that would be great!

Hi all,

After some discussion with the team, it would make the most sense to convert your particle.cs file to a .star file using pyem and deposit in that manner. This will preserve particle locations, but not CTF information.
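
As a rough sketch of that conversion, the snippet below wraps pyem's `csparc2star.py` script from Python. The argument order (particles .cs, passthrough .cs, output .star) follows pyem's usage as I understand it, so please confirm with `csparc2star.py --help` for your installed version. The function names and all file names are placeholders.

```python
import shutil
import subprocess


def csparc2star_command(particles_cs, passthrough_cs, output_star):
    """Build the csparc2star.py command line as a list of arguments."""
    return [
        "csparc2star.py",
        str(particles_cs),    # particle .cs file from the final job
        str(passthrough_cs),  # matching passthrough .cs file
        str(output_star),     # star file to write
    ]


def run_conversion(particles_cs, passthrough_cs, output_star):
    """Run the conversion, failing early if pyem is not on the PATH."""
    cmd = csparc2star_command(particles_cs, passthrough_cs, output_star)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("csparc2star.py not found; is pyem installed?")
    subprocess.run(cmd, check=True)


# Placeholder file names; substitute the ones from your own job directory.
print(" ".join(csparc2star_command(
    "J1234_particles.cs", "J1234_passthrough_particles.cs", "particles.star")))
```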

We do aim to make the deposition process easier in a future release though!

Thanks,
Kye

Thank you. Looking forward to that!