I am working with a dataset that requires me to move from Cryosparc to Relion and vice versa. All my particle picking and 2D classification were done in Cryosparc. I used csparc2star.py to transfer the particles and generate the associated micrographs (from Cryosparc). I have transferred 4M particles, and when I run 3D classification, each iteration takes 32h to complete. 2D classification seems normal, although it displays a warning:
WARNING: It appears that these images have not been normalised to an average background value of 0 and a stddev value of 1.
Note that the average and stddev values for the background are calculated:
(1) for single particles: outside a circle with the particle diameter
(2) for helical segments: outside a cylinder (tube) with the helical tube diameter
You can use the relion_preprocess program to normalise your images
If you are sure you have normalised the images correctly (also see the RELION Wiki), you can switch off this warning message using the --dont_check_norm command line option
Then if I re-extract the particles in Relion, the same 3D classification iterations take around 1h each. Interestingly, I have noticed that if I re-extract Cryosparc particles in Relion, the quality of the 2D classes decreases compared to the original stack created by csparc2star and imported into Relion. Any thoughts? Should I run relion_preprocess as suggested?
If you use the --dont_check_norm flag, are iterations much faster?
Regarding lower quality 2D classes for relion extracted particles compared to those from CS, have you ensured the particle pick locations transferred correctly? Sometimes you must use --swapxy (if particles were originally extracted in CS) and/or use --inverty.
Currently, --swapxy is never required and has been made a no-op, but --inverty is needed if you will re-import coordinates to cryoSPARC or if you have full cryoSPARC pipeline coords and you want to use them to extract in Relion on new Relion MotionCorr micrographs. You can check if they’re correct with disparticle.py or relion_display.
The way to reason out what you need is to remember that cryoSPARC imports coordinates from Relion correctly. So if you have a micrograph (in or out of CS) and coordinates in Relion, then import them, then export, you will need --inverty (in order to get the same exact values again). OTOH if you have coords picked in CS, to be used in Relion on the same micrograph file, then you don’t want any argument.
Since Relion/Motioncor2 flip the movie sum, if you are changing the actual micrograph file to one from a different motion correction, with those coords from cryoSPARC, then once again you want --inverty.
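To make the flip concrete, here is a minimal sketch of what a Y inversion does to a pick. It assumes coordinates are stored as fractions of the micrograph size (0 to 1) and converted to pixels afterwards; the function name and arguments are hypothetical and this is not the csparc2star.py code itself.

```python
def to_pixels(x_frac, y_frac, nx, ny, inverty=False):
    """Convert fractional pick coordinates to pixel coordinates.

    x_frac, y_frac: position as a fraction of micrograph width/height (assumed 0..1).
    nx, ny: micrograph size in pixels.
    inverty: mirror the Y axis, as needed when the micrograph file is
             vertically flipped relative to the one the picks came from.
    """
    if inverty:
        y_frac = 1.0 - y_frac  # mirror top-to-bottom
    return x_frac * nx, y_frac * ny


if __name__ == "__main__":
    # The same pick lands in a mirrored row when the flip is applied.
    print(to_pixels(0.25, 0.25, 4000, 4000))
    print(to_pixels(0.25, 0.25, 4000, 4000, inverty=True))
```

Applying the flip twice returns the original value, which is why an import followed by an export needs it exactly once.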
@user123 is right, check if your coordinates are correct and also if the normalization actually matters (very rare). Also consider if you have used a different box size or something else is different as well (SSD caching?).
Thanks for the comments. The way I transfer the particles is the one explained for csparc2star. I create a link for particles from a refinement in Cryosparc. Also, I generate micrographs.star using the ones from Patch CTF in Cryosparc. With this, I am able to move back and forth between Cryosparc and Relion. The particle picks are the same and the 2D classes are the same. The difference is that when I re-extract the particles in Relion using the Patch CTF micrographs, the same particles show 2D classes with poorer quality. While I don’t mind much about that since 2D classification is done in Cryosparc, I wonder if my 3D classifications in Relion are missing relevant information. The reason I think so is that the Cryosparc particles imported via csparc2star need 10 times longer to process each iteration than those re-extracted using csparc2star coordinates, while the re-extracted particles give more featureless 2D classes. I just wanted to know if someone has observed such behavior and why this could be happening. For the micrographs I generate a folder with a soft link (ln -s…) to the .mrc files from Patch CTF. I have been moving small batches of particles (500K or less) with no issues.
The coordinates are correct: inspecting extractpicks.star shows that Relion picks the same particle locations I have in Cryosparc. Smaller batches of particles did not give any issue with 3D classifications. Iteration times looked sensible. I am wondering if soft-linking the particles creates some communication delay that can explain the longer iterations in 3D. That, or there is excessive information for Relion to interpret, which is resolved upon re-extraction in the Relion environment.
OK, thanks for checking the coordinates carefully. That is always the no. 1 suspect in cases like this.
Using symbolic links won’t add any significant delay, but Relion iteration times are very dependent on how “easy” it is to align the particles, because the number of significant samples for marginalization has a directly proportional impact on the amount of work done and memory used.
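As a rough illustration of why "easy" particles are cheaper, the sketch below counts how many orientation samples are needed to cover a fixed fraction of the posterior probability. The function name is hypothetical, and the 0.999 threshold is an assumption modelled on Relion's adaptive oversampling; it is not Relion's actual code.

```python
import numpy as np


def n_significant(log_likelihoods, fraction=0.999):
    """Count the samples needed to cover `fraction` of the posterior mass."""
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())        # subtract the max for numerical stability
    w /= w.sum()                     # normalise to a probability distribution
    cum = np.cumsum(np.sort(w)[::-1])  # accumulate from the largest weight down
    n = int(np.searchsorted(cum, fraction)) + 1
    return min(n, ll.size)


if __name__ == "__main__":
    peaked = np.full(1000, -50.0)
    peaked[0] = 0.0                  # one orientation clearly wins
    flat = np.zeros(1000)            # every orientation is equally likely
    print(n_significant(peaked), n_significant(flat))
```

A sharply peaked posterior keeps only a handful of samples for the fine search, while a flat one keeps nearly all of them, so the work per particle grows accordingly.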
One possibility is that in this case, the normalization really is critical. All it does is convert the intensities within the particle diameter to Z-scores based on the presumed noise background in the corners of the images. Intensity scale changes don’t have that large an impact on relative cross-correlations, which is why it usually doesn’t matter very much. If you don’t mind giving that a test by using relion_image_handler to directly normalize the cryoSPARC particles, we can determine if that’s really the case here (likewise --dont_check_norm, which just eliminates the checks). Having a rubric to understand when the normalization is important would be very useful!
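For reference, the normalization described here can be sketched in a few lines of NumPy. This assumes a single 2D particle image and a background radius in pixels; the function name is hypothetical and this is only an illustration of the idea, not the Relion implementation.

```python
import numpy as np


def normalize_particle(img, bg_radius):
    """Scale an image so the area outside `bg_radius` has mean 0 and stddev 1."""
    img = np.asarray(img, dtype=float)
    ny, nx = img.shape
    yy, xx = np.ogrid[:ny, :nx]
    r = np.hypot(yy - ny // 2, xx - nx // 2)  # distance from the box centre
    bg = img[r > bg_radius]                   # corner pixels, assumed to be noise only
    return (img - bg.mean()) / bg.std()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw = rng.normal(loc=5.0, scale=3.0, size=(64, 64))  # synthetic, unnormalized
    out = normalize_particle(raw, bg_radius=24)
    print(out.shape)
```

Computing the same background mean and stddev on a few particles from each stack is a quick way to see how far the cryoSPARC-extracted images are from the values the warning expects.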
A second possibility is something about the parameters to Relion, perhaps the default value of T is not ideal, or a strict E-step limit resolution (e.g. 8-12A) is needed to get similar quality to cryoSPARC. Another is the recentering of the particles on re-extraction, though I’m not sure why that would have opposite effects on 2D and 3D classification.
However, as you mentioned smaller batches being OK, I wonder if it’s actually a file I/O issue. Do you use an SSD cache with Relion? If so, then Relion restacks just the particles actually needed in order to cache the data, but if not, the original stacks might span a lot more disk than the ones from a re-extraction.
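One quick way to gauge this is to count how many distinct stack files the particles point to and how many particles are read from each. The sketch below assumes image names in the usual index@path form used in particle STAR files; the function name and example paths are made up for illustration.

```python
from collections import Counter


def stack_usage(image_names):
    """Count how many particles are read from each stack file.

    image_names: strings such as '000012@Extract/job005/mic_001.mrcs'
    (index@path form; the paths used below are hypothetical).
    """
    return Counter(name.split("@", 1)[1] for name in image_names)


if __name__ == "__main__":
    names = [
        "000001@J1/stack_a.mrcs",
        "000005@J1/stack_a.mrcs",
        "000002@J1/stack_b.mrcs",
    ]
    usage = stack_usage(names)
    print(len(usage), "stacks;", usage.most_common(1))
```

If only a small share of the images in each original stack is used, reads without a cache are scattered across much more data than they would be after re-extraction.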