Find the Correlation between two star files

Ashwin-Dhakal · August 16, 2023, 10:46pm

Is it possible to compare two star files in CryoSPARC that correspond to the same set of micrographs? To elaborate, consider a scenario where we have 1000 micrographs associated with a specific EMPIAR ID. Assuming we have a “ground_truth.star” file containing the actual particle coordinates and a “predicted.star” file containing coordinates predicted by our algorithm, is it possible to compute metrics such as precision, recall, F1 score, and other relevant measures between these two star files?

While I understand that we can import the star files to extract particles and assess 3D resolution for comparison purposes, I’m seeking additional metrics that can help us evaluate the accuracy of predictions based on the ground truth star file.

I greatly appreciate your assistance. Thank you!

rajangyawali · August 17, 2023, 5:04pm

@admin @admins I am also looking for a similar comparison.

DanielAsarnow · August 17, 2023, 6:09pm

There are good libraries for loading your star files as pandas DataFrames for data science applications. One is pyem.star; another is starfile from #teamtomo.

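import starfile
from scipy.spatial.distance import cdist

# For multi-block files, starfile.read returns a dict of DataFrames keyed by block name
star1 = starfile.read("particles.star")
star2 = starfile.read("test.star")
# The block name depends on the file; 'particles' is assumed here
x1 = star1['particles'][['rlnCoordinateX', 'rlnCoordinateY']]
x2 = star2['particles'][['rlnCoordinateX', 'rlnCoordinateY']]
# Pairwise Euclidean distances, shape (len(x1), len(x2))
pairwise = cdist(x1.to_numpy(), x2.to_numpy())
# etc...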

You will need to think about how to calculate useful measures with nonidentical particle sets, for example by considering each particle in the test set to be a true positive if its nearest neighbor in the ground truth is within a certain distance.
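As a minimal sketch of that nearest-neighbor criterion for a single micrograph: the function name `picking_metrics` and the `radius` cutoff (in pixels) are hypothetical choices for illustration, not part of CryoSPARC, pyem, or starfile. Note that this does not enforce one-to-one matching, so two predictions near the same ground-truth particle both count as true positives.

```python
import numpy as np
from scipy.spatial.distance import cdist


def picking_metrics(pred_xy, truth_xy, radius):
    """Precision, recall and F1 for one micrograph, using a distance cutoff."""
    pred_xy = np.asarray(pred_xy, dtype=float).reshape(-1, 2)
    truth_xy = np.asarray(truth_xy, dtype=float).reshape(-1, 2)
    if len(pred_xy) == 0 or len(truth_xy) == 0:
        return 0.0, 0.0, 0.0
    dist = cdist(pred_xy, truth_xy)  # shape (n_pred, n_truth)
    # A prediction is a true positive if its nearest ground-truth particle is close enough.
    precision = float(np.mean(dist.min(axis=1) <= radius))
    # A ground-truth particle is recovered if its nearest prediction is close enough.
    recall = float(np.mean(dist.min(axis=0) <= radius))
    total = precision + recall
    f1 = 0.0 if total == 0 else 2 * precision * recall / total
    return precision, recall, f1


# Toy coordinates (made up for illustration only)
truth = [[100, 100], [200, 200], [300, 300]]
pred = [[102, 101], [205, 198], [500, 500], [900, 50]]
print(picking_metrics(pred, truth, radius=20))
```

A reasonable starting point for `radius` is some fraction of the particle diameter in pixels, but that choice is an assumption you would need to tune for your data.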

You can check out the duplicate particle removal code from pyem.star for an example of doing this performantly (with a spatial tree).
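For illustration, here is a rough sketch of the spatial-tree idea using SciPy's `cKDTree`, which avoids computing every pairwise distance. This is not the pyem implementation; the function name `count_matches` and the `radius` parameter are hypothetical. It assumes both DataFrames carry `rlnMicrographName`, `rlnCoordinateX` and `rlnCoordinateY` columns and that micrograph names are written identically in both files (path prefixes may need normalizing first). Matching is done per micrograph, since coordinates from different micrographs are not comparable.

```python
import pandas as pd
from scipy.spatial import cKDTree

COORDS = ["rlnCoordinateX", "rlnCoordinateY"]


def count_matches(pred_df, truth_df, radius, group_col="rlnMicrographName"):
    """Precision and recall over all micrographs, matching within each micrograph."""
    truth_groups = {n: g[COORDS].to_numpy(dtype=float) for n, g in truth_df.groupby(group_col)}
    pred_groups = {n: g[COORDS].to_numpy(dtype=float) for n, g in pred_df.groupby(group_col)}
    matched_pred = 0
    recovered_truth = 0
    # Micrographs present in only one file contribute no matches,
    # but their particles still count in the denominators below.
    for name in truth_groups.keys() & pred_groups.keys():
        t, p = truth_groups[name], pred_groups[name]
        d_pred, _ = cKDTree(t).query(p, k=1)   # nearest truth for each prediction
        d_truth, _ = cKDTree(p).query(t, k=1)  # nearest prediction for each truth
        matched_pred += int((d_pred <= radius).sum())
        recovered_truth += int((d_truth <= radius).sum())
    precision = matched_pred / len(pred_df) if len(pred_df) else 0.0
    recall = recovered_truth / len(truth_df) if len(truth_df) else 0.0
    return precision, recall


# Toy data (made up for illustration only)
truth_df = pd.DataFrame({
    "rlnMicrographName": ["a.mrc", "a.mrc", "b.mrc"],
    "rlnCoordinateX": [100, 200, 100],
    "rlnCoordinateY": [100, 200, 100],
})
pred_df = pd.DataFrame({
    "rlnMicrographName": ["a.mrc", "b.mrc", "b.mrc", "c.mrc"],
    "rlnCoordinateX": [103, 100, 400, 100],
    "rlnCoordinateY": [100, 104, 400, 100],
})
print(count_matches(pred_df, truth_df, radius=20))
```

In the toy data, the prediction on `c.mrc` sits at the same coordinates as a ground-truth particle on `b.mrc` but is not counted as a match, because the two belong to different micrographs.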

rajangyawali · August 20, 2023, 12:07pm

@DanielAsarnow Thank you so much for providing meaningful insights.