Very interesting. So if I understand correctly, you have
topaz extract (job "A") → extract from mics (1/4 downsample) → 2D class → select 2D → ab initio (success)
Then you also have a chain that uses the particles from select 2D but ultimately fails.
Perhaps try this:
Clone the extract from mics (1/4 downsample) job, change the downsampling (to no downsample), clear the particles input, and connect your final particles from select 2D. Run the result through ab initio and see what happens.
Hi @hsnyder,
So this is slightly complicated by the fact that I ran three topaz extract jobs and three corresponding extract from micrographs jobs, but then combined the particles after 2 rounds of 2D classification. But since all of those worked, cloning any one of them and adding the micrograph inputs from the other topaz extract jobs should be fine? I’ll try this and let you know how it goes.
Thanks for clarifying. Yes, I think trying what you proposed is a good next step. It might also be worth extracting the three upstream topaz jobs separately, and combining the results downstream. I have a strong suspicion that this problem has something to do with combining multiple inputs, though I’m not sure yet exactly what the cause is. Hopefully with some permutations of the experiment, we can nail down the exact step that causes this to happen…
Hi @hsnyder,
I tried re-extracting using the cloned job, adding in the micrograph outputs from the other topaz extract jobs and the particles from select 2D, and the ab initio job gave the exact same error. Since the error kept mentioning the same micrograph, I decided to just run a curate exposures job linking both the particles and the micrographs, and remove that particular micrograph from the dataset. I also checked for duplicate filenames but didn’t catch anything. This workaround worked, but I have no idea why the issue happened in the first place.
Well, I’m glad you found a solution! If you wouldn’t mind, could you run the following command on the offending particle stack (the one that gets printed out in the error message) and post the output?
xxd -e -g 4 /path/to/offending-particles.mrc | head
That helps, thank you. There are 88 particles in the particle stack according to the MRC file header (decimal 88 is ‘58’ in hexadecimal, visible on the first line of the xxd output), but cryoSPARC was asking for the 89th frame (notice the ‘89, 90’ bit near the end of the RuntimeError message, meaning “fetch frames starting from the 89th, and stopping before the 90th”). Somehow, the dataset files (.cs files) have gotten out of sync with the particle stacks that are actually on disk. The associated .cs file must “think” there are more particles in that file than there actually are, which is why it’s asking for the 89th. The only time we had previously seen this was when there were micrographs with the same names, but it looks like something about your present workflow has produced the same kind of situation.
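If it helps for future debugging, here is a rough way to cross-check the two counts without xxd. This is only a sketch: it assumes the stack is a standard little-endian MRC file, and that the .cs file loads as a numpy record array with the usual blob/path and blob/idx fields; both file paths below are placeholders.

import struct
import numpy as np

# The first three 32-bit little-endian integers of an MRC header are nx, ny, nz;
# for a particle stack, nz is the number of particles actually on disk.
with open("/path/to/offending-particles.mrc", "rb") as f:
    nx, ny, nz = struct.unpack("<3i", f.read(12))
print("particles on disk:", nz)

# Count how many rows of the .cs file reference this stack, and find the
# largest frame index they reference (blob/idx is an index into the stack).
cs = np.load("/path/to/particles.cs")
paths = cs["blob/path"].astype(str)
mask = np.char.find(paths, "offending-particles.mrc") >= 0
print("particles the .cs file expects:", int(mask.sum()))
print("largest frame index referenced:", int(cs["blob/idx"][mask].max()))

If the .cs count is larger than nz, or the largest referenced index is nz or more, the dataset and the stack are out of sync, which is exactly the kind of mismatch the RuntimeError is complaining about.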
Oh I see! When I did the curate exposures job, it did say it excluded 90 particles assigned to this micrograph. I’ve done the same workflow on other datasets and never had issues, but at least I know what to do if this happens again. Thanks for all your help!