AssertionError when merging datasets

Hi,

I have been combining particles from different datasets without problems.

I am now trying to re-process and repeat this but keep getting this error when trying to run a refinement with the combined particles from different datasets:

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/hetero_refine/run.py", line 63, in cryosparc_compute.jobs.hetero_refine.run.run_hetero_refine
  File "/xxx/software/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 39, in __init__
    assert n.all(n.abs(self.data['blob/psize_A'] - self.psize_input) < 1e-4 ), "All particles must have the same pixel size to within 10^-4 Angstroms"
AssertionError: All particles must have the same pixel size to within 10^-4 Angstroms

I have been able to do this before without problems, and the datasets have exactly the same pixel size, so I don’t know what is happening or why I can no longer process them…

Any help would be greatly appreciated!

Hi @LTP,

It’s possible that, due to floating-point error, the pixel sizes are slightly different, particularly if any Fourier cropping or downsampling was done on each dataset. Could you check the pixel size of each of the distinct particle datasets in the .cs file directly?

You can do this through the interactive client. First, navigate to the job directory of the extraction jobs and record the paths to the .cs file of the particle stacks for each extracted set (this should end in /extracted_particles.cs if they were extracted in cryoSPARC). In a terminal connected to the master node:

> cryosparcm icli
> import numpy as n
> from cryosparc_compute import dataset
> # Load in the dataset for each particle stack
> dset1 = dataset.Dataset().from_file("<path_to_cs_file_particle_set_1>")
> dset2 = dataset.Dataset().from_file("<path_to_cs_file_particle_set_2>")
> # Load array of pixel sizes for all particles
> psize1 = dset1.data['blob/psize_A']
> psize2 = dset2.data['blob/psize_A']
> # Check if they are all equal
> print(psize1[0], psize2[0]) # are they equal?
> print( n.allclose(psize1[0], psize1) ) # is this False?
> print( n.allclose(psize1[0], psize2) ) # is this False?

If the two pixel sizes are different by > 0.0001, then the error would be expected. If the difference is really small, you could override the pixel size of both particle sets and then re-import them back into cryoSPARC – let me know if this is the case and I can provide instructions on how to do this.
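To put a number on the difference rather than only getting True/False from allclose, a minimal NumPy-only sketch is below. It mirrors the 10^-4 Å tolerance quoted in the assertion message. The function name psize_report and the example arrays are made up for illustration, and using the first particle's pixel size as the reference is an assumption (the job compares against its own input pixel size).

```python
import numpy as np

TOLERANCE_A = 1e-4  # tolerance quoted in the AssertionError message, in Angstroms


def psize_report(*psize_arrays, tol=TOLERANCE_A):
    """Return (max deviation from the first pixel size, whether it is within tol).

    Hypothetical helper: the reference value is simply the first particle's
    pixel size, which is an assumption made for this sketch.
    """
    combined = np.concatenate(
        [np.asarray(p, dtype=np.float64) for p in psize_arrays]
    )
    reference = combined[0]
    max_dev = float(np.max(np.abs(combined - reference)))
    return max_dev, bool(max_dev < tol)


# Made-up stand-ins for dset1.data['blob/psize_A'] and dset2.data['blob/psize_A']
psize1 = np.full(5, 2.4, dtype=np.float32)
psize2 = np.full(5, 2.486, dtype=np.float32)

max_dev, within_tol = psize_report(psize1, psize2)
print(f"max deviation: {max_dev:.6f} A, within tolerance: {within_tol}")
```

In the icli session above, the two made-up arrays would be replaced by psize1 and psize2 as loaded from the .cs files.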

Best,
Michael

Hi,

Thanks for your help. I’m trying to run the command but get the following error:

NameError Traceback (most recent call last)
in
8 # Check if they are all equal
9 print(psize1[0], psize2[0]) # are they equal?
---> 10 print( n.allclose(psize1[0], psize1) ) # is this False?
11 print( n.allclose(psize1[0], psize2) ) # is this False?

NameError: name 'n' is not defined

I’m replacing “<path_to_cs_file_particle_set_1>” with “my_path” but nothing else. Is there something else I need to change?

Dear @LTP,

So sorry about that, my mistake! Add the line import numpy as n after entering the ipython shell, and the allclose function should work. I have corrected the original comment. Other than that, yes, just replace the paths surrounded by angle brackets (keep the quotation marks).

Best,
Michael

Great, that worked!

I might try to re-process one with the correct pixel size (2.486 instead of 2.4), but can you please show me how I could override the pixel size if I wanted to? From previous threads I understand that changing the pixel size is tricky, but is there a trick to do it?

Thanks a lot for your help!

Hi @LTP,

The difference between the two pixel sizes is significant, and unfortunately combining datasets of different pixel sizes isn’t easily handled in cryoSPARC right now. Ali posted an insightful comment on another thread about the practical utility of merging datasets of different pixel sizes, which may be of interest. More formal support for this workflow in cryoSPARC is coming when magnification anisotropy is released, but in the meantime, the only other way to do this is quite cumbersome (and in practice may not result in much of a benefit; see Ali’s comment).

If you believe that it is worth trying in your case, the long answer is that you will need to follow a procedure very similar to the one outlined in this CCPEM forum post by Max Wilkinson. (See also the RELION wiki page on pixel size issues.) First, you would have to determine the exact pixel size of one dataset relative to the other, for which Max Wilkinson suggests using the Chimera “fit in map” command. Next, you would use the script linked in the forum post to find two even box sizes whose ratio is very nearly equal to the ratio between the two pixel sizes. Once that is done, you would re-extract particles from the larger pixel size dataset at the smaller box size, and re-extract particles from the smaller pixel size dataset at the larger box size, so that both boxes cover nearly the same physical extent. Finally, you would use “Downsample Particles” to downsample the larger-box particles to match the smaller box size.
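To illustrate the box-size search, here is a rough brute-force sketch in plain Python. It is not the script from the forum post: the function name best_even_box_pair and the 64 to 512 box range are hypothetical, and the pixel sizes 2.4 and 2.486 are just the values mentioned earlier in this thread, used as example inputs.

```python
def best_even_box_pair(psize_small, psize_large, min_box=64, max_box=512):
    """Search even box sizes whose ratio approximates psize_large / psize_small.

    Returns (box_large, box_small, new_psize, error), where:
      box_large : box for the SMALLER pixel size dataset
      box_small : box for the LARGER pixel size dataset
      new_psize : pixel size of the smaller pixel size dataset after
                  downsampling box_large -> box_small
      error     : |new_psize - psize_large| in Angstroms
    Hypothetical helper for illustration only.
    """
    target = psize_large / psize_small  # ratio the box sizes should match
    best = None
    for box_small in range(min_box, max_box + 1, 2):
        ideal = box_small * target
        lower = int(ideal // 2) * 2  # nearest even integer at or below ideal
        for box_large in (lower, lower + 2):
            if box_large < min_box or box_large > max_box:
                continue
            # Downsampling keeps the physical extent, so the pixel size scales
            # by box_large / box_small.
            new_psize = psize_small * box_large / box_small
            error = abs(new_psize - psize_large)
            if best is None or error < best[3]:
                best = (box_large, box_small, new_psize, error)
    return best


box_large, box_small, new_psize, error = best_even_box_pair(2.4, 2.486)
print(f"extract at {box_large} px (small pixel size) and {box_small} px "
      f"(large pixel size); residual pixel size error {error:.6f} A")
```

Whatever residual error remains after this step is what decides whether the pixel size override becomes necessary.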

In case the pixel sizes are still different by > 10^-4, you’ll need to modify the .cs files of the particle sets to override the pixel size of each set to be exactly equal, and then save and re-import the modified particle stack using “Import Result Group”. It’s fine to do this only if the error is very small. Stephan made an excellent tutorial explaining how to do this for a different case of modifying the shifts in a dataset.
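As a rough idea of what the override itself amounts to, the sketch below sets every entry of the 'blob/psize_A' field to one value, and refuses to do so if that would change any particle by more than a chosen limit. It works on a stand-in NumPy structured array with made-up values; reading and writing real .cs files is covered by Stephan’s tutorial and is not shown here. The function name override_psize and the 1e-3 Å limit are assumptions for illustration, not cryoSPARC settings.

```python
import numpy as np


def override_psize(particles, new_psize_A, field="blob/psize_A", max_change_A=1e-3):
    """Return a copy of `particles` with `field` set to `new_psize_A` everywhere.

    Hypothetical helper. `max_change_A` is an arbitrary safety limit chosen
    for this sketch, reflecting that the override is only reasonable when
    the error is very small.
    """
    largest_change = float(np.max(np.abs(particles[field] - new_psize_A)))
    if largest_change > max_change_A:
        raise ValueError(
            f"override would change a pixel size by {largest_change:.6f} A, "
            f"more than the allowed {max_change_A} A"
        )
    updated = particles.copy()  # leave the input array untouched
    updated[field] = new_psize_A
    return updated


# Stand-in particle table with made-up values (not a real .cs file)
demo = np.zeros(4, dtype=[("uid", "u8"), ("blob/psize_A", "f4")])
demo["blob/psize_A"] = [2.4861, 2.4861, 2.4862, 2.4861]

fixed = override_psize(demo, 2.486)
print(fixed["blob/psize_A"])
```

The size check is there so that a genuinely different pixel size (such as 2.4 vs 2.486) is never silently overwritten.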

Best,
Michael