Hi, I have a dataset with about 2 million particles, so I split it with the Particle Sets tool. Now, when I run 2D classification on any subset, I consistently get the following error at a random iteration:
Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 90, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc2_compute/particles.py", line 107, in get_original_real_data
return self.blob.view().copy()
File "cryosparc2_compute/blobio/mrc.py", line 102, in view
return self.get()
File "cryosparc2_compute/blobio/mrc.py", line 99, in get
data = n.fromfile(file_obj, dtype=self.dtype, count= n.prod(self.shape)).reshape(self.shape)
ValueError: total size of new array must be unchanged
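For context, this ValueError comes from numpy itself: `fromfile` silently returns fewer elements than `count` when the file on disk is shorter than expected, and the subsequent `reshape` then fails. A minimal sketch of the failure mode, with made-up shapes and a temporary file standing in for a truncated particle stack:

```python
import os
import tempfile

import numpy as np

# Expected stack shape (hypothetical); the file below holds one image too few.
shape = (10, 32, 32)
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    # Write only 9 of the 10 expected images, i.e. a "truncated" file.
    np.zeros((9, 32, 32), dtype=np.float32).tofile(f)
    path = f.name

caught = False
with open(path, "rb") as file_obj:
    try:
        # Mirrors the failing line in mrc.py: fromfile reads only 9*32*32
        # elements, so reshape to (10, 32, 32) raises ValueError.
        data = np.fromfile(file_obj, dtype=np.float32,
                           count=np.prod(shape)).reshape(shape)
    except ValueError:
        caught = True
os.unlink(path)
print("truncated file detected:", caught)
```

Newer numpy versions word the message differently ("cannot reshape array of size ..."), but the cause is the same: the byte count on disk disagrees with the shape the header promised.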
We are trying to investigate this in detail (it has been reported several times), but we have never been able to reproduce it ourselves. Could you give us the following details?
What was the exact workflow that created the particles?
What kind of filesystem is the project directory stored on?
Is all processing happening on the same node or do you have multiple worker nodes?
How many micrographs were there originally?
What was the box size and pixel size of the particle images?
Would you be able to share one of the offending particle stacks with us so we can figure out the problem? I will provide secure remote upload credentials.
I ran patch motion, patch CTF, manual picking, 2D classification, template picking, inspect picks, patch CTF extract, and finally particle extraction. Then I split the data into particle sets of 200,000 particles, but this doesn't seem to be the problem. During extraction I used Fourier cropping, as the original data was collected in super-resolution mode.
It's a BeeGFS filesystem.
We have several workstations and a cluster as worker nodes. The error is reproducible on all our workers.
5160 micrographs
I extracted with a box size of 600, but Fourier-cropped it to 300.
@apunjani I have the same error now also with a patch motion correction of another dataset:
[CPU: 684.9 MB] Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
run_old(*args, **kw)
File "/users/svc_cryosparc/software/regular/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
work = processor.process(item)
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 121, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 126, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_compute/blobio/mrc.py", line 115, in read_mrc
data = read_mrc_data(file_obj, header, start_page, end_page, out)
File "cryosparc2_compute/blobio/mrc.py", line 78, in read_mrc_data
data = n.fromfile(file_obj, dtype=dtype, count= num_pages * ny * nx).reshape(num_pages, ny, nx)
ValueError: total size of new array must be unchanged
Thanks for the details about the particle-stack problem - I will send secure upload instructions in a few minutes.
For the patch motion problem: do the surrounding log lines indicate which movie was being processed? In this case the error is saying that the movie file (which was in .mrc format, correct?) is the wrong length on disk. This could be a bug, but it could also happen if you have a corrupt (truncated) movie file, or if your import job accidentally included the gain reference micrograph as a movie. Can you check those possibilities? We have not seen many reports of this problem before, unlike the particle-stack ValueError.
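One way to check for truncated movies is to compare each file's size on disk with the size implied by its MRC header. The sketch below is not a cryoSPARC utility; it follows the MRC2014 header layout (nx, ny, nz and the data mode in the first 16 bytes, extended-header length NSYMBT at byte offset 92) and only covers the common data modes:

```python
import os
import struct

# Bytes per voxel for common MRC data modes:
# 0 = int8, 1 = int16, 2 = float32, 6 = uint16.
BYTES_PER_VOXEL = {0: 1, 1: 2, 2: 4, 6: 2}

def expected_mrc_size(path):
    """Size the file should have according to its MRC2014 header."""
    with open(path, "rb") as f:
        header = f.read(1024)
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    nsymbt = struct.unpack("<i", header[92:96])[0]  # extended header bytes
    return 1024 + nsymbt + nx * ny * nz * BYTES_PER_VOXEL[mode]

def find_truncated(paths):
    """Return the paths whose on-disk size disagrees with the header."""
    return [p for p in paths
            if os.path.getsize(p) != expected_mrc_size(p)]
```

Running `find_truncated` over the imported movie paths should flag any file that would later trip the `fromfile(...).reshape(...)` check; a gain reference imported as a movie would typically show up the same way, since its nz does not match its actual length.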
The ValueError when reading particles in 2D classification was caused by imported micrographs that included duplicate files with the same names.
The ValueError when reading movies during patch motion was caused by a single corrupt .mrc movie file in the dataset.
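The duplicate-filename cause can be caught before import with a simple pre-flight check. This helper is hypothetical, not part of cryoSPARC: it reports any basename that appears more than once in a list of movie or micrograph paths.

```python
import os
from collections import Counter

def duplicate_basenames(paths):
    """Return basenames that occur more than once across the given paths."""
    counts = Counter(os.path.basename(p) for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)

print(duplicate_basenames([
    "/data/grid1/mic_0001.mrc",
    "/data/grid2/mic_0001.mrc",  # same name, different directory
    "/data/grid1/mic_0002.mrc",
]))  # -> ['mic_0001.mrc']
```

An empty result means every imported file has a unique name; any reported name should be renamed or dropped before import.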