Hi, I have a dataset with about 2 million particles, so I split it with the Particle Sets tool. Now, when I run 2D classification on any subset, I consistently get the following error at a random iteration:
Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
run_old(*args, **kw)
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 110, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/cuda_core.py", line 111, in cryosparc2_compute.engine.cuda_core.GPUThread.run
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 991, in cryosparc2_compute.engine.engine.process.work
File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 90, in cryosparc2_compute.engine.engine.EngineThread.load_image_data_gpu
File "cryosparc2_compute/particles.py", line 107, in get_original_real_data
return self.blob.view().copy()
File "cryosparc2_compute/blobio/mrc.py", line 102, in view
return self.get()
File "cryosparc2_compute/blobio/mrc.py", line 99, in get
data = n.fromfile(file_obj, dtype=self.dtype, count= n.prod(self.shape)).reshape(self.shape)
ValueError: total size of new array must be unchanged
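For context, this ValueError comes from numpy itself: `fromfile` silently returns fewer elements than `count` when the file on disk is shorter than expected, and the subsequent `reshape` then fails. A minimal sketch of the failure mode, with made-up shapes and a temporary file standing in for a truncated particle stack:

```python
import os
import tempfile

import numpy as np

# Expected stack shape (hypothetical); the file below holds one image too few.
shape = (10, 32, 32)
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    # Write only 9 of the 10 expected images, i.e. a "truncated" file.
    np.zeros((9, 32, 32), dtype=np.float32).tofile(f)
    path = f.name

caught = False
with open(path, "rb") as file_obj:
    try:
        # Mirrors the failing line in mrc.py: fromfile reads only 9*32*32
        # elements, so reshape to (10, 32, 32) raises ValueError.
        data = np.fromfile(file_obj, dtype=np.float32,
                           count=np.prod(shape)).reshape(shape)
    except ValueError:
        caught = True
os.unlink(path)
print("truncated file detected:", caught)
```

Newer numpy versions word the message differently ("cannot reshape array of size ..."), but the cause is the same: the byte count on disk disagrees with the shape the header promised.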
We are trying to investigate this in detail (it has been reported several times), but we have never been able to reproduce it ourselves. Could you give us the following details?
What was the exact workflow that created the particles?
What kind of filesystem is the project directory stored on?
Is all processing happening on the same node or do you have multiple worker nodes?
How many micrographs were there originally?
What was the box size and pixel size of the particle images?
Would you be able to share one of the offending particle stacks with us so we can figure out the problem? I will provide secure remote upload credentials.
I ran patch motion, patch CTF, manual picking, 2D classification, template picking, inspect picks, patch CTF extract, and finally particle extraction. Then I split the data into particle sets of 200,000 particles, but this doesn't seem to be the problem. During extraction I used Fourier cropping, as the original data was collected in super-resolution mode.
It's a BeeGFS filesystem.
We have several workstations and a cluster as worker nodes. The error is reproducible on all our workers.
5160 micrographs
I extracted with a box size of 600, but Fourier-cropped it to 300.
@apunjani I have the same error now also with a patch motion correction of another dataset:
[CPU: 684.9 MB] Traceback (most recent call last):
File "cryosparc2_compute/jobs/runcommon.py", line 1685, in run_with_except_hook
run_old(*args, **kw)
File "/users/svc_cryosparc/software/regular/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
work = processor.process(item)
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 121, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 126, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
File "cryosparc2_compute/blobio/mrc.py", line 115, in read_mrc
data = read_mrc_data(file_obj, header, start_page, end_page, out)
File "cryosparc2_compute/blobio/mrc.py", line 78, in read_mrc_data
data = n.fromfile(file_obj, dtype=dtype, count= num_pages * ny * nx).reshape(num_pages, ny, nx)
ValueError: total size of new array must be unchanged
Thanks for the details about the particle-stack problem - I will send secure upload instructions in a few minutes.
For the patch motion problem: do the surrounding log lines indicate which movie was being processed? In this case the error is saying that the movie file (which was in .mrc format, correct?) is the wrong length on disk. This could be a bug, but it could also happen if you have a corrupt (truncated) movie file, or if your import job accidentally included the gain reference micrograph as a movie. Can you check those possibilities? We have not seen many reports of this problem before, unlike the particle-stack ValueError.
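One way to check for truncated movies is to compare each file's size on disk with the size implied by its MRC header. The sketch below is not a cryoSPARC utility; it follows the MRC2014 header layout (nx, ny, nz and the data mode in the first 16 bytes, extended-header length NSYMBT at byte offset 92) and only covers the common data modes:

```python
import os
import struct

# Bytes per voxel for common MRC data modes:
# 0 = int8, 1 = int16, 2 = float32, 6 = uint16.
BYTES_PER_VOXEL = {0: 1, 1: 2, 2: 4, 6: 2}

def expected_mrc_size(path):
    """Size the file should have according to its MRC2014 header."""
    with open(path, "rb") as f:
        header = f.read(1024)
    nx, ny, nz, mode = struct.unpack("<4i", header[:16])
    nsymbt = struct.unpack("<i", header[92:96])[0]  # extended header bytes
    return 1024 + nsymbt + nx * ny * nz * BYTES_PER_VOXEL[mode]

def find_truncated(paths):
    """Return the paths whose on-disk size disagrees with the header."""
    return [p for p in paths
            if os.path.getsize(p) != expected_mrc_size(p)]
```

Running `find_truncated` over the imported movie paths should flag any file that would later trip the `fromfile(...).reshape(...)` check; a gain reference imported as a movie would typically show up the same way, since its nz does not match its actual length.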
The ValueError when reading particles in 2D classification was caused by imported micrographs that included duplicate files with the same names.
The ValueError when reading movies during patch motion was caused by a single corrupt .mrc movie file in the dataset.
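The duplicate-filename cause can be caught before import with a simple pre-flight check. This helper is hypothetical, not part of cryoSPARC: it reports any basename that appears more than once in a list of movie or micrograph paths.

```python
import os
from collections import Counter

def duplicate_basenames(paths):
    """Return basenames that occur more than once across the given paths."""
    counts = Counter(os.path.basename(p) for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)

print(duplicate_basenames([
    "/data/grid1/mic_0001.mrc",
    "/data/grid2/mic_0001.mrc",  # same name, different directory
    "/data/grid1/mic_0002.mrc",
]))  # -> ['mic_0001.mrc']
```

An empty result means every imported file has a unique name; any reported name should be renamed or dropped before import.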