2D Classification: ValueError: index is out of bounds for array

I manually picked 200 particles and tried to run a 2D classification job, but the job fails before even a single iteration completes. All parameters are set to their defaults, other than the request to classify into 10 classes. Below is the output:

As similar index-out-of-bounds GPU-related errors have been reported in other posts, I will note that the workstation being used is running CentOS 7. Below is the output of nvidia-smi:

[screenshot: nvidia-smi output]

I also checked the output of cryosparcm joblog for the failed job, which is not very informative:

[screenshot: cryosparcm joblog output for the failed job]

Two interesting observations:

  1. In the same project where this failed job occurred, I ran two other 2D classification jobs using the output of blob picker (containing several hundred thousand particles). Both jobs completed successfully, although some runtime warnings were present in the joblog.

  2. I tried to run this same 2D classification job multiple times, and it does not always fail before the first iteration. In most cases it fails after iteration 1 with the above error, but in one instance it made it to iteration 11 before failing with the same error.

Any advice on resolving this issue would be appreciated. Both Master and worker are running v3.2.0+210413.

Hi @mchakra,

Thanks for reporting and providing all this information. We’d like to try to reproduce this ourselves on our systems. Do you think you can share the 200 particles (based on the screenshot, they’re scattered across 23 MRC files) and their corresponding .cs file? You can get them by going to the Manual Picker job, navigating to the “Output” tab, and clicking “Export” under the particles output result group. This will create a new folder inside the project folder with the particle MRC files, the micrograph MRC files, and their corresponding .cs and .csg files. You’ll find the full path at the bottom of the “Overview” tab.
If you’re okay to share your data, let me know, and I’ll send you some details so you can get those over to me.
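
As an aside, the exported .cs file is, as far as I know, a plain NumPy structured array, so you can sanity-check the export before sending it. A minimal sketch (the path is a placeholder, and the “blob/path” field name is an assumption; check dtype.names on your own file):

import numpy as np

# Sanity-check an exported particle .cs file (hypothetical path; use the
# export folder shown at the bottom of the job's Overview tab).
cs_path = "exports/groups/P1_J10_particles/P1_J10_particles_exported.cs"
particles = np.load(cs_path)

print("particle count:", len(particles))        # should be ~200 for this job
print("fields:", particles.dtype.names[:10])    # available metadata columns

# "blob/path" is assumed to hold the particle stack path for each row;
# counting unique values shows how many MRC files the picks span.
if "blob/path" in particles.dtype.names:
    print("MRC files referenced:", len(set(particles["blob/path"])))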

Hi @stephan,

Thank you for responding and offering to look into this. I think it should be fine to share the data with you, as long as it is used only for error reproduction purposes. I was able to export the data into the folder as you described. Please let me know how I should send it to you.

Hi @mchakra,

Definitely, all data you choose to share with us will be kept confidential and used only for the purpose of reproducing this error in order to create a bug fix. I’ll send you a message with credentials for our server, to which you can SCP the files.

I see the same error occasionally on my CentOS system. I just saw it on a Class2D job, which died in iteration 4. The error in both the GUI log and the joblog looks similar:

[CPU: 5.12 GB]   Traceback (most recent call last):
  File "/home/exx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1791, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1108, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 389, in cryosparc_compute.engine.engine.EngineThread.find_and_set_best_pose_shift
  File "<__array_function__ internals>", line 6, in unravel_index
ValueError: index 1059285798 is out of bounds for array with size 336

If I rerun the same job, even with different parameters (e.g. a smaller mask), it dies at the same iteration:

[CPU: 5.24 GB]   Traceback (most recent call last):
  File "/home/exx/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1791, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 1108, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_worker/cryosparc_compute/engine/engine.py", line 389, in cryosparc_compute.engine.engine.EngineThread.find_and_set_best_pose_shift
  File "<__array_function__ internals>", line 6, in unravel_index
ValueError: index -1087200183 is out of bounds for array with size 336

Perhaps notably, earlier in the log I see this:

[CPU: 2.66 GB]   Iteration 4
[CPU: 2.66 GB]     -- Effective number of classes per image: min nan | 25-pct nan | median nan | 75-pct nan | max nan 
[CPU: 2.66 GB]     -- Probability of best class per image: min nan | 25-pct nan | median nan | 75-pct nan | max nan 
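
The nan statistics and the huge (or negative) index look like two symptoms of the same problem: if the class probabilities go NaN, the “best pose/shift” index derived from them is garbage, and numpy’s unravel_index rejects anything outside the array. A minimal sketch in plain NumPy (not CryoSPARC code) that reproduces just the error message:

import numpy as np

# The traceback reports an array of size 336 (the pose/shift grid in that call).
shape = (336,)

# Any flat index outside that grid, e.g. a garbage value derived from NaN
# probabilities, makes unravel_index raise exactly the reported ValueError.
try:
    np.unravel_index(1059285798, shape)
except ValueError as err:
    print(err)  # index 1059285798 is out of bounds for array with size 336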

We are still seeing this error intermittently. @stephan, can we provide any useful data to help fix this?

Hey @olibclarke,

Are you on the latest patch of v3.2.0? We fixed a bug with a similar cause; I wonder if it will help you here as well.

I’m on the next-to-latest… ok, will update to the very latest and see if it fixes the issue, thx!

Hey @olibclarke, @mchakra,

Have you experienced this issue while on the latest patch by any chance?

We are not on the latest patch, but currently using v3.2.0+210713, and do not seem to be encountering this issue.

Hi All,
I also encountered the same error during 2D classification (v3.2.0); it didn’t appear previously, and I am running 2D classification on a dataset that used to work fine. Any idea what the cause might be?
Best,
Daniel