Out of bounds error, 4.2.1CS, 2D, 2080TI

Hi CS community,

We are getting a repeated out-of-bounds error in several different jobs, as early as the first 2D classification right after extraction. This is on a 4x 2080 Ti machine running CS 4.2.1. The Check Particles job seems to run fine on this stack (2.2M particles, 224-pixel box). From the very first iteration there are a bunch of nan values in the header of the log, which cannot be good:
Iteration 0
[CPU: 3.23 GB Avail: 247.94 GB]
– Effective number of classes per image: min nan | 25-pct nan | median nan | 75-pct nan | max nan
[CPU: 3.23 GB Avail: 247.94 GB]
– Probability of best class per image: min nan | 25-pct nan | median nan | 75-pct nan | max nan

Here is the exact final error output:

Traceback (most recent call last):
File “/home/ucsf/software/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py”, line 2061, in run_with_except_hook
run_old(*args, **kw)
File “cryosparc_master/cryosparc_compute/engine/cuda_core.py”, line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
File “cryosparc_master/cryosparc_compute/engine/cuda_core.py”, line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
File “cryosparc_master/cryosparc_compute/engine/engine.py”, line 1101, in cryosparc_compute.engine.engine.process.work
File “cryosparc_master/cryosparc_compute/engine/engine.py”, line 390, in cryosparc_compute.engine.engine.EngineThread.find_and_set_best_pose_shift
File “<array_function internals>”, line 5, in unravel_index
ValueError: index -1062104978 is out of bounds for array with size 336
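
For readers less familiar with this message: the last frame is a bounds check inside unravel_index, which converts a flat index into per-axis coordinates. Below is a minimal pure-Python sketch of that check. The function name unravel_flat_index and the (16, 21) shape are hypothetical and chosen only because 16 x 21 = 336; this is not cryoSPARC's code and not NumPy's implementation. It only illustrates that a valid flat index must lie in [0, size), so a value like -1062104978 points at bad data upstream rather than at the index conversion itself.

```python
def unravel_flat_index(flat_index, shape):
    """Convert a flat (row-major) index into per-axis coordinates.

    Hypothetical stand-in for numpy.unravel_index, for illustration only.
    """
    # Total number of elements in an array of this shape.
    size = 1
    for dim in shape:
        size *= dim

    # The bounds check: anything outside [0, size) is rejected.
    if not 0 <= flat_index < size:
        raise ValueError(
            f"index {flat_index} is out of bounds for array with size {size}"
        )

    # Peel off coordinates from the last (fastest-varying) axis to the first.
    coords = []
    for dim in reversed(shape):
        flat_index, remainder = divmod(flat_index, dim)
        coords.append(remainder)
    return tuple(reversed(coords))


def describe(flat_index, shape):
    """Return the coordinates, or the error text if the index is invalid."""
    try:
        return unravel_flat_index(flat_index, shape)
    except ValueError as exc:
        return str(exc)


if __name__ == "__main__":
    shape = (16, 21)  # assumed shape; only the total size (336) comes from the log
    print(describe(335, shape))          # last valid element
    print(describe(-1062104978, shape))  # the index from the traceback
```
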

Any ideas? Is this a normalization error during extraction? Or is the gaming GPU introducing errors during extraction?



Welcome to the forum @mulik52.
Users with similar error messages also reported

Please see 2D classification fails! - #4 by wtempel for suggested tests.

Great, thanks for the reply.

To close the loop: it was indeed a bad RAM chip. Running stressapptest showed a ton of errors within the first minute. Interestingly, after finding and removing the bad RAM and running the “Check Particles” job on the particle stack that had been extracted with the faulty RAM, 2D classification would run but would yield very strange-looking circular patterns. So clearly there were some errors that the job could not catch. Re-extracting the same stack with good RAM seems to make everything work.
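
In case it helps anyone chasing similar silent corruption, here is a rough sketch of a generic sanity check: hash two files that are expected to be byte-identical (for example, a particle stack and a copy of it read back on a known-good machine) and compare the digests. The helper names file_sha256 and files_match are made up for this example, and it uses only the Python standard library. Whether two separate extraction runs produce byte-identical stacks is an assumption I have not verified, so treat a mismatch as a hint to investigate, not as proof.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_sha256(path_a) == file_sha256(path_b)


if __name__ == "__main__":
    import sys

    # Usage: python compare_stacks.py stack_a.mrc stack_b.mrc
    path_a, path_b = sys.argv[1], sys.argv[2]
    print("match" if files_match(path_a, path_b) else "MISMATCH")
```

A check like this only detects that two copies differ; it cannot say which one is correct, and it would not have flagged a stack that was written wrongly in the first place and never copied.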


