AssertionError when running Ab-Initio Reconstruction

Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 222, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/amax/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 453, in align_density
    assert n.all(n.isfinite(M))
AssertionError

Thank you! Traceback is attached.

I am running this job on v3.2.

Hi @Layman_XUE,

Can you also report for us your OS version, NVIDIA driver version, and CUDA version?
We have occasional reports of this issue on CentOS 7 (which seems to have a lot of other problems with CUDA programs).

Thank you!
My OS is CentOS 7.
NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 10.0
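
When collecting these details from several workers, the banner line printed by nvidia-smi can be parsed rather than copied by hand. Below is a minimal Python sketch; the function name parse_nvidia_smi_header is hypothetical, and it assumes the banner has the "Driver Version: ... CUDA Version: ..." layout quoted above.

```python
import re


def parse_nvidia_smi_header(line):
    """Extract driver and CUDA versions from an nvidia-smi banner line.

    Returns None for a field that is not present in the line.
    """
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    return {
        "driver": driver.group(1) if driver else None,
        "cuda": cuda.group(1) if cuda else None,
    }


# Banner line as quoted in the post above.
header = "NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 10.0"
print(parse_nvidia_smi_header(header))
# {'driver': '450.57', 'cuda': '10.0'}
```

Note that the CUDA version in the nvidia-smi banner is the highest version the installed driver supports, which may differ from the CUDA toolkit the worker was actually installed against, so it is worth reporting both if they differ.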

I can report the same problem as well:

assert n.all(n.isfinite(M))
AssertionError

Also having this issue with v3.2 on CentOS7 w/ NVIDIA driver version 460.32.03 and CUDA version 11.2.

Thanks,
Nathanael

Dear all,
Is it possible to try running the job on a non-CentOS 7 machine?

Sadly, I’m only running on CentOS7 here - if it helps, I reverted to 3.1.0 and the same job has run just fine.

We are running CentOS 7 and do not see this error on v3.2 (original, haven’t yet applied latest patch), NVIDIA driver version 460.67 with CUDA 11.2

As an update - I updated my NVIDIA driver to 460.73.01 and was still having the error. Tested removing the “cuMemHostAlloc failed” workaround from the worker config.sh file and this seems to have resolved the error - though the intermittent “cuMemHostAlloc failed” error persists in the absence of the workaround. @Layman_XUE @Navid

I see the same. The Ab initio works fine when I remove the export CRYOSPARC_NO_PAGELOCK=true from my config.sh

It seems to be directly linked to that command. It may appear to be CentOS related, as only people running CentOS will add this command from reading the v3.2 changelog.
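
For anyone comparing several workers, here is a minimal Python sketch that reports whether the text of a config.sh contains an active (uncommented) export CRYOSPARC_NO_PAGELOCK line. The function name no_pagelock_setting and the sample text are made up for illustration; only the variable name and the config.sh file come from this thread.

```python
import re

# Matches an active line such as:  export CRYOSPARC_NO_PAGELOCK=true
# Commented-out lines (starting with '#') do not match.
_PAGELOCK_RE = re.compile(
    r"^\s*export\s+CRYOSPARC_NO_PAGELOCK\s*=\s*[\"']?(\w+)[\"']?\s*(?:#.*)?$"
)


def no_pagelock_setting(config_text):
    """Return the last active CRYOSPARC_NO_PAGELOCK value, or None if unset."""
    value = None
    for line in config_text.splitlines():
        match = _PAGELOCK_RE.match(line)
        if match:
            value = match.group(1)  # a later export overrides an earlier one
    return value


# Hypothetical config.sh content, for illustration only.
sample = """\
# export CRYOSPARC_NO_PAGELOCK=false
export CRYOSPARC_NO_PAGELOCK=true
"""
print(no_pagelock_setting(sample))
# true
```

To check a real installation, read cryosparc_worker/config.sh into a string and pass it to the function. This only tells you what the file says; the worker has to pick up the changed config.sh before a new setting takes effect.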

Hello, we are also seeing this error (specifically on a data set that was fixed after we saw the errors mentioned in the thread "Error while 2D Classification"). I removed the PAGELOCK setting from config.sh and the job seems to run now. We are also running v3.2 with the most recent patch on a CentOS 7 machine.

Best,
Justas

We are also seeing this error now (after not seeing it previously) after adding the CRYOSPARC_NO_PAGELOCK=true line to config.sh

(unfortunately we then see the cuMemHostAlloc error again)

@jr10 @olibclarke are you running the latest patch, v3.2.0+210615? This should fix these errors in Ab-Initio. Please also keep export CRYOSPARC_NO_PAGELOCK=true in cryosparc_worker/config.sh.

I also get this error in Ab-initio.

Running on Ubuntu 20.04 LTS, NVIDIA driver version 510.54, CUDA 11.5, NVIDIA A40 GPU, CryoSPARC version 3.3.1.
I did not export CRYOSPARC_NO_PAGELOCK=true; is that a valid solution?
Also, something in general seems not to have worked in that ab initio run, since 4 of 5 classes are at 0.0% and I just get 5 balls as volumes.

[CPU: 5.85 GB]     Done iteration 01421 of 04745 in 12.955s. Total time 18243.6s. Est time remaining 44098.6s.

[CPU: 5.98 GB]   ----------- Iteration  1422 (epoch 0.387).  radwn 67.91  resolution 12.00A  minisize  300  beta 0.00 

[CPU: 5.85 GB]      -- Class  0 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  1.000 S: 11.984 Class Size: 0.0% (Average: 20.8%)

[CPU: 5.85 GB]      -- Class  1 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio :   nan ESS R:  0.999 S: 11.988 Class Size: 100.0% (Average: 19.3%)

[CPU: 5.85 GB]      -- Class  2 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  0.999 S: 11.985 Class Size: 0.0% (Average: 20.7%)

[CPU: 5.85 GB]      -- Class  3 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  0.998 S: 11.982 Class Size: 0.0% (Average: 20.5%)

[CPU: 5.85 GB]      -- Class  4 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  1.002 S: 11.987 Class Size: 0.0% (Average: 18.6%)

[CPU: 5.86 GB]     Done iteration 01422 of 04745 in 12.987s. Total time 18256.5s. Est time remaining 43991.2s.

Error:

[CPU: 3.31 GB]   Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 276, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/cryosparcuser/cryosparc_worker/cryosparc_compute/noise_model.py", line 118, in get_noise_estimate
    assert n.all(n.isfinite(ret))
AssertionError
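
The log excerpt above already shows trouble before the assertion fires: one class reports a nan step ratio and the other four are at 0.0%. A minimal Python sketch for scanning a saved job log for such lines follows. The names CLASS_LINE and suspicious_classes are hypothetical, and the regular expression assumes the per-class line layout shown in this post, which may differ between CryoSPARC versions.

```python
import math
import re

# Assumed layout, taken from the log lines quoted above:
#   -- Class  1 -- lr: 0.20 eps: ... step ratio :   nan ESS R: ... Class Size: 100.0% (...)
CLASS_LINE = re.compile(
    r"Class\s+(?P<cls>\d+)\s+--"
    r".*?step ratio\s*:\s*(?P<ratio>\S+)"
    r".*?Class Size:\s*(?P<size>[\d.]+)%"
)


def suspicious_classes(log_text):
    """Return (class index, reasons) for each per-class line that looks unhealthy."""
    findings = []
    for line in log_text.splitlines():
        m = CLASS_LINE.search(line)
        if not m:
            continue
        try:
            ratio = float(m.group("ratio"))  # float() accepts "nan" and "inf"
        except ValueError:
            ratio = float("nan")  # treat unparseable values as non-finite
        reasons = []
        if not math.isfinite(ratio):
            reasons.append("non-finite step ratio")
        if float(m.group("size")) == 0.0:
            reasons.append("empty class")
        if reasons:
            findings.append((int(m.group("cls")), reasons))
    return findings
```

Running this over the iteration 1422 block above would flag class 1 for the non-finite step ratio and classes 0, 2, 3 and 4 as empty. It only locates the first iteration where values go bad; it does not explain why they did.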

Hi
I am getting the same error when running ab initio jobs, apparently related to the worker, and the run stops early with this message:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 230, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/localapps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 460, in align_density
    assert n.all(n.isfinite(M))
AssertionError

The version I am using is v4.4.0, and I am running everything on Ubuntu 20.04.6 LTS, with an Intel® Xeon(R) W-2275 CPU @ 3.30GHz × 28 and two NVIDIA Corporation TU106 [GeForce RTX 2070] (TURBO RTX 2070) GPUs, CUDA version 11.8.

Can you help me? Cheers!

Welcome to the forum @midauden.
We are unsure about the cause. It could be faulty RAM. You may want to try suggestions we linked in Out of bounds error, 4.2.1CS, 2D, 2080TI - #2 by wtempel.

I encountered the same problem recently. How did you solve it?

Hi!

I want to report a similar error when running an Ab-Initio job on my negative staining dataset, CryoSPARC [v4.4.1]: