AssertionError when running Ab-Initio Reconstruction

Hi,
I got the error below when I ran Ab-Initio Reconstruction:
assert n.all(n.isfinite(M))
AssertionError



Hi @Layman_XUE, can you please let us know which version of cryoSPARC you are running, and please copy-paste the text of the traceback. Thank you!

Traceback (most recent call last):
File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 222, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
File "/home/amax/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 453, in align_density
assert n.all(n.isfinite(M))
AssertionError
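
For readers unfamiliar with this assertion: it fails when the array contains any NaN or infinite value. Below is a minimal plain-Python sketch of the same kind of check, assuming that `n` in the traceback is an alias for NumPy (the traceback does not show the import). The helper `all_finite` and the sample data are hypothetical and are not cryoSPARC code.

```python
import math

def all_finite(values):
    """Return True only if every number in a nested list is finite."""
    return all(math.isfinite(v) for row in values for v in row)

# Hypothetical stand-ins for the array M from the traceback.
M_ok = [[1.0, 2.0], [3.0, 4.0]]
M_bad = [[1.0, float("nan")], [float("inf"), 4.0]]

print(all_finite(M_ok))   # every entry is finite
print(all_finite(M_bad))  # NaN and inf both count as non-finite

# The assertion in the traceback is equivalent in spirit to this:
assert all_finite(M_ok)
```

So the error itself only says that a non-finite number turned up in the computation; it does not say where that number came from.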

Thank you! Traceback is attached.

I am running this job on v3.2.

Hi @Layman_XUE,

Can you also report for us your OS version, NVIDIA driver version, and CUDA version?
We sometimes get reports of this issue on CentOS 7 (which seems to have a lot of other problems with CUDA programs).

Thank you!
My OS is CentOS 7.
NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 10.0

I can report the same problem as well:

assert n.all(n.isfinite(M))
AssertionError

Also having this issue with v3.2 on CentOS7 w/ NVIDIA driver version 460.32.03 and CUDA version 11.2.

Thanks,
Nathanael

Dear all,
Is it possible to try running the job on a non-CentOS 7 machine?

Sadly, I’m only running on CentOS7 here - if it helps, I reverted to 3.1.0 and the same job has run just fine.

We are running CentOS 7 and do not see this error on v3.2 (original, haven’t yet applied latest patch), NVIDIA driver version 460.67 with CUDA 11.2

As an update - I updated my NVIDIA driver to 460.73.01 and was still having the error. Tested removing the “cuMemHostAlloc failed” workaround from the worker config.sh file and this seems to have resolved the error - though the intermittent “cuMemHostAlloc failed” error persists in the absence of the workaround. @Layman_XUE @Navid

I see the same. Ab-Initio works fine when I remove the export CRYOSPARC_NO_PAGELOCK=true line from my config.sh.

It seems to be directly linked to that setting. It may only appear to be CentOS-related, since only people running CentOS would add this line after reading the v3.2 changelog.
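
If you are unsure whether the workaround is active on a given worker, a quick check of the config text can help. This is only a sketch: the function name `pagelock_workaround_enabled` and the sample strings are hypothetical, and it assumes the setting appears as an uncommented `export CRYOSPARC_NO_PAGELOCK=true` line, as quoted in this thread.

```python
import re

# Matches an uncommented `export CRYOSPARC_NO_PAGELOCK=true` line,
# optionally quoted and optionally followed by a trailing comment.
PAGELOCK_RE = re.compile(
    r'^\s*export\s+CRYOSPARC_NO_PAGELOCK="?true"?\s*(#.*)?$'
)

def pagelock_workaround_enabled(config_text):
    """Return True if the workaround line is present and not commented out."""
    return any(PAGELOCK_RE.match(line) for line in config_text.splitlines())

# Hypothetical config contents, for illustration only.
sample_enabled = "# worker config\nexport CRYOSPARC_NO_PAGELOCK=true\n"
sample_disabled = "# worker config\n# export CRYOSPARC_NO_PAGELOCK=true\n"

print(pagelock_workaround_enabled(sample_enabled))
print(pagelock_workaround_enabled(sample_disabled))
```

To run it against a real file, read the text of your worker's config.sh and pass it in.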

Hello, we are also seeing this error, specifically on a data set that we had fixed after seeing the errors mentioned in the thread "Error while 2D Classification". I removed the PAGELOCK setting from config.sh and the job seems to run now. We are also running v3.2 with the most recent patch on a CentOS 7 machine.

Best,
Justas


We are also seeing this error now (after not seeing it previously) after adding the CRYOSPARC_NO_PAGELOCK=true line to config.sh

(unfortunately we then see the cuMemHostAlloc error again)

@jr10 @olibclarke are you running the latest patch, v3.2.0+210615? This should fix these errors in Ab-Initio. Please also keep export CRYOSPARC_NO_PAGELOCK=true in cryosparc_worker/config.sh.

I also get this error in Ab-initio.

Running on Ubuntu 20.04 LTS, NVIDIA driver version 510.54, CUDA 11.5, NVIDIA A40 GPU, cryoSPARC version 3.3.1.
I did not export CRYOSPARC_NO_PAGELOCK=true; is that a valid solution?
Also, something seems to have gone wrong in that ab-initio run in general, since 4 of 5 classes are at 0.0% and I just get 5 balls as volumes.

[CPU: 5.85 GB]     Done iteration 01421 of 04745 in 12.955s. Total time 18243.6s. Est time remaining 44098.6s.

[CPU: 5.98 GB]   ----------- Iteration  1422 (epoch 0.387).  radwn 67.91  resolution 12.00A  minisize  300  beta 0.00 

[CPU: 5.85 GB]      -- Class  0 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  1.000 S: 11.984 Class Size: 0.0% (Average: 20.8%)

[CPU: 5.85 GB]      -- Class  1 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio :   nan ESS R:  0.999 S: 11.988 Class Size: 100.0% (Average: 19.3%)

[CPU: 5.85 GB]      -- Class  2 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  0.999 S: 11.985 Class Size: 0.0% (Average: 20.7%)

[CPU: 5.85 GB]      -- Class  3 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  0.998 S: 11.982 Class Size: 0.0% (Average: 20.5%)

[CPU: 5.85 GB]      -- Class  4 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R:  1.002 S: 11.987 Class Size: 0.0% (Average: 18.6%)

[CPU: 5.86 GB]     Done iteration 01422 of 04745 in 12.987s. Total time 18256.5s. Est time remaining 43991.2s.

Error:

[CPU: 3.31 GB]   Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 276, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/cryosparcuser/cryosparc_worker/cryosparc_compute/noise_model.py", line 118, in get_noise_estimate
    assert n.all(n.isfinite(ret))
AssertionError
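
The per-class log lines in this post can be scanned automatically for the two symptoms described: a non-finite step ratio and one class holding all the particles. This is a rough sketch, assuming the line format shown above. The regex, the function names, and the 99.9% threshold are my own choices, not part of cryoSPARC.

```python
import math
import re

# Captures class index, step ratio, and class size from one log line.
CLASS_RE = re.compile(
    r"Class\s+(\d+)\s+--.*?step ratio\s*:\s*(\S+).*?Class Size:\s*([\d.]+)%"
)

def parse_class_lines(log_text):
    """Return (class_index, step_ratio, class_size_percent) tuples."""
    rows = []
    for line in log_text.splitlines():
        m = CLASS_RE.search(line)
        if m:
            # float() accepts the literal "nan" printed in the log.
            rows.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return rows

def looks_collapsed(rows, full_threshold=99.9):
    """Heuristic: any non-finite step ratio, or one class with ~all particles."""
    has_nan = any(not math.isfinite(ratio) for _, ratio, _ in rows)
    one_class_has_all = any(size >= full_threshold for _, _, size in rows)
    return has_nan or one_class_has_all

# Two lines copied from the log above.
log = (
    "[CPU: 5.85 GB]      -- Class  0 -- lr: 0.20 eps: 67784185542048894737296639655936.00 "
    "step ratio : 0.0000 ESS R:  1.000 S: 11.984 Class Size: 0.0% (Average: 20.8%)\n"
    "[CPU: 5.85 GB]      -- Class  1 -- lr: 0.20 eps: 67784185542048894737296639655936.00 "
    "step ratio :   nan ESS R:  0.999 S: 11.988 Class Size: 100.0% (Average: 19.3%)\n"
)

rows = parse_class_lines(log)
print(rows)
print(looks_collapsed(rows))
```

A check like this only flags the symptom early so a long run can be stopped sooner; it does not explain why the values became non-finite.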