I encountered the error below when I ran Ab-Initio Reconstruction.
Hi @Layman_XUE, can you please let us know which version of cryoSPARC you are running, and please copy-paste the text of the traceback. Thank you!
Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 222, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/amax/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 453, in align_density
Thank you! Traceback is attached.
This job was run on v3.2.
Can you also report for us your OS version, NVIDIA driver version, and CUDA version?
We have reports of this issue sometimes on CentOS7 (which seems to have a lot of other problems with CUDA programs)
My OS is CentOS 7, and:
NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 10.0
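For anyone else asked to report the same diagnostics, a sketch of the commands that collect them on Linux (guarded so they exit cleanly on machines where the NVIDIA tools are not on the PATH):

```shell
# OS name and version
head -n 2 /etc/os-release
# NVIDIA driver version, if the driver utilities are installed
command -v nvidia-smi >/dev/null 2>&1 && \
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Installed CUDA toolkit version, if nvcc is available
command -v nvcc >/dev/null 2>&1 && nvcc --version | tail -n 1
true  # keep the exit status clean when the NVIDIA tools are absent
```

Note that the "CUDA Version" printed in the `nvidia-smi` header is the highest CUDA version the driver supports, while `nvcc --version` reports the toolkit actually installed, so the two can legitimately differ.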
I can report the same problem as well:
    assert n.all(n.isfinite(M))
AssertionError
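For context, `n` is cryoSPARC's alias for NumPy, so this assertion fires when the array `M` (the density map being aligned) contains any NaN or Inf values. A minimal sketch of what the failing check does:

```python
import numpy as np  # cryoSPARC imports this as "n"

# A healthy array: every value is finite, so the assert would pass
M_ok = np.array([0.1, 2.5, -3.0])
print(np.all(np.isfinite(M_ok)))   # True

# An array contaminated by NaN/Inf (e.g. from a numerical blow-up):
# the same check returns False, and the assert raises AssertionError
M_bad = np.array([0.1, np.nan, np.inf])
print(np.all(np.isfinite(M_bad)))  # False
```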
Also having this issue with v3.2 on CentOS7 w/ NVIDIA driver version 460.32.03 and CUDA version 11.2.
Is it possible to try running the job on a non-CentOS 7 machine?
Sadly, I’m only running on CentOS7 here - if it helps, I reverted to 3.1.0 and the same job has run just fine.
We are running CentOS 7 and do not see this error on v3.2 (original, haven’t yet applied latest patch), NVIDIA driver version 460.67 with CUDA 11.2
As an update: I updated my NVIDIA driver to 460.73.01 and was still seeing the error. I then tested removing the "cuMemHostAlloc failed" workaround from the worker config.sh file, and this seems to have resolved the error - though the intermittent "cuMemHostAlloc failed" error returns in the absence of the workaround. @Layman_XUE @Navid
I see the same. Ab initio works fine when I remove
export CRYOSPARC_NO_PAGELOCK=true
from my config.sh, so the error seems directly linked to that setting. It may only appear to be CentOS-related, since mostly people running CentOS would have added this line after reading the v3.2 changelog.
Hello, we are also seeing this error (specifically on a data set that was fixed after seeing the errors mentioned in the thread Error while 2D Classification). I removed the PAGELOCK setting from config.sh and the job seems to run now. We are also running v3.2 with the most recent patch on a CentOS 7 machine.
We are also seeing this error now (after not seeing it previously) after adding the
CRYOSPARC_NO_PAGELOCK=true
line to config.sh. (Unfortunately, removing the line brings back the cuMemHostAlloc error.)
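For anyone toggling this workaround, a sketch of commenting the line out rather than deleting it, so it is easy to re-enable if the cuMemHostAlloc error returns. The config path here is a stand-in; the real file is `config.sh` in your cryosparc_worker directory, and cryoSPARC must be restarted for the change to take effect:

```shell
# Hypothetical path for illustration; use cryosparc_worker/config.sh
CONFIG=/tmp/config.sh

# Simulate a config.sh that contains the workaround line
cat > "$CONFIG" <<'EOF'
export CRYOSPARC_LICENSE_ID="xxxx"
export CRYOSPARC_NO_PAGELOCK=true
EOF

# Comment out the CRYOSPARC_NO_PAGELOCK line instead of deleting it
sed -i 's/^export CRYOSPARC_NO_PAGELOCK=true/# &/' "$CONFIG"

grep CRYOSPARC_NO_PAGELOCK "$CONFIG"
# -> # export CRYOSPARC_NO_PAGELOCK=true
```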
I also get this error in Ab-initio.
Running on Ubuntu 20.04 LTS, NVIDIA driver version 510.54, CUDA 11.5, on an NVIDIA A40 GPU. cryoSPARC version 3.3.1.
I did not export CRYOSPARC_NO_PAGELOCK=true - is that a valid solution?
Also, something in general seems not to have worked in that ab-initio, since 4/5 classes have 0.0% and I just get 5 balls as volumes.
[CPU: 5.85 GB] Done iteration 01421 of 04745 in 12.955s. Total time 18243.6s. Est time remaining 44098.6s.
[CPU: 5.98 GB] ----------- Iteration 1422 (epoch 0.387). radwn 67.91 resolution 12.00A minisize 300 beta 0.00
[CPU: 5.85 GB] -- Class 0 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 1.000 S: 11.984 Class Size: 0.0% (Average: 20.8%)
[CPU: 5.85 GB] -- Class 1 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : nan ESS R: 0.999 S: 11.988 Class Size: 100.0% (Average: 19.3%)
[CPU: 5.85 GB] -- Class 2 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 0.999 S: 11.985 Class Size: 0.0% (Average: 20.7%)
[CPU: 5.85 GB] -- Class 3 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 0.998 S: 11.982 Class Size: 0.0% (Average: 20.5%)
[CPU: 5.85 GB] -- Class 4 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 1.002 S: 11.987 Class Size: 0.0% (Average: 18.6%)
[CPU: 5.86 GB] Done iteration 01422 of 04745 in 12.987s. Total time 18256.5s. Est time remaining 43991.2s.
[CPU: 3.31 GB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 276, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/cryosparcuser/cryosparc_worker/cryosparc_compute/noise_model.py", line 118, in get_noise_estimate
    assert n.all(n.isfinite(ret))
AssertionError
I am getting the same error when running ab-initio jobs, though related to the worker, and the run stops early with this message:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 230, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/localapps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 460, in align_density
The version I am using is v4.4.0, running on Ubuntu 20.04.6 LTS with an Intel Xeon W-2275 CPU @ 3.30GHz × 28, two NVIDIA TU106 [GeForce RTX 2070] (TURBO RTX 2070) GPUs, and CUDA version 11.8.
Can you help me? Cheers!
Welcome to the forum @midauden.
We are unsure about the cause. It could be faulty RAM. You may want to try suggestions we linked in Out of bounds error, 4.2.1CS, 2D, 2080TI - #2 by wtempel.