I encountered the error below when I ran Ab-Initio Reconstruction.
Hi @Layman_XUE, can you please let us know which version of cryoSPARC you are running, and please copy-paste the text of the traceback. Thank you!
Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 84, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 222, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/amax/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 453, in align_density
Thank you! Traceback is attached.
This job was run on v3.2.
Can you also report for us your OS version, NVIDIA driver version, and CUDA version?
We have reports of this issue sometimes on CentOS7 (which seems to have a lot of other problems with CUDA programs)
My OS is CentOS 7, and:
NVIDIA-SMI 450.57 Driver Version: 450.57 CUDA Version: 10.0
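For anyone else asked to report the same diagnostics, a sketch of the commands that collect them on Linux (guarded so they exit cleanly on machines where the NVIDIA tools are not on the PATH):

```shell
# OS name and version
head -n 2 /etc/os-release
# NVIDIA driver version, if the driver utilities are installed
command -v nvidia-smi >/dev/null 2>&1 && \
  nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Installed CUDA toolkit version, if nvcc is available
command -v nvcc >/dev/null 2>&1 && nvcc --version | tail -n 1
true  # keep the exit status clean when the NVIDIA tools are absent
```

Note that the "CUDA Version" printed in the `nvidia-smi` header is the highest CUDA version the driver supports, while `nvcc --version` reports the toolkit actually installed, so the two can legitimately differ.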
I can report the same problem as well:
    assert n.all(n.isfinite(M))
AssertionError
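For context, `n` is cryoSPARC's alias for NumPy, so this assertion fires when the array `M` (the density map being aligned) contains any NaN or Inf values. A minimal sketch of what the failing check does:

```python
import numpy as np  # cryoSPARC imports this as "n"

# A healthy array: every value is finite, so the assert would pass
M_ok = np.array([0.1, 2.5, -3.0])
print(np.all(np.isfinite(M_ok)))   # True

# An array contaminated by NaN/Inf (e.g. from a numerical blow-up):
# the same check returns False, and the assert raises AssertionError
M_bad = np.array([0.1, np.nan, np.inf])
print(np.all(np.isfinite(M_bad)))  # False
```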
Also having this issue with v3.2 on CentOS7 w/ NVIDIA driver version 460.32.03 and CUDA version 11.2.
Is it possible to try running the job on a non-CentOS 7 machine?
Sadly, I’m only running on CentOS7 here - if it helps, I reverted to 3.1.0 and the same job has run just fine.
We are running CentOS 7 and do not see this error on v3.2 (original, haven’t yet applied latest patch), NVIDIA driver version 460.67 with CUDA 11.2
As an update: I updated my NVIDIA driver to 460.73.01 and was still seeing the error. I then tested removing the "cuMemHostAlloc failed" workaround from the worker config.sh file, and this seems to have resolved the error - though the intermittent "cuMemHostAlloc failed" error returns in the absence of the workaround. @Layman_XUE @Navid
I see the same. Ab initio works fine when I remove
export CRYOSPARC_NO_PAGELOCK=true
from my config.sh, so the error seems directly linked to that setting. It may only appear to be CentOS-related, since mostly people running CentOS would have added this line after reading the v3.2 changelog.
Hello, we are also seeing this error (specifically on a data set that was fixed after seeing the errors mentioned in the thread Error while 2D Classification). I removed the PAGELOCK setting from config.sh and the job seems to run now. We are also running v3.2 with the most recent patch on a CentOS 7 machine.
We are also seeing this error now (after not seeing it previously) after adding the
CRYOSPARC_NO_PAGELOCK=true
line to config.sh. (Unfortunately, removing the line brings back the cuMemHostAlloc error.)
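For anyone toggling this workaround, a sketch of commenting the line out rather than deleting it, so it is easy to re-enable if the cuMemHostAlloc error returns. The config path here is a stand-in; the real file is `config.sh` in your cryosparc_worker directory, and cryoSPARC must be restarted for the change to take effect:

```shell
# Hypothetical path for illustration; use cryosparc_worker/config.sh
CONFIG=/tmp/config.sh

# Simulate a config.sh that contains the workaround line
cat > "$CONFIG" <<'EOF'
export CRYOSPARC_LICENSE_ID="xxxx"
export CRYOSPARC_NO_PAGELOCK=true
EOF

# Comment out the CRYOSPARC_NO_PAGELOCK line instead of deleting it
sed -i 's/^export CRYOSPARC_NO_PAGELOCK=true/# &/' "$CONFIG"

grep CRYOSPARC_NO_PAGELOCK "$CONFIG"
# -> # export CRYOSPARC_NO_PAGELOCK=true
```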
I also get this error in Ab-initio.
Running on Ubuntu 20.04 LTS, NVIDIA driver version 510.54, CUDA 11.5, on an NVIDIA A40 GPU. cryoSPARC version 3.3.1.
I did not export CRYOSPARC_NO_PAGELOCK=true - is that a valid solution?
Also, something in general seems not to have worked in that ab-initio, since 4/5 classes have 0.0% and I just get 5 balls as volumes.
[CPU: 5.85 GB] Done iteration 01421 of 04745 in 12.955s. Total time 18243.6s. Est time remaining 44098.6s.
[CPU: 5.98 GB] ----------- Iteration 1422 (epoch 0.387). radwn 67.91 resolution 12.00A minisize 300 beta 0.00
[CPU: 5.85 GB] -- Class 0 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 1.000 S: 11.984 Class Size: 0.0% (Average: 20.8%)
[CPU: 5.85 GB] -- Class 1 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : nan ESS R: 0.999 S: 11.988 Class Size: 100.0% (Average: 19.3%)
[CPU: 5.85 GB] -- Class 2 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 0.999 S: 11.985 Class Size: 0.0% (Average: 20.7%)
[CPU: 5.85 GB] -- Class 3 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 0.998 S: 11.982 Class Size: 0.0% (Average: 20.5%)
[CPU: 5.85 GB] -- Class 4 -- lr: 0.20 eps: 67784185542048894737296639655936.00 step ratio : 0.0000 ESS R: 1.002 S: 11.987 Class Size: 0.0% (Average: 18.6%)
[CPU: 5.86 GB] Done iteration 01422 of 04745 in 12.987s. Total time 18256.5s. Est time remaining 43991.2s.
[CPU: 3.31 GB] Traceback (most recent call last):
  File "cryosparc_worker/cryosparc_compute/run.py", line 85, in cryosparc_compute.run.main
  File "cryosparc_worker/cryosparc_compute/jobs/abinit/run.py", line 276, in cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/cryosparcuser/cryosparc_worker/cryosparc_compute/noise_model.py", line 118, in get_noise_estimate
    assert n.all(n.isfinite(ret))
AssertionError
I am getting the same error when running ab-initio jobs, though related to the worker, and the run stops early with this message:
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 230, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/localapps/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py", line 460, in align_density
The version I am using is v4.4.0, running on Ubuntu 20.04.6 LTS with an Intel Xeon W-2275 CPU @ 3.30GHz × 28, two NVIDIA TU106 [GeForce RTX 2070] (TURBO RTX 2070) GPUs, and CUDA version 11.8.
Can you help me? Cheers!
Welcome to the forum @midauden.
We are unsure about the cause. It could be faulty RAM. You may want to try suggestions we linked in Out of bounds error, 4.2.1CS, 2D, 2080TI - #2 by wtempel.