Ab initio failure

Hi all,
I am trying to run ab initio jobs, but somehow they crash along the way. If I ask for 1 or 2 ab initio models there is no issue; it works. But with 3 models and above, the job starts fine and creates the right number of models, and they look OK until a certain number of iterations, where they start to get worse and the job finally fails with the following error:
[CPU: 1.84 GB Avail: 250.70 GB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 288, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/home/bio21em2/Software/Cryosparc/cryosparc_worker/cryosparc_compute/noise_model.py", line 119, in get_noise_estimate
    assert n.all(n.isfinite(ret))
AssertionError

We are running CryoSPARC v4.5.1.
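
(A debugging note: the assertion fires when the computed noise estimate contains NaN or Inf values, which usually traces back to non-finite values somewhere in the input. Below is a minimal, untested Python sketch, using numpy and the mrcfile package, to check whether a particle stack itself contains non-finite pixels; the stack path is a placeholder.)

import numpy as np
import mrcfile  # pip install mrcfile

# Placeholder path; point this at one of your extracted particle stacks.
stack_path = "path/to/particles.mrcs"

with mrcfile.open(stack_path, permissive=True) as f:
    data = f.data
bad = ~np.isfinite(data)
print(f"{bad.sum()} non-finite pixels out of {data.size}")
if bad.any():
    # Indices of the particle images that contain NaN/Inf pixels
    print("affected images:", np.unique(np.nonzero(bad)[0]))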

I’ve run into this before, but only with non-standard parameters or unusual datasets. Is this happening with default params, on otherwise well-behaved datasets?

Yes, standard parameters, but the dataset is average: relatively thick ice, as the particle is 55 nm, the sample contains 1 M urea, and data were collected at 200 kV, so you can imagine the signal-to-noise ratio could be better! The CTF fit is not great, on average 6 Å.
The maps go to 4.8 Å.
One of the things I don't get is why, when selecting different numbers of models, the output is either fine or a crash.

Hey CSers,
I ran into the same problem. Ab initio with 2 classes was fine; however, it ends with the same error when running with 3 classes, which turns out to be essential for my dataset.

I am running v4.5.3, probably on the same cluster as @ehanssen.
Any leads on this please?

@nameless_wonder Did you observe the AssertionError for this dataset only, or also for other datasets?

@wtempel Apologies, this is something I should have mentioned in the first place.
This is the only dataset I am working with at the moment; however, even with this dataset, ab initio jobs with 3 classes went through successfully earlier, within the same workspace.

Thanks @nameless_wonder. Please can you

  1. clone one of the successful 3-class ab initio jobs
  2. run the cloned job in an environment as similar as possible to that of the old job
  3. share with the developers the job reports of the old (successful) and new (presumably failed) job.

I will send you a direct message about a suitable email address.

Dear @wtempel,

Surprisingly, the clone of the old successful job went through successfully. Then I cloned an old unsuccessful job, which failed with the same error.

I wanted to troubleshoot a bit, which is why it took me longer to respond (apologies). So, I cloned a successful job and linked particles from an unsuccessful one, and vice versa. However, now I am getting the following error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 480, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "/apps/cryosparc/cryosparc-general/4.5.3/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 1002, in output_single_volume
    dset = create_single_volume_ds(map_r, psize, name, rel_path_no_ext, symop=symop, write_volume=write_volume, rel_path_mrc=rel_path_mrc)
  File "/apps/cryosparc/cryosparc-general/4.5.3/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 979, in create_single_volume_ds
    mrc.write_mrc(os.path.join(get_project_dir_abs(), rel_path_mrc), map_r, psize)
  File "/apps/cryosparc/cryosparc-general/4.5.3/cryosparc_worker/cryosparc_compute/blobio/mrc.py", line 252, in write_mrc
    cryosparc_io.write_mrc(
RuntimeError: couldn't write to /cryosparc/co55/forhad/NegeVirus/CS-nege-virus/J254/J254_class_00_00000_volume.mrc
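
(That RuntimeError suggests a filesystem problem rather than a problem with the particle data. Below is a quick standard-library sketch, with the directory copied from the traceback above, to rule out free-space and permission issues; the test-write file name is hypothetical and is removed afterwards.)

import os
import shutil

# Directory taken from the RuntimeError above
job_dir = "/cryosparc/co55/forhad/NegeVirus/CS-nege-virus/J254"

total, used, free = shutil.disk_usage(job_dir)
print(f"free space: {free / 1e9:.1f} GB")
print("writable per os.access:", os.access(job_dir, os.W_OK))

# An actual test write, since quotas can fail even when os.access reports OK
test_path = os.path.join(job_dir, "write_test.tmp")
with open(test_path, "wb") as f:
    f.write(b"\0" * 1024)
os.remove(test_path)
print("test write succeeded")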

Anyway, I have emailed the job reports (job.log files) of both the successful and unsuccessful cloned jobs. Please note, I have been running with variable class similarity (0 to 0.5), and I have always kept "Cache particle images to SSD" set to False. These parameters were applied to both the successful and unsuccessful jobs.

Successful job: successful_job.log - Google Drive

Unsuccessful job: unsuccessful_job.log - Google Drive

Current error: see the traceback quoted above.

@nameless_wonder Thanks for posting this info. Please can you post the outputs of these commands

cryosparcm cli "get_job('P46', 'J242', 'job_type', 'version', 'instance_information', 'status',  'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
cryosparcm cli "get_job('P46', 'J252', 'job_type', 'version', 'instance_information', 'status',  'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
cryosparcm joblog P46 J254 | tail -n 40

Dear @wtempel,

Apologies for the delay, as I had to liaise with our cluster managers and also do a bit more troubleshooting.

Here are the outputs:

$ cryosparcm cli "get_job('P46', 'J242', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
{'_id': '67a550eb0d1cd394b979b233', 'errors_run': [], 'input_slot_groups': [{'connections': [{'group_name': 'particles_selected', 'job_uid': 'J31', 'slots': [{'group_name': 'particles_selected', 'job_uid': 'J31', 'result_name': 'blob', 'result_type': 'particle.blob', 'slot_name': 'blob', 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J31', 'result_name': 'ctf', 'result_type': 'particle.ctf', 'slot_name': 'ctf', 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J31', 'result_name': 'alignments2D', 'result_type': 'particle.alignments2D', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J31', 'result_name': 'pick_stats', 'result_type': 'particle.pick_stats', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J31', 'result_name': 'location', 'result_type': 'particle.location', 'slot_name': None, 'version': 'F'}]}], 'count_max': inf, 'count_min': 1, 'description': 'Particle stacks to use. Multiple stacks will be concatenated.', 'name': 'particles', 'repeat_allowed': False, 'slots': [{'description': '', 'name': 'blob', 'optional': False, 'title': 'Particle data blobs', 'type': 'particle.blob'}, {'description': '', 'name': 'ctf', 'optional': False, 'title': 'Particle ctf parameters', 'type': 'particle.ctf'}, {'description': '', 'name': 'alignments3D', 'optional': True, 'title': 'Computed alignments (optional -- only used to passthrough half set splits.)', 'type': 'particle.alignments3D'}, {'description': '', 'name': 'filament', 'optional': True, 'title': 'Particle filament info', 'type': 'particle.filament'}], 'title': 'Particle stacks', 'type': 'particle'}], 'instance_information': {'CUDA_version': '11.8', 'available_memory': '950.83GB', 'cpu_model': 'Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz', 'driver_version': '12.2', 'gpu_info': [{'id': 0, 'mem': 15655829504, 'name': 'Tesla T4', 'pcie': '0000:12:00'}], 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 52, 'platform_architecture': 'x86_64', 'platform_node': 'm3t101', 'platform_release': '5.14.0-284.25.1.el9_2.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Wed Aug 2 14:53:30 UTC 2023', 'total_memory': '1006.60GB', 'used_memory': '50.98GB'}, 'job_type': 'homo_abinit', 'params_spec': {'abinit_K': {'value': 3}, 'abinit_class_anneal_beta': {'value': 0.5}, 'compute_use_ssd': {'value': False}}, 'project_uid': 'P46', 'started_at': 'Fri, 07 Feb 2025 00:16:59 GMT', 'status': 'completed', 'uid': 'J242', 'version': 'v4.5.3'}

$ cryosparcm cli "get_job('P46', 'J252', 'job_type', 'version', 'instance_information', 'status', 'params_spec', 'errors_run', 'input_slot_groups', 'started_at')"
{'_id': '67a57c4b0d1cd394b97e36b2', 'errors_run': [{'message': '', 'warning': False}], 'input_slot_groups': [{'connections': [{'group_name': 'particles_selected', 'job_uid': 'J132', 'slots': [{'group_name': 'particles_selected', 'job_uid': 'J132', 'result_name': 'blob', 'result_type': 'particle.blob', 'slot_name': 'blob', 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J132', 'result_name': 'ctf', 'result_type': 'particle.ctf', 'slot_name': 'ctf', 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J132', 'result_name': 'alignments2D', 'result_type': 'particle.alignments2D', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J132', 'result_name': 'pick_stats', 'result_type': 'particle.pick_stats', 'slot_name': None, 'version': 'F'}, {'group_name': 'particles_selected', 'job_uid': 'J132', 'result_name': 'location', 'result_type': 'particle.location', 'slot_name': None, 'version': 'F'}]}], 'count_max': inf, 'count_min': 1, 'description': 'Particle stacks to use. Multiple stacks will be concatenated.', 'name': 'particles', 'repeat_allowed': False, 'slots': [{'description': '', 'name': 'blob', 'optional': False, 'title': 'Particle data blobs', 'type': 'particle.blob'}, {'description': '', 'name': 'ctf', 'optional': False, 'title': 'Particle ctf parameters', 'type': 'particle.ctf'}, {'description': '', 'name': 'alignments3D', 'optional': True, 'title': 'Computed alignments (optional -- only used to passthrough half set splits.)', 'type': 'particle.alignments3D'}, {'description': '', 'name': 'filament', 'optional': True, 'title': 'Particle filament info', 'type': 'particle.filament'}], 'title': 'Particle stacks', 'type': 'particle'}], 'instance_information': {'CUDA_version': '11.8', 'available_memory': '953.83GB', 'cpu_model': 'Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz', 'driver_version': '12.2', 'gpu_info': [{'id': 0, 'mem': 15655829504, 'name': 'Tesla T4', 'pcie': '0000:12:00'}], 'ofd_hard_limit': 524288, 'ofd_soft_limit': 1024, 'physical_cores': 52, 'platform_architecture': 'x86_64', 'platform_node': 'm3t101', 'platform_release': '5.14.0-284.25.1.el9_2.x86_64', 'platform_version': '#1 SMP PREEMPT_DYNAMIC Wed Aug 2 14:53:30 UTC 2023', 'total_memory': '1006.60GB', 'used_memory': '48.20GB'}, 'job_type': 'homo_abinit', 'params_spec': {'abinit_K': {'value': 3}, 'abinit_class_anneal_beta': {'value': 0}, 'compute_use_ssd': {'value': False}}, 'project_uid': 'P46', 'started_at': 'Fri, 07 Feb 2025 03:22:02 GMT', 'status': 'failed', 'uid': 'J252', 'version': 'v4.5.3'}
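
(Comparing the two outputs above: the jobs take their particles from different sources, J31 versus J132, and differ in exactly one parameter. A tiny illustrative diff, with the values copied from the params_spec fields above:)

params_J242_completed = {"abinit_K": 3, "abinit_class_anneal_beta": 0.5, "compute_use_ssd": False}
params_J252_failed = {"abinit_K": 3, "abinit_class_anneal_beta": 0, "compute_use_ssd": False}

for key in sorted(set(params_J242_completed) | set(params_J252_failed)):
    a, b = params_J242_completed.get(key), params_J252_failed.get(key)
    if a != b:
        print(f"{key}: completed={a!r}, failed={b!r}")
# prints: abinit_class_anneal_beta: completed=0.5, failed=0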

$ cryosparcm joblog P46 J254 | tail -n 40
========= sending heartbeat at 2025-02-14 11:47:55.229980
========= sending heartbeat at 2025-02-14 11:48:05.243461
========= sending heartbeat at 2025-02-14 11:48:15.256215
========= sending heartbeat at 2025-02-14 11:48:25.262897
========= sending heartbeat at 2025-02-14 11:48:35.277559
========= sending heartbeat at 2025-02-14 11:48:45.292215
========= sending heartbeat at 2025-02-14 11:48:55.305206
========= sending heartbeat at 2025-02-14 11:49:05.318924
========= sending heartbeat at 2025-02-14 11:49:15.332998
========= sending heartbeat at 2025-02-14 11:49:25.347194
========= sending heartbeat at 2025-02-14 11:49:35.360204
========= sending heartbeat at 2025-02-14 11:49:45.373212
========= sending heartbeat at 2025-02-14 11:49:55.386790
========= sending heartbeat at 2025-02-14 11:50:05.402358
========= sending heartbeat at 2025-02-14 11:50:15.407858
========= sending heartbeat at 2025-02-14 11:50:25.421297
========= sending heartbeat at 2025-02-14 11:50:35.434206
========= sending heartbeat at 2025-02-14 11:50:45.448348
========= sending heartbeat at 2025-02-14 11:50:55.462205
========= sending heartbeat at 2025-02-14 11:51:05.475556
========= sending heartbeat at 2025-02-14 11:51:15.489803
========= sending heartbeat at 2025-02-14 11:51:25.503334
========= sending heartbeat at 2025-02-14 11:51:35.516202
========= sending heartbeat at 2025-02-14 11:51:45.529204
========= sending heartbeat at 2025-02-14 11:51:55.543954
========= sending heartbeat at 2025-02-14 11:52:05.559207
/apps/cryosparc/cryosparc-general/4.5.3/cryosparc_worker/cryosparc_compute/util/logsumexp.py:41: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
<string>:1: UserWarning: Cannot manually free CUDA array; will be freed when garbage collected
========= sending heartbeat at 2025-02-14 11:52:15.573200
========= sending heartbeat at 2025-02-14 11:52:25.586943
2025-02-14 11:52:30,358 del INFO | Deleting plot real-slice-000
2025-02-14 11:52:30,386 del INFO | Deleting plot viewing_dist-000
2025-02-14 11:52:30,400 del INFO | Deleting plot real-slice-001
2025-02-14 11:52:30,426 del INFO | Deleting plot viewing_dist-001
2025-02-14 11:52:30,439 del INFO | Deleting plot real-slice-002
2025-02-14 11:52:30,464 del INFO | Deleting plot viewing_dist-002
2025-02-14 11:52:30,477 del INFO | Deleting plot noise_model
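
(The divide-by-zero RuntimeWarning in that tail comes from the log-sum-exp line quoted above. Below is a minimal numpy illustration, not CryoSPARC's actual code path, of how that expression can emit the warning and produce the -inf that could later trip an isfinite assertion: if the dominant term gets zero weight and the other term underflows, the argument of the log is exactly zero.)

import numpy as np

a = np.array([0.0])      # larger argument: exp(a - vmax) == 1
b = np.array([-2000.0])  # much smaller: exp(b - vmax) underflows to 0.0
wa, wb = 0.0, 1.0        # zero weight on the dominant term
vmax = np.maximum(a, b)
# Same formula as the warning line above; the sum inside log() is exactly 0.0,
# so numpy warns "divide by zero encountered in log" and returns -inf.
ret = np.log(wa * np.exp(a - vmax) + wb * np.exp(b - vmax)) + vmax
print(ret)  # [-inf]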


Besides, I was looking into exactly where the issue first occurred. It turns out that at some point I started having this error with ab initio when I was trying to get rid of junk particles through 2D classification. So, as mentioned earlier, to replicate the scenario, I cloned an "unsuccessful" job and linked "successful" particles, and the ab initio succeeded. However, when I ran those "successful" particles through 2 more rounds of 2D classification and did ab initio again, it failed. Happy to post any details of these jobs as required. Please advise.
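
(One hedged way to narrow this down would be to scan the particle metadata after each 2D classification round for non-finite values, e.g. with cryosparc-tools; the connection details below are placeholders, and J132 is the Select 2D job referenced in the get_job output above.)

import numpy as np
from cryosparc.tools import CryoSPARC  # pip install cryosparc-tools

# Placeholder credentials; adjust for your instance.
cs = CryoSPARC(license="xxxx", host="localhost", base_port=39000,
               email="user@example.com", password="...")
job = cs.find_job("P46", "J132")
particles = job.load_output("particles_selected")

# Flag any floating-point metadata field containing NaN/Inf
for field in particles.fields():
    col = particles[field]
    if np.issubdtype(col.dtype, np.floating) and not np.isfinite(col).all():
        print(f"non-finite values in field {field}")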

To add a bit more to it: I was monitoring an ab initio job, and from iteration 600 the maps started looking like this:

@nameless_wonder Did this error occur in the same project P46, or another project? I am asking because the RuntimeError seems to be inconsistent with the output from cryosparcm joblog P46 J254 above.

Has the job been re-run after the RuntimeError was observed?

Hi @wtempel,

Yeah, the job was re-run later, and yes, it was in the same project P46. Apologies that I didn't clarify in the first place.