Particle curation failing

vincent · October 31, 2023, 10:16am

Hi,

I am seeing a strange error in a particle curation job. The job is unstable: it ran well on a small subset (100k particles), but on the full dataset (6M particles) it reports an error about the interactive display of images after the NCC and power estimation step, similar to what was described here: Interactive exposure curation failing

However, I don’t get the FileNotFound error described in that post. Instead, simply navigating away from the interactive tab and back resolved the display issue and let me adjust the thresholds. I thought that was it, but when I click “Done output particles”, the job starts processing, then terminates abnormally without outputting any particles.

Can you help me solve this issue?
Thanks
Vincent

(CryoSPARC v4.2.1)
Here is the end of the log:

[CPU:   2.06 GB  Avail:   3.53 GB]
==== Completed. Extracted 350089 particles.

[CPU:   2.06 GB  Avail:   3.53 GB]
Interactive backend shutting down.

[CPU:   2.06 GB  Avail:   3.53 GB]
--------------------------------------------------------------

[CPU:   2.06 GB  Avail:   3.53 GB]
Compiling job outputs...

[CPU:   2.06 GB  Avail:   3.53 GB]
Passing through outputs for output group micrographs from input group micrographs

[CPU:   2.06 GB  Avail:   3.53 GB]
This job outputted results ['micrograph_blob']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded output dset with 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passthrough results ['ctf', 'mscope_params', 'background_blob', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'movie_blob', 'ctf_stats', 'rigid_motion', 'spline_motion', 'micrograph_blob_non_dw', 'gain_ref_blob']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded passthrough dset with 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
  Intersection of output and passthrough has 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passing through outputs for output group particles from input group particles

[CPU:   2.06 GB  Avail:   3.53 GB]
This job outputted results ['location']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded output dset with 350089 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passthrough results ['pick_stats', 'ctf']

[CPU:  122.6 MB  Avail:   5.59 GB]
====== Job process terminated abnormally.

wtempel · October 31, 2023, 3:18pm

How much RAM does your CryoSPARC master computer have?
Are there additional error messages in the job log (under Metadata|Log)?
Do system logs indicate that the job was terminated by the kernel’s out-of-memory manager?
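One way to answer the last question is to search the kernel log for OOM-killer messages around the time of the failure. Below is a minimal sketch, assuming the kernel log has first been saved to a text file (for example with `journalctl -k` or `dmesg`). The helper name `find_oom_lines` and the script name are hypothetical, not part of CryoSPARC, and the matched wordings are common Linux variants that differ between kernel versions, so an empty result is not proof that no OOM kill happened. The PID printed on the job log's MAINPROCESS line is one candidate to filter on.

```python
import re
import sys
from typing import Iterable, List, Optional

# Common Linux OOM-killer message patterns (assumed, not exhaustive):
# older kernels print "Kill process", newer ones "Killed process".
KILL_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+)")
OTHER_PATTERNS = [
    re.compile(r"invoked oom-killer"),
    re.compile(r"oom-kill:"),
]


def find_oom_lines(lines: Iterable[str], pid: Optional[int] = None) -> List[str]:
    """Return kernel-log lines that look like OOM-killer activity.

    If pid is given, only "Out of memory: Kill(ed) process <pid>" lines
    for that exact PID are returned.
    """
    hits = []
    for line in lines:
        match = KILL_PATTERN.search(line)
        if pid is not None:
            # Keep only kill lines that name the requested PID.
            if match and int(match.group(1)) == pid:
                hits.append(line.rstrip("\n"))
        elif match or any(p.search(line) for p in OTHER_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: journalctl -k > kern.log; python find_oom.py kern.log 950692
    if len(sys.argv) < 2:
        print("usage: find_oom.py KERNEL_LOG [PID]")
    else:
        target = int(sys.argv[2]) if len(sys.argv) > 2 else None
        with open(sys.argv[1], errors="replace") as fh:
            for hit in find_oom_lines(fh, pid=target):
                print(hit)
```

Matching the timestamps of any hits against the time the job reported "terminated abnormally" is what links an OOM kill to this particular failure.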

vincent · October 31, 2023, 3:40pm

It’s a big cluster; I’ll ask about the RAM, but I don’t think that’s the problem.

There is no error message in the log either; here it is:

===========================================================================
========= monitor process now starting main process at 2023-10-31 10:43:13.571409
MAINPROCESS PID 950692
========= monitor process now waiting for main process
MAIN PID 950692
interactive.run_inspect_picks_v2 cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-10-31 10:43:47.105117

INTERACTIVE JOB STARTED === 2023-10-31 10:43:50.777616 ==========================
========= sending heartbeat at 2023-10-31 10:43:57.113093
========= sending heartbeat at 2023-10-31 10:44:07.133735
========= sending heartbeat at 2023-10-31 10:44:17.151222
========= sending heartbeat at 2023-10-31 10:44:27.179682
========= sending heartbeat at 2023-10-31 10:44:37.200386
========= sending heartbeat at 2023-10-31 10:44:47.223504
========= sending heartbeat at 2023-10-31 10:44:57.242058
========= sending heartbeat at 2023-10-31 10:45:07.260964
========= sending heartbeat at 2023-10-31 10:45:17.276600
========= sending heartbeat at 2023-10-31 10:45:27.292307
========= sending heartbeat at 2023-10-31 10:45:37.308744
========= sending heartbeat at 2023-10-31 10:45:47.326507
========= sending heartbeat at 2023-10-31 10:45:57.346060
========= sending heartbeat at 2023-10-31 10:46:07.367602
========= sending heartbeat at 2023-10-31 10:46:17.393509
========= sending heartbeat at 2023-10-31 10:46:27.412028
========= sending heartbeat at 2023-10-31 10:46:37.430234
========= sending heartbeat at 2023-10-31 10:46:47.442748
========= sending heartbeat at 2023-10-31 10:46:57.465992
========= sending heartbeat at 2023-10-31 10:47:07.490672
========= sending heartbeat at 2023-10-31 10:47:17.509236

Serving Flask app “inspect_picks_v2” (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Debug mode: off
[EXTERN got get_micrograph_data 2023-10-31 09:47:24.843079 ]
========= sending heartbeat at 2023-10-31 10:47:27.533420
[EXTERN done get_micrograph_data 2023-10-31 09:47:33.682816 8.84s ]
[EXTERN got get_picks 2023-10-31 09:47:33.685828 ]
[EXTERN done get_picks 2023-10-31 09:47:33.686294 0.00s ]
========= sending heartbeat at 2023-10-31 10:47:37.549485
========= sending heartbeat at 2023-10-31 10:47:47.567151
========= sending heartbeat at 2023-10-31 10:47:57.584733
========= sending heartbeat at 2023-10-31 10:48:07.602846
========= sending heartbeat at 2023-10-31 10:48:17.621825
========= sending heartbeat at 2023-10-31 10:48:27.641267
========= sending heartbeat at 2023-10-31 10:48:37.655205
========= sending heartbeat at 2023-10-31 10:48:47.672979
========= sending heartbeat at 2023-10-31 10:48:57.691490
========= sending heartbeat at 2023-10-31 10:49:07.709914
========= sending heartbeat at 2023-10-31 10:49:17.727661
========= sending heartbeat at 2023-10-31 10:49:27.747132
========= sending heartbeat at 2023-10-31 10:49:37.765275
========= sending heartbeat at 2023-10-31 10:49:47.783329
========= sending heartbeat at 2023-10-31 10:49:57.802193
========= sending heartbeat at 2023-10-31 10:50:07.820410
========= sending heartbeat at 2023-10-31 10:50:17.839483
========= sending heartbeat at 2023-10-31 10:50:27.858189
========= sending heartbeat at 2023-10-31 10:50:37.876091
========= sending heartbeat at 2023-10-31 10:50:47.893502
========= sending heartbeat at 2023-10-31 10:50:57.912382
========= sending heartbeat at 2023-10-31 10:51:07.932951
========= sending heartbeat at 2023-10-31 10:51:17.951356
========= sending heartbeat at 2023-10-31 10:51:27.960534
========= sending heartbeat at 2023-10-31 10:51:37.977748
========= sending heartbeat at 2023-10-31 10:51:47.995968
========= sending heartbeat at 2023-10-31 10:51:58.015207
========= sending heartbeat at 2023-10-31 10:52:08.025478
========= sending heartbeat at 2023-10-31 10:52:18.043771
========= sending heartbeat at 2023-10-31 10:52:28.062699
========= sending heartbeat at 2023-10-31 10:52:38.080053
========= sending heartbeat at 2023-10-31 10:52:48.099093
========= sending heartbeat at 2023-10-31 10:52:58.118378
========= sending heartbeat at 2023-10-31 10:53:08.137987
========= sending heartbeat at 2023-10-31 10:53:18.157815
========= sending heartbeat at 2023-10-31 10:53:28.186291
========= sending heartbeat at 2023-10-31 10:53:38.204046
========= sending heartbeat at 2023-10-31 10:53:48.221146
========= sending heartbeat at 2023-10-31 10:53:58.239321
========= sending heartbeat at 2023-10-31 10:54:08.247802
========= sending heartbeat at 2023-10-31 10:54:18.259887
========= sending heartbeat at 2023-10-31 10:54:28.278534
========= sending heartbeat at 2023-10-31 10:54:38.296517
========= sending heartbeat at 2023-10-31 10:54:48.314255
========= sending heartbeat at 2023-10-31 10:54:58.331903
========= sending heartbeat at 2023-10-31 10:55:08.349312
========= sending heartbeat at 2023-10-31 10:55:18.366914
========= sending heartbeat at 2023-10-31 10:55:28.385415
========= sending heartbeat at 2023-10-31 10:55:38.403017
========= sending heartbeat at 2023-10-31 10:55:48.421798
========= sending heartbeat at 2023-10-31 10:55:58.439975
========= sending heartbeat at 2023-10-31 10:56:08.458458
========= sending heartbeat at 2023-10-31 10:56:18.476692
========= sending heartbeat at 2023-10-31 10:56:28.496260
========= sending heartbeat at 2023-10-31 10:56:38.514145
========= sending heartbeat at 2023-10-31 10:56:48.532449
========= sending heartbeat at 2023-10-31 10:56:58.551769
========= sending heartbeat at 2023-10-31 10:57:08.569249
========= sending heartbeat at 2023-10-31 10:57:18.586994
========= sending heartbeat at 2023-10-31 10:57:28.604515
========= sending heartbeat at 2023-10-31 10:57:38.622787
========= sending heartbeat at 2023-10-31 10:57:48.640175
========= sending heartbeat at 2023-10-31 10:57:58.661828
========= sending heartbeat at 2023-10-31 10:58:08.679200
========= sending heartbeat at 2023-10-31 10:58:18.698048
========= sending heartbeat at 2023-10-31 10:58:28.716565
========= sending heartbeat at 2023-10-31 10:58:38.730264
[EXTERN got get_interactive_info 2023-10-31 09:58:44.522762 ]
[EXTERN done get_interactive_info 2023-10-31 09:58:44.765442 0.24s ]
[EXTERN got get_micrograph_data 2023-10-31 09:58:44.915531 ]
[EXTERN done get_micrograph_data 2023-10-31 09:58:44.915837 0.00s ]
[EXTERN got get_picks 2023-10-31 09:58:44.918996 ]
[EXTERN done get_picks 2023-10-31 09:58:44.919937 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.283615 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.330519 0.05s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.398306 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.448226 0.05s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.464359 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.466685 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.527293 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.573605 0.05s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.858806 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.861339 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.995330 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.998080 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:46.059475 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:46.089388 0.03s ]
========= sending heartbeat at 2023-10-31 10:58:48.748289
========= sending heartbeat at 2023-10-31 10:58:58.767768
========= sending heartbeat at 2023-10-31 10:59:08.786761
[EXTERN got set_thresholds 2023-10-31 09:59:08.877161 ]
[EXTERN done set_thresholds 2023-10-31 09:59:11.122239 2.25s ]
[EXTERN got set_thresholds 2023-10-31 09:59:11.571856 ]
[EXTERN done set_thresholds 2023-10-31 09:59:13.816225 2.24s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.817992 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.862571 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.863983 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.925324 0.06s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.926636 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.958346 0.03s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.959949 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.986636 0.03s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.987889 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.031717 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.140989 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.172639 0.03s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.459377 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.496288 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.526636 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.566298 0.04s ]
[EXTERN got get_micrograph_data 2023-10-31 09:59:14.782171 ]
[EXTERN done get_micrograph_data 2023-10-31 09:59:17.591755 2.81s ]
[EXTERN got get_picks 2023-10-31 09:59:17.594700 ]
[EXTERN done get_picks 2023-10-31 09:59:17.595193 0.00s ]
========= sending heartbeat at 2023-10-31 10:59:18.805036
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.816220 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.854600 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.874587 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.919613 0.05s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.945011 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.986319 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.009776 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.011685 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.290889 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.292872 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.366548 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.405218 0.04s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.425896 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.475914 0.05s ]
[EXTERN got set_thresholds 2023-10-31 09:59:27.729293 ]
========= sending heartbeat at 2023-10-31 10:59:28.827227
[EXTERN done set_thresholds 2023-10-31 09:59:29.403784 1.67s ]
========= sending heartbeat at 2023-10-31 10:59:38.839832
[EXTERN got set_thresholds 2023-10-31 09:59:42.672838 ]
[EXTERN done set_thresholds 2023-10-31 09:59:45.146809 2.47s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.149557 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.152837 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.154293 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.155131 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.156230 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.179111 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.180320 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.182076 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.183126 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.184597 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.185493 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.201396 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.202326 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.224994 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.226104 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.227778 0.00s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.228768 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.254583 0.03s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.255600 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.275286 0.02s ]
[EXTERN got get_micrograph_data 2023-10-31 09:59:45.276616 ]
[EXTERN done get_micrograph_data 2023-10-31 09:59:45.809145 0.53s ]
[EXTERN got get_picks 2023-10-31 09:59:45.811438 ]
[EXTERN done get_picks 2023-10-31 09:59:45.811893 0.00s ]
========= sending heartbeat at 2023-10-31 10:59:48.858797
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.168739 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.189805 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.226858 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.245301 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.293382 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.316625 0.02s ]
[EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.359252 ]
[EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.387986 0.03s ]
[EXTERN got set_thresholds 2023-10-31 09:59:55.925738 ]
[EXTERN done set_thresholds 2023-10-31 09:59:57.635211 1.71s ]
========= sending heartbeat at 2023-10-31 10:59:58.880603
[EXTERN got set_thresholds 2023-10-31 09:59:59.723958 ]
[EXTERN done set_thresholds 2023-10-31 10:00:01.617290 1.89s ]
[EXTERN got shutdown_interactive 2023-10-31 10:00:04.306366 ]
[EXTERN done shutdown_interactive 2023-10-31 10:00:04.306459 0.00s ]
========= sending heartbeat at 2023-10-31 11:00:08.899370
========= sending heartbeat at 2023-10-31 11:00:18.919546
========= main process now complete at 2023-10-31 11:00:25.672826.
========= monitor process now complete at 2023-10-31 11:00:28.487879.

vincent · October 31, 2023, 3:50pm

I checked: the master host has 8 GB of RAM.

wtempel · October 31, 2023, 4:00pm

Exposure curation runs on the same host as all the master processes and should be subject to the same memory restrictions. Is your CryoSPARC instance itself (rather than the CryoSPARC processing jobs launched by the instance) a cluster job?

vincent · October 31, 2023, 9:46pm

I’ll try to relay the answer here, as this is outside my area of expertise.
If we got the translation right: “the CryoSPARC instance runs on a dedicated machine, so the instance is not a cluster job.”
Does this make sense to you?

wtempel · November 7, 2023, 11:10pm

As of CryoSPARC v4, the minimum RAM for the CryoSPARC master host is 16 GB. More RAM is recommended and may be needed, depending on the size of datasets, the number of concurrent interactive jobs and/or the overall workload of the server.
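A quick way to compare a host against this minimum is to read its total physical memory. The sketch below assumes a Linux (or other POSIX) master host and uses Python's `os.sysconf`. The helper names and the 5% slack are assumptions for illustration: total memory is usually reported slightly below the nominal installed size because the kernel reserves some of it.

```python
import os

# Minimum master-host RAM for CryoSPARC v4, as stated in this thread.
MIN_MASTER_RAM_GB = 16


def total_ram_gb() -> float:
    """Total physical RAM in GiB, via POSIX sysconf (Linux and most Unixes)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    phys_pages = os.sysconf("SC_PHYS_PAGES")
    return page_size * phys_pages / 1024**3


def meets_minimum(ram_gb: float, minimum_gb: float = MIN_MASTER_RAM_GB,
                  slack: float = 0.05) -> bool:
    """True if ram_gb is within `slack` (assumed 5%) of the minimum or above."""
    return ram_gb >= minimum_gb * (1 - slack)


if __name__ == "__main__":
    ram = total_ram_gb()
    print(f"total RAM: {ram:.1f} GiB, meets {MIN_MASTER_RAM_GB} GB minimum: {meets_minimum(ram)}")
```

An 8 GB host, like the one reported above, fails this check. Passing it only means the stated minimum is met; large datasets or several concurrent interactive jobs may still need more.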
The computer’s admin may check the system log for an OOM message that coincides with the Particle Curation job’s failure.
By the way, the topic’s title refers to exposure curation. Would you like to change the title?

vincent · November 8, 2023, 2:50pm

Hi @wtempel ,
I can confirm that increasing the master host’s RAM to 16 GB (together with an upgrade to v4.3.1) fixed the problem.
About the title: I chose it to follow up on the same error message described in the thread linked above, but if you think a change is more appropriate, please go ahead.
Best
Vincent