Particle curation failing

Hi,

I'm getting a strange error in a particle curation job. The job is unstable: it ran well on a small subset (100k particles), but on the large dataset (6M particles) it reports an error about the interactive display of images after the NCC and power estimation step, similar to what was described here: Interactive exposure curation failing

However, I don’t get the FileNotFound error described in that post. Simply navigating away from the interactive tab and back resolved the display issue and let me change the thresholds. I thought that was it, but when I click “Done output particles”, the job starts and then terminates abnormally without outputting particles.

Can you help me solve this issue?
Thanks
Vincent

(CryoSPARC 4.2.1)
The end of the log is here:

[CPU:   2.06 GB  Avail:   3.53 GB]
==== Completed. Extracted 350089 particles.

[CPU:   2.06 GB  Avail:   3.53 GB]
Interactive backend shutting down.

[CPU:   2.06 GB  Avail:   3.53 GB]
--------------------------------------------------------------

[CPU:   2.06 GB  Avail:   3.53 GB]
Compiling job outputs...

[CPU:   2.06 GB  Avail:   3.53 GB]
Passing through outputs for output group micrographs from input group micrographs

[CPU:   2.06 GB  Avail:   3.53 GB]
This job outputted results ['micrograph_blob']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded output dset with 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passthrough results ['ctf', 'mscope_params', 'background_blob', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'movie_blob', 'ctf_stats', 'rigid_motion', 'spline_motion', 'micrograph_blob_non_dw', 'gain_ref_blob']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded passthrough dset with 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
  Intersection of output and passthrough has 2460 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passing through outputs for output group particles from input group particles

[CPU:   2.06 GB  Avail:   3.53 GB]
This job outputted results ['location']

[CPU:   2.06 GB  Avail:   3.53 GB]
  Loaded output dset with 350089 items

[CPU:   2.06 GB  Avail:   3.53 GB]
Passthrough results ['pick_stats', 'ctf']

[CPU:  122.6 MB  Avail:   5.59 GB]
====== Job process terminated abnormally.

How much RAM does your CryoSPARC master computer have?
Are there additional error messages in the job log (under Metadata|Log)?
Do system logs indicate that the job was terminated by the kernel’s out-of-memory manager?
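
For the last question, here is a minimal Python sketch of how kernel log text could be scanned for OOM-killer reports. It assumes the usual Linux wording ("Out of memory: Killed process ..." on recent kernels, "Out of memory: Kill process ..." on older ones). The function name and the sample lines are made up for illustration and are not taken from this thread.

```python
import re

# Assumed wording of the kernel's OOM-killer report line; it varies slightly
# between kernel versions ("Kill process" vs "Killed process").
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(log_lines):
    """Return a list of (pid, process_name) for each OOM-killer line found."""
    kills = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills


# Illustrative lines only (hypothetical host, PID and sizes).
sample = [
    "Jan 01 00:00:00 examplehost kernel: Out of memory: Killed process 12345 (python) "
    "total-vm:9000000kB, anon-rss:7000000kB",
    "Jan 01 00:00:01 examplehost kernel: usb 1-1: new high-speed USB device",
]

print(find_oom_kills(sample))
```

The lines to scan would typically come from `dmesg` or `journalctl -k`, which may require admin privileges. A PID in the result that matches the job's MAINPROCESS PID in the job log would point to an out-of-memory kill.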

It’s a big cluster. I’ll ask about the RAM, but I don’t think that is the problem.

There is no error message in the job log either. Here it is:

===========================================================================
========= monitor process now starting main process at 2023-10-31 10:43:13.571409
MAINPROCESS PID 950692
========= monitor process now waiting for main process
MAIN PID 950692
interactive.run_inspect_picks_v2 cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-10-31 10:43:47.105117


INTERACTIVE JOB STARTED === 2023-10-31 10:43:50.777616 ==========================
========= sending heartbeat at 2023-10-31 10:43:57.113093
========= sending heartbeat at 2023-10-31 10:44:07.133735
========= sending heartbeat at 2023-10-31 10:44:17.151222
========= sending heartbeat at 2023-10-31 10:44:27.179682
========= sending heartbeat at 2023-10-31 10:44:37.200386
========= sending heartbeat at 2023-10-31 10:44:47.223504
========= sending heartbeat at 2023-10-31 10:44:57.242058
========= sending heartbeat at 2023-10-31 10:45:07.260964
========= sending heartbeat at 2023-10-31 10:45:17.276600
========= sending heartbeat at 2023-10-31 10:45:27.292307
========= sending heartbeat at 2023-10-31 10:45:37.308744
========= sending heartbeat at 2023-10-31 10:45:47.326507
========= sending heartbeat at 2023-10-31 10:45:57.346060
========= sending heartbeat at 2023-10-31 10:46:07.367602
========= sending heartbeat at 2023-10-31 10:46:17.393509
========= sending heartbeat at 2023-10-31 10:46:27.412028
========= sending heartbeat at 2023-10-31 10:46:37.430234
========= sending heartbeat at 2023-10-31 10:46:47.442748
========= sending heartbeat at 2023-10-31 10:46:57.465992
========= sending heartbeat at 2023-10-31 10:47:07.490672
========= sending heartbeat at 2023-10-31 10:47:17.509236

 * Serving Flask app "inspect_picks_v2" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
    [EXTERN got get_micrograph_data 2023-10-31 09:47:24.843079 ]
    ========= sending heartbeat at 2023-10-31 10:47:27.533420
    [EXTERN done get_micrograph_data 2023-10-31 09:47:33.682816 8.84s ]
    [EXTERN got get_picks 2023-10-31 09:47:33.685828 ]
    [EXTERN done get_picks 2023-10-31 09:47:33.686294 0.00s ]
    ========= sending heartbeat at 2023-10-31 10:47:37.549485
    ========= sending heartbeat at 2023-10-31 10:47:47.567151
    ========= sending heartbeat at 2023-10-31 10:47:57.584733
    ========= sending heartbeat at 2023-10-31 10:48:07.602846
    ========= sending heartbeat at 2023-10-31 10:48:17.621825
    ========= sending heartbeat at 2023-10-31 10:48:27.641267
    ========= sending heartbeat at 2023-10-31 10:48:37.655205
    ========= sending heartbeat at 2023-10-31 10:48:47.672979
    ========= sending heartbeat at 2023-10-31 10:48:57.691490
    ========= sending heartbeat at 2023-10-31 10:49:07.709914
    ========= sending heartbeat at 2023-10-31 10:49:17.727661
    ========= sending heartbeat at 2023-10-31 10:49:27.747132
    ========= sending heartbeat at 2023-10-31 10:49:37.765275
    ========= sending heartbeat at 2023-10-31 10:49:47.783329
    ========= sending heartbeat at 2023-10-31 10:49:57.802193
    ========= sending heartbeat at 2023-10-31 10:50:07.820410
    ========= sending heartbeat at 2023-10-31 10:50:17.839483
    ========= sending heartbeat at 2023-10-31 10:50:27.858189
    ========= sending heartbeat at 2023-10-31 10:50:37.876091
    ========= sending heartbeat at 2023-10-31 10:50:47.893502
    ========= sending heartbeat at 2023-10-31 10:50:57.912382
    ========= sending heartbeat at 2023-10-31 10:51:07.932951
    ========= sending heartbeat at 2023-10-31 10:51:17.951356
    ========= sending heartbeat at 2023-10-31 10:51:27.960534
    ========= sending heartbeat at 2023-10-31 10:51:37.977748
    ========= sending heartbeat at 2023-10-31 10:51:47.995968
    ========= sending heartbeat at 2023-10-31 10:51:58.015207
    ========= sending heartbeat at 2023-10-31 10:52:08.025478
    ========= sending heartbeat at 2023-10-31 10:52:18.043771
    ========= sending heartbeat at 2023-10-31 10:52:28.062699
    ========= sending heartbeat at 2023-10-31 10:52:38.080053
    ========= sending heartbeat at 2023-10-31 10:52:48.099093
    ========= sending heartbeat at 2023-10-31 10:52:58.118378
    ========= sending heartbeat at 2023-10-31 10:53:08.137987
    ========= sending heartbeat at 2023-10-31 10:53:18.157815
    ========= sending heartbeat at 2023-10-31 10:53:28.186291
    ========= sending heartbeat at 2023-10-31 10:53:38.204046
    ========= sending heartbeat at 2023-10-31 10:53:48.221146
    ========= sending heartbeat at 2023-10-31 10:53:58.239321
    ========= sending heartbeat at 2023-10-31 10:54:08.247802
    ========= sending heartbeat at 2023-10-31 10:54:18.259887
    ========= sending heartbeat at 2023-10-31 10:54:28.278534
    ========= sending heartbeat at 2023-10-31 10:54:38.296517
    ========= sending heartbeat at 2023-10-31 10:54:48.314255
    ========= sending heartbeat at 2023-10-31 10:54:58.331903
    ========= sending heartbeat at 2023-10-31 10:55:08.349312
    ========= sending heartbeat at 2023-10-31 10:55:18.366914
    ========= sending heartbeat at 2023-10-31 10:55:28.385415
    ========= sending heartbeat at 2023-10-31 10:55:38.403017
    ========= sending heartbeat at 2023-10-31 10:55:48.421798
    ========= sending heartbeat at 2023-10-31 10:55:58.439975
    ========= sending heartbeat at 2023-10-31 10:56:08.458458
    ========= sending heartbeat at 2023-10-31 10:56:18.476692
    ========= sending heartbeat at 2023-10-31 10:56:28.496260
    ========= sending heartbeat at 2023-10-31 10:56:38.514145
    ========= sending heartbeat at 2023-10-31 10:56:48.532449
    ========= sending heartbeat at 2023-10-31 10:56:58.551769
    ========= sending heartbeat at 2023-10-31 10:57:08.569249
    ========= sending heartbeat at 2023-10-31 10:57:18.586994
    ========= sending heartbeat at 2023-10-31 10:57:28.604515
    ========= sending heartbeat at 2023-10-31 10:57:38.622787
    ========= sending heartbeat at 2023-10-31 10:57:48.640175
    ========= sending heartbeat at 2023-10-31 10:57:58.661828
    ========= sending heartbeat at 2023-10-31 10:58:08.679200
    ========= sending heartbeat at 2023-10-31 10:58:18.698048
    ========= sending heartbeat at 2023-10-31 10:58:28.716565
    ========= sending heartbeat at 2023-10-31 10:58:38.730264
    [EXTERN got get_interactive_info 2023-10-31 09:58:44.522762 ]
    [EXTERN done get_interactive_info 2023-10-31 09:58:44.765442 0.24s ]
    [EXTERN got get_micrograph_data 2023-10-31 09:58:44.915531 ]
    [EXTERN done get_micrograph_data 2023-10-31 09:58:44.915837 0.00s ]
    [EXTERN got get_picks 2023-10-31 09:58:44.918996 ]
    [EXTERN done get_picks 2023-10-31 09:58:44.919937 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.283615 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.330519 0.05s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.398306 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.448226 0.05s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.464359 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.466685 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.527293 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.573605 0.05s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.858806 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.861339 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.995330 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:45.998080 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:58:46.059475 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:58:46.089388 0.03s ]
    ========= sending heartbeat at 2023-10-31 10:58:48.748289
    ========= sending heartbeat at 2023-10-31 10:58:58.767768
    ========= sending heartbeat at 2023-10-31 10:59:08.786761
    [EXTERN got set_thresholds 2023-10-31 09:59:08.877161 ]
    [EXTERN done set_thresholds 2023-10-31 09:59:11.122239 2.25s ]
    [EXTERN got set_thresholds 2023-10-31 09:59:11.571856 ]
    [EXTERN done set_thresholds 2023-10-31 09:59:13.816225 2.24s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.817992 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.862571 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.863983 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.925324 0.06s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.926636 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.958346 0.03s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.959949 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.986636 0.03s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:13.987889 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.031717 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.140989 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.172639 0.03s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.459377 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.496288 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.526636 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:14.566298 0.04s ]
    [EXTERN got get_micrograph_data 2023-10-31 09:59:14.782171 ]
    [EXTERN done get_micrograph_data 2023-10-31 09:59:17.591755 2.81s ]
    [EXTERN got get_picks 2023-10-31 09:59:17.594700 ]
    [EXTERN done get_picks 2023-10-31 09:59:17.595193 0.00s ]
    ========= sending heartbeat at 2023-10-31 10:59:18.805036
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.816220 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.854600 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.874587 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.919613 0.05s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.945011 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:22.986319 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.009776 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.011685 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.290889 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.292872 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.366548 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.405218 0.04s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.425896 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:23.475914 0.05s ]
    [EXTERN got set_thresholds 2023-10-31 09:59:27.729293 ]
    ========= sending heartbeat at 2023-10-31 10:59:28.827227
    [EXTERN done set_thresholds 2023-10-31 09:59:29.403784 1.67s ]
    ========= sending heartbeat at 2023-10-31 10:59:38.839832
    [EXTERN got set_thresholds 2023-10-31 09:59:42.672838 ]
    [EXTERN done set_thresholds 2023-10-31 09:59:45.146809 2.47s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.149557 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.152837 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.154293 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.155131 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.156230 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.179111 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.180320 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.182076 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.183126 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.184597 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.185493 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.201396 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.202326 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.224994 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.226104 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.227778 0.00s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.228768 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.254583 0.03s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.255600 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:45.275286 0.02s ]
    [EXTERN got get_micrograph_data 2023-10-31 09:59:45.276616 ]
    [EXTERN done get_micrograph_data 2023-10-31 09:59:45.809145 0.53s ]
    [EXTERN got get_picks 2023-10-31 09:59:45.811438 ]
    [EXTERN done get_picks 2023-10-31 09:59:45.811893 0.00s ]
    ========= sending heartbeat at 2023-10-31 10:59:48.858797
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.168739 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.189805 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.226858 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.245301 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.293382 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.316625 0.02s ]
    [EXTERN got get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.359252 ]
    [EXTERN done get_micrograph_data_preview_thumbnail 2023-10-31 09:59:52.387986 0.03s ]
    [EXTERN got set_thresholds 2023-10-31 09:59:55.925738 ]
    [EXTERN done set_thresholds 2023-10-31 09:59:57.635211 1.71s ]
    ========= sending heartbeat at 2023-10-31 10:59:58.880603
    [EXTERN got set_thresholds 2023-10-31 09:59:59.723958 ]
    [EXTERN done set_thresholds 2023-10-31 10:00:01.617290 1.89s ]
    [EXTERN got shutdown_interactive 2023-10-31 10:00:04.306366 ]
    [EXTERN done shutdown_interactive 2023-10-31 10:00:04.306459 0.00s ]
    ========= sending heartbeat at 2023-10-31 11:00:08.899370
    ========= sending heartbeat at 2023-10-31 11:00:18.919546
    ========= main process now complete at 2023-10-31 11:00:25.672826.
    ========= monitor process now complete at 2023-10-31 11:00:28.487879.

I checked: the master host has 8 GB of RAM.

Exposure curation runs on the same host as all the master processes and should be subject to the same memory restrictions. Is your CryoSPARC instance itself (rather than the CryoSPARC processing jobs launched by the instance) a cluster job?

I’ll try to relay the answer here, as this is well outside my area of expertise.
If I understood correctly: “the CryoSPARC instance runs on a dedicated machine, so the instance is not a cluster job”.
Does this make sense to you?

As of CryoSPARC v4, the minimum RAM for the CryoSPARC master host is 16 GB. More RAM is recommended and may be needed, depending on the size of datasets, the number of concurrent interactive jobs and/or the overall workload of the server.
The computer’s admin may check the system log for an OOM message that coincides with the Particle Curation job’s failure.
By the way, the topic’s title refers to exposure curation. Would you like to change the title?
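
As a rough illustration of the RAM check, here is a minimal Python sketch that reads `MemTotal` from `/proc/meminfo`-style text and compares it with the 16 GB minimum mentioned above. The function names, the sample values and the 10% tolerance are assumptions for illustration. The tolerance is there because `MemTotal` excludes memory reserved by the kernel, so a nominal 16 GB host reports somewhat less.

```python
MIN_MASTER_RAM_GB = 16  # minimum for the CryoSPARC v4 master host, as stated in this thread
TOLERANCE = 0.9         # assumed slack: MemTotal reads slightly below the nominal RAM size


def parse_mem_total_gib(meminfo_text):
    """Extract MemTotal (reported in kB) from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 ** 2)
    raise ValueError("MemTotal not found")


def meets_minimum(meminfo_text, minimum_gb=MIN_MASTER_RAM_GB):
    """True if the reported total RAM is at least the minimum, within tolerance."""
    return parse_mem_total_gib(meminfo_text) >= minimum_gb * TOLERANCE


# Hypothetical values resembling a nominal 8 GB and a nominal 16 GB host.
sample_8gb = "MemTotal:        8046512 kB\nMemFree:         1234567 kB\n"
sample_16gb = "MemTotal:       16264532 kB\nMemFree:         2345678 kB\n"

for name, text in [("8 GB host", sample_8gb), ("16 GB host", sample_16gb)]:
    print(name, round(parse_mem_total_gib(text), 2), "GiB, meets minimum:", meets_minimum(text))
```

On a Linux master host, the contents of `/proc/meminfo` can be passed in place of the sample strings, for example via `open("/proc/meminfo").read()`.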

Hi @wtempel ,
I confirm that upgrading the master host to 16 GB of RAM fixes the problem (together with an upgrade to v4.3.1).
As for the title, I chose it to follow up on the same error message described in the topic linked above, but if you feel it is more appropriate to change it, please go ahead.
Best
Vincent
