I'm seeing a strange error in a particle curation job. The job is unstable: it ran well on a small subset (100k particles), but on the large dataset (6M particles) it reports an error about interactive display of images after NCC and power estimation, similar to what was described here: Interactive exposure curation failing
However, I don't get the FileNotFound error described in that post; simply browsing out of the interactive tab and back in resolved the issue and let me change the thresholds. I thought that was it, but when I click "Done output particles", the job starts, then ends abnormally without outputting particles.
Can you help me solve this issue?
Thanks
Vincent
(CryoSPARC v4.2.1)
The end of the log is here:
[CPU: 2.06 GB Avail: 3.53 GB]
==== Completed. Extracted 350089 particles.
[CPU: 2.06 GB Avail: 3.53 GB]
Interactive backend shutting down.
[CPU: 2.06 GB Avail: 3.53 GB]
--------------------------------------------------------------
[CPU: 2.06 GB Avail: 3.53 GB]
Compiling job outputs...
[CPU: 2.06 GB Avail: 3.53 GB]
Passing through outputs for output group micrographs from input group micrographs
[CPU: 2.06 GB Avail: 3.53 GB]
This job outputted results ['micrograph_blob']
[CPU: 2.06 GB Avail: 3.53 GB]
Loaded output dset with 2460 items
[CPU: 2.06 GB Avail: 3.53 GB]
Passthrough results ['ctf', 'mscope_params', 'background_blob', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'movie_blob', 'ctf_stats', 'rigid_motion', 'spline_motion', 'micrograph_blob_non_dw', 'gain_ref_blob']
[CPU: 2.06 GB Avail: 3.53 GB]
Loaded passthrough dset with 2460 items
[CPU: 2.06 GB Avail: 3.53 GB]
Intersection of output and passthrough has 2460 items
[CPU: 2.06 GB Avail: 3.53 GB]
Passing through outputs for output group particles from input group particles
[CPU: 2.06 GB Avail: 3.53 GB]
This job outputted results ['location']
[CPU: 2.06 GB Avail: 3.53 GB]
Loaded output dset with 350089 items
[CPU: 2.06 GB Avail: 3.53 GB]
Passthrough results ['pick_stats', 'ctf']
[CPU: 122.6 MB Avail: 5.59 GB]
====== Job process terminated abnormally.
How much RAM does your CryoSPARC master computer have?
Are there additional error messages in the job log (under Metadata|Log)?
Do system logs indicate that the job was terminated by the kernel’s out-of-memory manager?
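For reference, total and available memory on the master host can be checked with standard Linux commands (output format varies slightly by distribution):

```shell
# Total, used and available memory in human-readable units
free -h

# The same figures straight from the kernel
grep -E 'MemTotal|MemAvailable' /proc/meminfo
```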
It’s a big cluster, I’ll inquire about the RAM but I don’t think it’s the problem.
No error message in the log either, here it is:
===========================================================================
========= monitor process now starting main process at 2023-10-31 10:43:13.571409
MAINPROCESS PID 950692
========= monitor process now waiting for main process
MAIN PID 950692
interactive.run_inspect_picks_v2 cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2023-10-31 10:43:47.105117
INTERACTIVE JOB STARTED === 2023-10-31 10:43:50.777616 ==========================
========= sending heartbeat at 2023-10-31 10:43:57.113093
========= sending heartbeat at 2023-10-31 10:44:07.133735
========= sending heartbeat at 2023-10-31 10:44:17.151222
========= sending heartbeat at 2023-10-31 10:44:27.179682
========= sending heartbeat at 2023-10-31 10:44:37.200386
========= sending heartbeat at 2023-10-31 10:44:47.223504
========= sending heartbeat at 2023-10-31 10:44:57.242058
========= sending heartbeat at 2023-10-31 10:45:07.260964
========= sending heartbeat at 2023-10-31 10:45:17.276600
========= sending heartbeat at 2023-10-31 10:45:27.292307
========= sending heartbeat at 2023-10-31 10:45:37.308744
========= sending heartbeat at 2023-10-31 10:45:47.326507
========= sending heartbeat at 2023-10-31 10:45:57.346060
========= sending heartbeat at 2023-10-31 10:46:07.367602
========= sending heartbeat at 2023-10-31 10:46:17.393509
========= sending heartbeat at 2023-10-31 10:46:27.412028
========= sending heartbeat at 2023-10-31 10:46:37.430234
========= sending heartbeat at 2023-10-31 10:46:47.442748
========= sending heartbeat at 2023-10-31 10:46:57.465992
========= sending heartbeat at 2023-10-31 10:47:07.490672
========= sending heartbeat at 2023-10-31 10:47:17.509236
Exposure curation runs on the same host as all the master processes and should be subject to the same memory restrictions. Is your CryoSPARC instance itself (rather than the CryoSPARC processing jobs launched by the instance) a cluster job?
Well, I'll attempt a translation here, as this is far outside my area of expertise.
If I understand correctly: "the CryoSPARC instance runs on a dedicated machine; thus the instance is not a cluster job."
Does that make sense to you?
As of CryoSPARC v4, the minimum RAM for the CryoSPARC master host is 16 GB. More RAM is recommended and may be needed, depending on the size of datasets, the number of concurrent interactive jobs and/or the overall workload of the server.
The computer’s admin may check the system log for an OOM message that coincides with the Particle Curation job’s failure.
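For example (assuming a systemd-based Linux host; exact commands and required privileges vary), the admin could search kernel messages for OOM-killer events around the time the job failed. The timestamps below are taken from the log excerpt above and are only illustrative:

```shell
# Kernel ring buffer: OOM-killer events name the killed process
dmesg -T 2>/dev/null | grep -i "out of memory"

# On systemd hosts, kernel messages from the journal around the failure window
journalctl -k --since "2023-10-31 10:40" --until "2023-10-31 11:00" | grep -i "oom"
```

If the main process PID (950692 in the log above) appears in such a message, the job was killed by the kernel's OOM manager.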
By the way, the topic’s title refers to exposure curation. Would you like to change the title?
Hi @wtempel ,
I confirm that switching the master host to 16 GB RAM fixed the problem (together with an upgrade to v4.3.1).
About the title: it was meant to follow up on the same error message as in the thread linked above, but if you feel it is more appropriate to change it, please go ahead.
Best
Vincent