I suspected that something might have gone wrong with the update, so I tried to force an update and received this error:
CryoSPARC current version v3.2.0
update starting on Thu Apr 1 12:15:39 PDT 2021
No version specified - updating to latest version.
=============================
Forcing update to version v3.2.0…
CryoSPARC is not already running.
If you would like to restart, use cryosparcm restart
Removing previous downloads…
Downloading master update…
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 785M 100 785M 0 0 5326k 0 0:02:30 0:02:30 --:--:-- 4152k
Downloading worker update…
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 1807M 100 1807M 0 0 10.9M 0 0:02:45 0:02:45 --:--:-- 16.7M
Done.
Update will now be applied to the master installation,
followed by worker installations on other nodes.
Deleting old files…
Extracting…
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
Despite the tar errors, the updated version will still start normally (as v3.2.0).
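For anyone hitting the same tar errors: "tar: Skipping to next header" usually means the downloaded archive is truncated or corrupted. A quick integrity check before retrying the forced update could look like this (a rough sketch; it assumes the master archive was downloaded as cryosparc_master.tar.gz into the install directory, so adjust the path to your setup):

# Verify that the gzip stream and the tar index are readable end-to-end
gzip -t cryosparc_master.tar.gz && tar -tzf cryosparc_master.tar.gz > /dev/null \
  && echo "archive looks intact" \
  || echo "archive is corrupted; delete it and re-download"

If the check fails, deleting the archive and re-running the update over a stable connection is the usual next step.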
Hi there,
I represent research computing users at a major university in California.
I want to chime in, as we are experiencing the same issue with the same software versions:
CentOS 7, CryoSPARC v3.2.0, NVIDIA driver 460.67, CUDA 11.2.
The output in the main job window in the web browser:
[CPU: 6.10 GB] Particles selected : 4798732
[CPU: 6.10 GB] Particles excluded : 9319942
[CPU: 6.11 GB] Done.
[CPU: 6.11 GB] Interactive backend shutting down.
[CPU: 4.39 GB] --------------------------------------------------------------
[CPU: 4.39 GB] Compiling job outputs...
[CPU: 4.39 GB] Passing through outputs for output group particles_selected from input group particles
[CPU: 6.60 GB] This job outputted results ['blob', 'alignments2D']
[CPU: 6.60 GB] Loaded output dset with 4798732 items
[CPU: 6.60 GB] Passthrough results ['ctf', 'location', 'pick_stats']
[CPU: 11.86 GB] Loaded passthrough dset with 14118674 items
[CPU: 10.75 GB] Intersection of output and passthrough has 4798732 items
[CPU: 10.75 GB] Passing through outputs for output group particles_excluded from input group particles
[CPU: 10.78 GB] This job outputted results ['blob', 'alignments2D']
[CPU: 10.78 GB] Loaded output dset with 9319942 items
[CPU: 10.78 GB] Passthrough results ['ctf', 'location', 'pick_stats']
[CPU: 55.9 MB] ====== Job process terminated abnormally.
However, the job.log file does not say anything useful:
================= CRYOSPARCW ======= 2021-09-01 12:09:20.473610 =========
Project P24 Job J472
Master xxx.xxx.xxx Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 47428
========= monitor process now waiting for main process
MAIN PID 47428
select2D.run cryosparc_compute.jobs.jobregister
========= sending heartbeat
***************************************************************
INTERACTIVE JOB STARTED === 2021-09-01 12:09:36.769559 ==========================
========= sending heartbeat
* Serving Flask app "select_2D" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
========= sending heartbeat
========= main process now complete.
========= monitor process now complete.
I see @stephan providing lots of help - can you help us?
Thanks for reporting. It seems like you're processing a lot of particles. Could the master node be running out of memory and terminating the python process? Try monitoring RAM usage while this job is completing; the join operation at the end of the job (which is where your job seems to have died) could be the culprit here.
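For example, something along these lines (just a sketch; <pid> is the MAINPROCESS PID printed in the job.log):

# Live view of the node's free RAM and the job's resident memory
watch -n 5 "free -h; ps -o pid,rss,cmd -p <pid>"

# After a crash, check whether the kernel OOM killer terminated the process
dmesg -T | grep -i -E 'out of memory|oom'

If the OOM killer fired, the kernel log will name the killed python process and its memory usage at the time.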
Thank you, @stephan. I really appreciate your help and advice.
In fact, I already monitored the memory usage roughly with "top", and the process used at most 20 GB of RAM, which is well within the 256 GB available. After reaching 20.2 GB, the main process died.
Is there a way to make cryoSPARC produce more verbose output, so we can pinpoint the exact moment of failure and debug the issue more easily?
It would be amazing if we could get cryoSPARC to handle this dataset, as it is one of many such datasets to come.
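In the meantime, to pin down the exact moment ourselves, we could log timestamped memory samples instead of eyeballing top; a minimal sketch, with <pid> taken from the MAINPROCESS line in job.log (mem_trace.log is just an arbitrary output file):

# Append a timestamped RSS sample (in kB) every 5 seconds until the process exits
while ps -p <pid> > /dev/null; do
    echo "$(date '+%F %T') $(ps -o rss= -p <pid>)" >> mem_trace.log
    sleep 5
done

Correlating mem_trace.log with the job's event log should show exactly where the process dies.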
Has this been solved? I am getting the same 'Job process terminated abnormally' for Inspect Particle Picks and can't move forward. There's no specific error.
Amazon Linux 2 (Karoo), CryoSPARC v3.3.1, NVIDIA driver 460.73.01, CUDA 11.3
Log:
[CPU: 3.00 GB] ==== Completed. Extracted 441808 particles.
[CPU: 3.00 GB] Interactive backend shutting down.
[CPU: 2.91 GB] --------------------------------------------------------------
[CPU: 2.91 GB] Compiling job outputs…
[CPU: 2.91 GB] Passing through outputs for output group micrographs from input group micrographs
[CPU: 2.91 GB] This job outputted results ['micrograph_blob']
[CPU: 2.91 GB] Loaded output dset with 4828 items
[CPU: 2.91 GB] Passthrough results ['ctf', 'mscope_params', 'background_blob', 'movie_blob', 'ctf_stats', 'rigid_motion', 'spline_motion', 'micrograph_blob_non_dw', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'gain_ref_blob']
[CPU: 2.92 GB] Loaded passthrough dset with 4828 items
[CPU: 2.92 GB] Intersection of output and passthrough has 4828 items
[CPU: 2.92 GB] Passing through outputs for output group particles from input group particles
[CPU: 3.00 GB] This job outputted results ['location']
[CPU: 3.00 GB] Loaded output dset with 441808 items
[CPU: 3.00 GB] Passthrough results ['pick_stats', 'ctf']
[CPU: 57.0 MB] ====== Job process terminated abnormally.
@iphan I do not know what caused the termination of this job. Does the output of cryosparcm joblog <project_id> <job_id> provide additional details?
Is it possible that the job was terminated for exceeding physical or administrative (such as ulimit, cgroups, hypervisor) memory limits?
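For example, something like the following on the master node (a rough sketch; the cgroup path assumes a cgroup v1 layout and will differ under cgroup v2 or across distributions):

# Per-session limits for the account that runs cryoSPARC
ulimit -a

# Memory limit of the cgroup the current shell belongs to (cgroup v1 layout)
cat /sys/fs/cgroup/memory$(grep ':memory:' /proc/self/cgroup | cut -d: -f3)/memory.limit_in_bytes

# Total and available RAM as exposed by the instance/hypervisor
free -h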
My previous post shows that picking a smaller number of particles now fails: 229K particles (failed job) vs. 310K (previously successful job). Both Inspect Particle Picks jobs were run on the same AWS instance type: master_instance_type = c5.xlarge
Could you please explain why increasing DRAM would help? I'd like to understand the logic behind doing this.
It is not straightforward (and slooow!) for me to back up, increase DRAM, and redeploy my entire setup on AWS.
Increasing DRAM to 16 GB did it. FYI, I am testing on the same set of micrographs and extracting the same number of particles. I don't understand why we suddenly need twice the memory.
Moving to a larger-RAM instance doubles the cost, so I would really appreciate it if you could explain why we need more RAM to do the same job.
Is there a way to optimize cryoSPARC so that it goes back to needing fewer resources?