Hello,
Could you help with the issue in the topic title, or offer any insight?
Context:
- I had been able to process TIFF files all the way to refined map_sharp MRC files. I did this two or three times with no issues.
- I decided to restart the job pipeline at the Extract Mic job because the structure needed a larger box size.
- That run failed at the refinement job. The event log ended with either “Job ended abnormally,” “no Heartbeat,” or “Memory [something I don’t recall]” (see below).
- I tried several times, each time reducing the larger box size, down to as little as 70% of the box size (through the Fourier size), but got the same messages again. I then decided to delete all data and directories and start again, after noticing that our data storage allocation was close to its limit and that removing my CryoSPARC files could free up a significant amount of it.
- I started new projects from the same TIFF file directories and still got the same failed-job messages as above, but this time they appeared early in processing, at the Patch Motion Correction job. The job would either process only a varying fraction of the 1500 files, or sometimes be marked complete while leaving a large portion of them unprocessed.
- Our IT department increased the “default heartbeat interval” from 30 s to 3 minutes, stated that there was a slowdown on computing resources the same day item 3 above (the failed refinement) was run, and also pointed to a software error: “3567188 Bus error/ Software error/ memory access violation”. I have been repeating item 5 above (new projects from the same TIFF files) for the past two days, with the same result.
Can you help?
Event logs from the failed Patch Motion Correction job:
Event Log
[2024-02-23 12:54:51.84]
-------- Cluster job status at 2024-02-23 12:55:22.395365 (3 retries)
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
24501079 gpu cryospar rc-svc-p R 0:31 1 c38a-s5
[2024-02-23 12:55:24.93]
[CPU: 145.6 MB]
====== Job process terminated abnormally.
Event Log
[2024-02-24 16:30:09.26]
**** Kill signal sent by CryoSPARC (ID: ) ****
[2024-02-24 16:31:17.69]
Job is unresponsive - no heartbeat received in 300 seconds.