Non-uniform refinement failure

#1

cryoSPARC Version 2.5.0
Box size 320 or 330 pix.

The refinement appears to have finished, with the relevant volumes and masks present in the job directory, but the final FSC has not been calculated and the job is flagged as “Failed” in the GUI. Other refinement jobs (Heterogeneous and Homogeneous) run without a problem.

Here is the end of the log:

> Done iteration 6 in 4804.759s. Total time so far 22447.765s
> -- Iteration 7
>   Using Full Dataset (split 121116 in A, 121116 in B)
>   Using Max Alignment Radius 87.481 (3.786A)
>   Using dynamic mask.
>  Start local processing...
> Expand dynamic mask A and B by 16 voxels
> -- DEV 0 THR 1 NUM 30500 TOTAL 473.61932 ELAPSED 3926.1029 --
>  Newly generated random seed: 1893709422
>   Processed 242232.000 images in 3928.253s.
>   Computing Global FSCs... 
> ====== Job process terminated abnormally.

The GSFSC curve for iteration 006 is shown in the GUI along with the Guinier plot, the noise model plot, and the two distributions (all for iteration 6). I believe iteration 007 would have been the last, in which the mask is tightened for the “final” FSC with a marginally better resolution estimate.

Strangely, this job sometimes seems to restart after it encounters this error (going back to the first iteration of the refinement), and then hits a resource error during the restart (no CUDA device found). Other times it simply stops as shown above. I think the fundamental problem is that the process terminated abnormally, though I cannot find any further information about why.

Will appreciate any help or hints.
David


#2

I was running into a similar error message. I found that the kernel was killing my processes after they had used too much CPU time.
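In case it helps others debug the same symptom: on a Linux box you can check whether a per-process CPU-time limit is set, and whether the kernel logged anything when the job died. This is a generic sketch, not cryoSPARC-specific; the grep patterns are just common phrasings of kernel kill messages.

```shell
# Show the per-process CPU-time limit (in seconds) for the current shell.
# "unlimited" means the kernel will not kill a job for CPU time alone.
ulimit -t

# Search the kernel ring buffer for messages about killed processes,
# e.g. from the OOM killer. '|| true' keeps the pipeline from failing
# when no match is found.
dmesg | grep -iE 'killed process|out of memory' || true
```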


#3

Hi all
I am getting the same error. @WGL, it is unclear to me from your post: were you able to increase the kernel time limit? (And if so, how?) I am already running on a cluster but still getting the same error.


#4

@DavdBSauer How long do your jobs run before they die? You can request more time in your submission script. For SLURM: #SBATCH --time=72:00:00. My supercomputing facility only allows up to three days of wall time, so you’ll want to find out what your resource limits are.
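For reference, a minimal SLURM submission header might look like the sketch below. The job name, GPU request, and memory values are placeholders for your cluster’s configuration, and the worker command is a stand-in; in practice cryoSPARC’s cluster integration generates the actual command from its submission-script template.

```shell
#!/bin/bash
# Hypothetical SLURM header for a cryoSPARC worker job.
#SBATCH --job-name=cryosparc_nu_refine
#SBATCH --time=72:00:00        # request 72 h of wall time (note '=' not ':')
#SBATCH --gres=gpu:1           # one GPU for the refinement
#SBATCH --mem=64G

# Placeholder for the worker command emitted by cryoSPARC's cluster template.
srun your_cryosparc_worker_command
```

If a job is killed at exactly the requested wall time, the SLURM accounting log (`sacct`) will typically show a TIMEOUT state, which distinguishes this from a crash inside the job itself.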