Local motion correction fails and I can't see why


(William Grant Ludlam) #1

When I try to run a local motion correction job on my full data set (2422 micrographs), the job eventually fails. It never fails at exactly the same spot, but it is usually around 850 micrographs (about 4000 seconds). If I run jobs of <800 micrographs, they finish. Any ideas on how I can get the job to finish with the full set? I can’t see from the output or logs what the issue might be:

Job Overview, follow latest

Processed 850 of 2422 movies in 4772.71s 

(864 of 2422) Processing J104/imported/Aug09_07.47.11.mrc
Motion correcting and extracting 9 particles (9 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.47.11.mrc ...Done in 1.88s

Processing ...Done in 1.37s

  Curvature  1.52581e+14 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.06s

(865 of 2422) Processing J104/imported/Aug09_07.48.27.mrc
Motion correcting and extracting 80 particles (13 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.48.27.mrc ...Done in 1.89s

Processing ...Done in 2.22s

  Curvature  7.07935e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.45s

(866 of 2422) Processing J104/imported/Aug09_07.49.42.mrc
Motion correcting and extracting 6 particles (6 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.49.42.mrc ...Done in 1.86s

Processing ...Done in 1.35s

  Curvature  8.81582e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.04s

(867 of 2422) Processing J104/imported/Aug09_07.50.56.mrc
Motion correcting and extracting 84 particles (16 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.50.56.mrc ...Done in 1.80s

Processing ...Done in 2.09s

  Curvature  8.6355e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.50s

(868 of 2422) Processing J104/imported/Aug09_07.52.08.mrc
Motion correcting and extracting 78 particles (17 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.52.08.mrc ...Done in 2.12s

Processing ...Done in 2.19s

  Curvature  5.73468e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 1.58s

(869 of 2422) Processing J104/imported/Aug09_07.53.19.mrc
Motion correcting and extracting 42 particles (13 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.53.19.mrc ...Done in 2.32s

Processing ...Done in 1.78s

  Curvature  5.0885e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.24s

(870 of 2422) Processing J104/imported/Aug09_07.54.31.mrc
Motion correcting and extracting 81 particles (20 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.54.31.mrc ...Done in 2.37s

Processing ...Done in 2.17s

  Curvature  5.9319e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 1.38s

(871 of 2422) Processing J104/imported/Aug09_07.56.00.mrc
Motion correcting and extracting 96 particles (23 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.56.00.mrc ...Done in 2.07s

Processing ...Done in 2.24s

  Curvature  7.59e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.51s

(872 of 2422) Processing J104/imported/Aug09_07.57.08.mrc
Motion correcting and extracting 73 particles (17 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.57.08.mrc ...Done in 1.75s

Processing ...Done in 1.99s

  Curvature  7.31061e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 1.25s

(873 of 2422) Processing J104/imported/Aug09_07.58.29.mrc
Motion correcting and extracting 122 particles (34 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.58.29.mrc ...Done in 3.81s

Processing ...Done in 2.46s

  Curvature  1.0665e+14 Smooth lambda cal  20.0

  Writing out particles..  Done in 1.58s

(874 of 2422) Processing J104/imported/Aug09_07.59.45.mrc
Motion correcting and extracting 110 particles (31 rejected near edges)

Loading raw movie data from J104/imported/Aug09_07.59.45.mrc ...Done in 2.43s

Processing ...Done in 2.30s

  Curvature  1.24052e+14 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.63s

(875 of 2422) Processing J104/imported/Aug09_08.01.00.mrc
Motion correcting and extracting 138 particles (30 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.01.00.mrc ...Done in 2.19s

Processing ...Done in 2.75s

  Curvature  7.24537e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 2.56s

(876 of 2422) Processing J104/imported/Aug09_08.02.15.mrc
Motion correcting and extracting 107 particles (31 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.02.15.mrc ...Done in 2.59s

Processing ...Done in 2.53s

  Curvature  1.47105e+14 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.61s

(877 of 2422) Processing J104/imported/Aug09_08.03.31.mrc
Motion correcting and extracting 123 particles (20 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.03.31.mrc ...Done in 2.55s

Processing ...Done in 2.49s

  Curvature  1.17303e+14 Smooth lambda cal  20.0

  Writing out particles..  Done in 1.62s

(878 of 2422) Processing J104/imported/Aug09_08.04.59.mrc
Motion correcting and extracting 96 particles (20 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.04.59.mrc ...Done in 2.73s

Processing ...Done in 2.21s

  Curvature  9.37436e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.53s

(879 of 2422) Processing J104/imported/Aug09_08.06.11.mrc
Motion correcting and extracting 93 particles (20 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.06.11.mrc ...Done in 2.94s

Processing ...Done in 2.13s

  Curvature  8.61964e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 2.29s

(880 of 2422) Processing J104/imported/Aug09_08.07.22.mrc
Motion correcting and extracting 95 particles (20 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.07.22.mrc ...Done in 2.35s

Processing ...Done in 2.31s

  Curvature  6.33217e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.57s

(881 of 2422) Processing J104/imported/Aug09_08.08.37.mrc
Motion correcting and extracting 78 particles (9 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.08.37.mrc ...Done in 2.21s

Processing ...Done in 2.18s

  Curvature  4.57629e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.47s

(882 of 2422) Processing J104/imported/Aug09_08.09.49.mrc
Motion correcting and extracting 98 particles (26 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.09.49.mrc ...Done in 1.68s

Processing ...Done in 2.36s

  Curvature  5.67702e+13 Smooth lambda cal  20.0

  Writing out particles..  Done in 0.85s

(883 of 2422) Processing J104/imported/Aug09_08.11.15.mrc
Motion correcting and extracting 127 particles (28 rejected near edges)

Loading raw movie data from J104/imported/Aug09_08.11.15.mrc ...Done in 2.10s
Processing ...Done in 2.46s

jobs.log


================= CRYOSPARCW =======  2019-08-23 16:51:55.329512  =========
Project P2 Job J145
Master m9g-1-20.rc.byu.edu Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 66483
MAIN PID 66483
motioncorrection.run_local cryosparc2_compute.jobs.jobregister
========= monitor process now waiting for main process
/zhome/pianonan/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py:283: UserWarning: creating CUBLAS context to get version number
  warnings.warn('creating CUBLAS context to get version number')
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
/zhome/pianonan/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py:516: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  max_open_warning, RuntimeWarning)
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
...repeated 500 times...
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
========= main process now complete.
========= monitor process now complete.

Current cryoSPARC version: v2.9.0
CRYOSPARC_CUDA_PATH: /apps/cuda/9.2.88
OS: Ubuntu


(Jacopo Marino) #2

In my case, my jobs are marked as failed, but they are still running and they do finish. So check your GPU usage and see whether the same is happening for you. I am not sure whether this is a bug in v2.9.
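A complementary check to watching the GPU is to see whether the job's main process still exists on the worker node. The sketch below is only an illustration, not a cryoSPARC tool: it assumes Python 3, assumes it runs on the same host as the job, and relies on the `MAIN PID <n>` line that appears in the job log posted above. The helper names `main_pid_from_job_log` and `pid_is_running` are made up for this example.

```python
import os
import re


def main_pid_from_job_log(log_text):
    """Return the PID from the 'MAIN PID <n>' line of a job log, or None."""
    match = re.search(r"^MAIN PID (\d+)\s*$", log_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pid_is_running(pid):
    """Return True if a process with this PID exists on the current host."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    sample = "MAINPROCESS PID 66483\nMAIN PID 66483\n"
    pid = main_pid_from_job_log(sample)
    print(pid, pid_is_running(pid))
```

PIDs can be reused by unrelated processes, so a positive result is a hint rather than proof. It can be cross-checked against the PIDs that `nvidia-smi` lists for running compute processes.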


(William Grant Ludlam) #3

Yes. I have noticed that I sometimes keep getting output in the job overview even after cryoSPARC says the job has failed. Besides watching GPU usage, is there a way to see whether old jobs completed or really failed? I’ve tried using data from “failed” jobs that I’ve marked complete, but I get errors such as:

AttributeError: 'NoneType' object has no attribute 'data'

This makes me think maybe the jobs aren’t completing all the way?
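One rough way to tell how far an old job got is to look at the last `(N of M) Processing ...` line in its saved overview text. The following is a minimal sketch, assuming Python 3 and assuming the overview has been copied into a string in the same format as the output in the first post. The function name `last_progress` is hypothetical.

```python
import re

# Matches lines such as "(883 of 2422) Processing J104/imported/....mrc"
PROGRESS = re.compile(r"^\((\d+) of (\d+)\) Processing", re.MULTILINE)


def last_progress(overview_text):
    """Return (last_started, total) from the overview text, or None if absent."""
    matches = PROGRESS.findall(overview_text)
    if not matches:
        return None
    started, total = matches[-1]
    return int(started), int(total)


if __name__ == "__main__":
    text = (
        "(882 of 2422) Processing J104/imported/Aug09_08.09.49.mrc\n"
        "(883 of 2422) Processing J104/imported/Aug09_08.11.15.mrc\n"
    )
    print(last_progress(text))
```

For the output in the first post this gives 883 of 2422, meaning the job had only started on movie 883 when the output stopped, so it did not get through the full set. A job whose last line is well short of the total is unlikely to have written complete outputs.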


(William Grant Ludlam) #4

I think the problem was that my computational setup only allowed a limited amount of CPU time, so the kernel was killing my processes after they had run for a while.
I would sometimes get the error:

====== Job process terminated abnormally
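To test a CPU-time hypothesis like this, one option is to inspect the per-process CPU-time limit in the environment where the worker runs. The thread does not say which mechanism imposed the limit, so this sketch assumes it was the standard `RLIMIT_CPU` resource limit (the one `ulimit -t` reports); a site could equally enforce limits through a scheduler or a policy daemon, which this would not detect. It assumes Python 3 on Linux, and `describe_cpu_limit` is a name chosen for this example.

```python
import resource


def describe_cpu_limit():
    """Return the soft and hard CPU-time limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CPU)

    def fmt(value):
        # RLIM_INFINITY means no limit is set
        return "unlimited" if value == resource.RLIM_INFINITY else f"{value} s"

    return {"soft": fmt(soft), "hard": fmt(hard)}


if __name__ == "__main__":
    print(describe_cpu_limit())
```

On Linux, a process that reaches the soft limit receives SIGXCPU, and one that reaches the hard limit is killed with SIGKILL, which would be consistent with a job that stops without a traceback. The limit counts CPU seconds rather than elapsed time, and it should be checked from the same shell or batch environment that launches the worker.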

I switched to running my jobs on a cluster and increased my wall-time limit, and now the jobs run to completion.
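For choosing a wall-time, the progress line in the first post ("Processed 850 of 2422 movies in 4772.71s") gives a rough rate to extrapolate from. This is a back-of-the-envelope sketch, assuming Python 3 and a roughly constant per-movie time, which the log suggests but does not guarantee. The function name `estimate_total_seconds` is hypothetical.

```python
import re


def estimate_total_seconds(progress_line):
    """Extrapolate the total runtime from a 'Processed N of M movies in Ts' line."""
    match = re.search(r"Processed (\d+) of (\d+) movies in ([\d.]+)s", progress_line)
    if match is None:
        raise ValueError("no progress summary found in line")
    done = int(match.group(1))
    total = int(match.group(2))
    elapsed = float(match.group(3))
    return elapsed / done * total  # seconds per movie times total movies


if __name__ == "__main__":
    line = "Processed 850 of 2422 movies in 4772.71s"
    seconds = estimate_total_seconds(line)
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")
```

With the numbers above this comes to roughly 13,600 s, or about 3.8 hours, so a wall-time request should sit comfortably above that to leave a margin for slower movies and I/O.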