Error during patch motion correction of movies in cryosparc v2.12.4

I get a strange error (see below) when I do the patch motion correction:

Traceback (most recent call last): File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook run_old(*args, **kw)
File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run self.__target(*self.__args, **self.__kwargs)
File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work work = processor.process(item)
File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.
process AssertionError: Job is not in running state - worker thread with PID 33956 terminating self

Any ideas?
I tried motioncorr in case there is a problem with my instal or worker-master connection but that worked fine

Best,
Omid

Hi @Omid,

The updates to Patch Motion Correction and Patch CTF Estimation include a system to determine if any of the dispatched processes have failed/died. This traceback ensures that you don’t waste time waiting for threads that don’t exist by outputting partial results right before terminating. There could be a number of reasons why a process fails: resources, incorrect data types, network issues, etc.

Hi @stephan,

Thank you for the quick reply. I processed this data (exact same movies) before the update and had no problems with patch motion correction. So I am not sure what will cause this error other than the update. I there a way to narrow down the problem? Currently there is no one else using the workstation so I can’t imagine this is a resource issue either.

Hi,

I updated from 2.12.0 to 2.12.4. Patch CTF estimation could work fine, but Patch Motion Correction still went wrong with the same message as above.

@ljw and @Omid, is this the only traceback in the job log? if you scroll up (or “show from top” and scroll down) are there any other tracebacks? The patch CTF and patch motion jobs are multithreaded multi-process jobs and so the different threads/processes can have tracebacks at different times. The last traceback you see is one where one of the processes notices that the main process has died for some reason, but the traceback that actually gives a clue about why the main process died will be further up in the log.

Can you also paste the output from the job log for the failed job? This can be found using the following command:
cryosparcm joblog PXX JXX
this is the stderr output of the job that does not get sent to the web UI.

@apunjani Hi Ali,
Thanks for the feedback. Please see the job log below.

MAIN PID 118291
motioncorrection.run_patch cryosparc2_compute.jobs.jobregister


Running job on hostname %s XXX
Allocated Resources : {u’lane’: u’default’, u’target’: {u’monitor_port’: None, u’lane’: u’default’, u’name’: XXX, u’title’: u’Worker node XXX, u’resource_slots’: {u’GPU’: [0, 1, 2, 3], u’RAM’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], u’CPU’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, u’hostname’: XXX, u’worker_bin_path’: u’/data/CRYOSPARC/cryosparc2_worker/bin/cryosparcw’, u’cache_path’: u’/scr/cryosparc_cache’, u’cache_quota_mb’: None, u’resource_fixed’: {u’SSD’: True}, u’cache_reserve_mb’: 10000, u’type’: u’node’, u’ssh_str’: u’cryosparc_user@XXX’, u’desc’: None}, u’license’: True, u’hostname’: XXX, u’slots’: {u’GPU’: [0, 1, 2, 3], u’RAM’: [0, 1, 2, 3, 4, 5, 6, 7], u’CPU’: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]}, u’fixed’: {u’SSD’: False}, u’lane_type’: u’default’, u’licenses_acquired’: 4}
========= main process now complete.
========= monitor process now complete.
**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
File “cryosparc2_compute/jobs/runcommon.py”, line 1490, in run_with_except_hook
run_old(*args, **kw)
File “/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py”, line 754, in run
self.__target(*self.__args, **self.__kwargs)
File “cryosparc2_compute/jobs/pipeline.py”, line 153, in thread_work
work = processor.process(item)
File “cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py”, line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118327 terminating self.
**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
File “cryosparc2_compute/jobs/runcommon.py”, line 1490, in run_with_except_hook
run_old(*args, **kw)
File “/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py”, line 754, in run
self.__target(*self.__args, **self.__kwargs)
File “cryosparc2_compute/jobs/pipeline.py”, line 153, in thread_work
work = processor.process(item)
File “cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py”, line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118325 terminating self.
**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
File “cryosparc2_compute/jobs/runcommon.py”, line 1490, in run_with_except_hook
run_old(*args, **kw)
File “/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py”, line 754, in run
self.__target(*self.__args, **self.__kwargs)
File “cryosparc2_compute/jobs/pipeline.py”, line 153, in thread_work
work = processor.process(item)
File “cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py”, line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118326 terminating self.
**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
File “cryosparc2_compute/jobs/runcommon.py”, line 1490, in run_with_except_hook
run_old(*args, **kw)
File “/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py”, line 754, in run
self.__target(*self.__args, **self.__kwargs)
File “cryosparc2_compute/jobs/pipeline.py”, line 153, in thread_work
work = processor.process(item)
File “cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py”, line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118328 terminating self.

Hi @Omid thanks for the quick response - does this mean that there were no other tracebacks in the web UI?

Did the job fail immediately, or after processing some number of movies?

Hi @apunjani,

The process fails immediately and there is no other tracebacks unfortunately

@Omid can you paste the entire joblog, you will have to find the job directory for the failed job, and find the file called job.log (it will have more lines preceding those that you pasted already)
Thanks!

Hi @apunjani

The entire log:

================= CRYOSPARCW =======  
2019-12-05 09:52:17.452951  =========
Project P1 Job J18
Master c105445 Port 39002
===========================================================================
========= monitor process now starting main process
MAINPROCESS PID 118291
========= monitor process now waiting for main process
MAIN PID 118291
motioncorrection.run_patch cryosparc2_compute.jobs.jobregister
***************************************************************
Running job on hostname %s c105445
Allocated Resources :  {u'lane': u'default', u'target': {u'monitor_port': None, u'lane': u'default', u'name': u'c105445', u'title': u'Worker node c105445', u'resource_slots': {u'GPU': [0, 1, 2, 3], u'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], u'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]}, u'hostname': u'c105445', u'worker_bin_path': u'/data/CRYOSPARC/cryosparc2_worker/bin/cryosparcw', u'cache_path': u'/scr/cryosparc_cache', u'cache_quota_mb': None, u'resource_fixed': {u'SSD': True}, u'cache_reserve_mb': 10000, u'type': u'node', u'ssh_str': u'cryosparc_user@c105445', u'desc': None}, u'license': True, u'hostname': u'c105445', u'slots': {u'GPU': [0, 1, 2, 3], u'RAM': [0, 1, 2, 3, 4, 5, 6, 7], u'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]}, u'fixed': {u'SSD': False}, u'lane_type': u'default', u'licenses_acquired': 4}
========= main process now complete.
========= monitor process now complete.

**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118327 terminating self.

**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118325 terminating self.

**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118326 terminating self.

**custom thread exception hook caught something
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118328 terminating self.

Thanks - and can you also paste the entire output from the streamlog (web UI) where the traceback was, from the start of the job?

Also @Omid does the job fail when you try to run with 1 GPU?

Hi @apunjani,

Yes it fails regardless of the number of GPU I use. Looks like maybe one of the movies might be the issue? But as I said this worked before and motioncorr seems to handle it just fine.

Here is the web UI stream log:

License is valid.

Launching job on lane default target c105445 ...

Running job on master node hostname c105445

Project P1 Job J18 Started

Master running v2.12.4, worker running v2.12.4

Running on lane default

Resources allocated: 

  Worker:  c105445

  CPU   :  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]

  GPU   :  [0, 1, 2, 3]

  RAM   :  [0, 1, 2, 3, 4, 5, 6, 7]

  SSD   :  False

--------------------------------------------------------------

Importing job module for job type patch_motion_correction_multi...

Job ready to run

***************************************************************

Job will process this many movies:  1358

parent process is 118291

Calling CUDA init from 118325

Calling CUDA init from 118326

Calling CUDA init from 118327

Calling CUDA init from 118328

====== Job process terminated abnormally.

-- 2.0: processing 0 of 1358: J1/imported/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions.mrc
        loading /data/CRYOSPARC/USERS/Omid/P1/J1/imported/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions.mrc
        Loading raw movie data from J1/imported/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions.mrc ...
        Done in 1.01s
        Processing ...
        Done in 8.99s
        Completed rigid and patch motion with (Z:5,Y:7,X:7) knots
        Writing non-dose-weighted result to J18/motioncorrected/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions_patch_aligned.mrc ...
        Done in 0.07s
        Writing dose-weighted result to J18/motioncorrected/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions_patch_aligned_doseweighted.mrc ...
        Done in 0.07s
        Writing background estimate to J18/motioncorrected/FoilHole_5943322_Data_5943549_5943550_20190907_1209_Fractions_background.mrc ...
        Done in 0.01s
        Writing motion estimates...
        Done in 0.00s

-- 0.0: processing 1 of 1358: J1/imported/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions.mrc
        loading /data/CRYOSPARC/USERS/Omid/P1/J1/imported/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions.mrc
        Loading raw movie data from J1/imported/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions.mrc ...
        Done in 4.68s
        Processing ...
        Done in 9.64s
        Completed rigid and patch motion with (Z:5,Y:7,X:7) knots
        Writing non-dose-weighted result to J18/motioncorrected/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions_patch_aligned.mrc ...
        Done in 0.07s
        Writing dose-weighted result to J18/motioncorrected/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions_patch_aligned_doseweighted.mrc ...
        Done in 0.07s
        Writing background estimate to J18/motioncorrected/FoilHole_5943323_Data_5943549_5943550_20190907_1212_Fractions_background.mrc ...
        Done in 0.01s
        Writing motion estimates...
        Done in 0.00s

-- 3.0: processing 2 of 1358: J1/imported/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions.mrc
        loading /data/CRYOSPARC/USERS/Omid/P1/J1/imported/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions.mrc
        Loading raw movie data from J1/imported/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions.mrc ...
        Done in 8.52s
        Processing ...
        Done in 9.33s
        Completed rigid and patch motion with (Z:5,Y:7,X:7) knots
        Writing non-dose-weighted result to J18/motioncorrected/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions_patch_aligned.mrc ...
        Done in 0.09s
        Writing dose-weighted result to J18/motioncorrected/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions_patch_aligned_doseweighted.mrc ...
        Done in 0.06s
        Writing background estimate to J18/motioncorrected/FoilHole_5943325_Data_5943549_5943550_20190907_1214_Fractions_background.mrc ...
        Done in 0.01s
        Writing motion estimates...
        Done in 0.00s

-- 1.0: processing 3 of 1358: J1/imported/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions.mrc
        loading /data/CRYOSPARC/USERS/Omid/P1/J1/imported/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions.mrc
        Loading raw movie data from J1/imported/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions.mrc ...
        Done in 8.44s
        Processing ...
        Done in 9.26s
        Completed rigid and patch motion with (Z:5,Y:7,X:7) knots
        Writing non-dose-weighted result to J18/motioncorrected/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions_patch_aligned.mrc ...
        Done in 0.07s
        Writing dose-weighted result to J18/motioncorrected/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions_patch_aligned_doseweighted.mrc ...
        Done in 0.06s
        Writing background estimate to J18/motioncorrected/FoilHole_5943327_Data_5943549_5943550_20190907_1217_Fractions_background.mrc ...
        Done in 0.01s
        Writing motion estimates...
        Done in 0.00s

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118327 terminating self.

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118325 terminating self.

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118326 terminating self.

Traceback (most recent call last):
  File "cryosparc2_compute/jobs/runcommon.py", line 1490, in run_with_except_hook
    run_old(*args, **kw)
  File "/data/CRYOSPARC/cryosparc2_worker/deps/anaconda/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "cryosparc2_compute/jobs/pipeline.py", line 153, in thread_work
    work = processor.process(item)
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 252, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
AssertionError: Job is not in running state - worker thread with PID 118328 terminating self.

HI @Omid, thanks again for your help. We can’t seem to reproduce this at all on our end, or get to the bottom of it. I will email you to ask for a time that we can set up a webex call if possible, to diagnose it directly on your system.
Thanks!

1 Like

We’ve identified the bug - thanks @Omid for all the help. fix to be released asap.

1 Like

Hi @apunjani,

My pleasure. Thanks for fixing it.

Hi @Omid,

We’ve released this bug in the current release- meaning there isn’t a new version number for this. In order to get the fix, you’re going to have to force-reinstall cryoSPARC (which will reinstall your cryoSPARC instance after pulling the latest version from our servers).

More info here:
https://cryosparc.com/docs/reference/install/#forced-update

Hi @stephan,

Thanks!