Patch CTF Job Fails: Child process with PID 31697 has terminated unexpectedly

Dear cryoSPARC users,

I am a new cryoSPARC user and have run into a problem:

License is valid.

Launching job on lane default target kumquat ...

Running job on master node hostname kumquat

Project P1 Job J30 Started

Master running v2.12.2, worker running v2.12.2

Running on lane default

Resources allocated: 

  Worker:  kumquat

  CPU   :  [0, 1]

  GPU   :  [0]

  RAM   :  [0]

  SSD   :  False

--------------------------------------------------------------

Importing job module for job type patch_ctf_estimation_multi...

Job ready to run

***************************************************************

Job will process this many micrographs:  4

parent process is 31659

Calling CUDA init from 31697

Outputting partial results now...

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/ctf_estimation/run.py", line 256, in cryosparc2_compute.jobs.ctf_estimation.run.run
AssertionError: Child process with PID 31697 has terminated unexpectedly!

Any suggestions would help!
Thanks

Hi @shubo,

This is a bug in cryoSPARC v2.12.0 and 2.12.2. A patch has been released (v2.12.4) to fix this issue, as well as a few others.

Hello @stephan,

I still see this in 2.12.4:

Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 78, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/ctf_estimation/run.py", line 257, in cryosparc2_compute.jobs.ctf_estimation.run.run
AssertionError: Child process with PID 39181 has terminated unexpectedly!

Hi @vatese,

We’ve released the fix for this bug in the current release, meaning there isn’t a new version number for it. To get the fix, you’ll have to force-reinstall cryoSPARC (which will reinstall your cryoSPARC instance after pulling the latest version from our servers).

More info here:
https://cryosparc.com/docs/reference/install/#forced-update

Hello @stephan,

When was this release?

This server was installed last Friday.

Kind regards and thanks!

Hi @vatese,

This was updated Tuesday, December 10, 2019 at 1PM EST (GMT-5).

Hello @stephan,

[cryosparc@cryosparc cryosparc2_master]$ cryosparcm update --version=v2.12.4
CryoSPARC current version v2.12.4
update starting on Wed Dec 11 09:33:22 AEDT 2019

Already up to date: current version v2.12.4 new version v2.12.4

Should I try with the --override flag?

Hi @vatese,

Yes, that’s correct, you need to specify the --override parameter.

Hi @stephan,

Will give it a go.

Thanks for the help!

Hello @stephan,

Unfortunately the problem persists after the forced reinstall.

Any other suggestions?

Kind regards.

Hi @vatese,

Can you post full system specs? (OS, CPU, RAM, GPU, CUDA, etc)

Hello @stephan

cryoSPARC master: CentOS 7.6.1810
We submit cryoSPARC jobs to Slurm. Our nodes run CentOS with CUDA 10.0 and 4 Nvidia P100s. Each node has 108 GB of usable RAM and 47 CPUs. We have three of these nodes.

I got Patch CTF running. It seems to fail only when all the GPUs are selected (4 in our case). When I select three or fewer, it runs fine.

Kind regards.

Hey @vatese,

Just to be sure, can you run the following two commands:
cryosparcm update --version=v2.12.2
cryosparcm update --version=v2.12.4

This will make sure all your files are up to date.

When using the force update method (--override), you need to run this command for both the master and each worker.
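
A minimal sketch of the forced update on the master, combining the --version and --override options mentioned in this thread. The helpers `run` and `force_update_master` are hypothetical names introduced for illustration, and by default the command is only printed rather than executed. The corresponding worker update is not shown here.

```shell
#!/usr/bin/env bash

# DRY_RUN defaults to 1, so commands are only printed.
# Set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Forced update of the master to the given version (hypothetical helper).
force_update_master() {
    run cryosparcm update --version="$1" --override
}

# Usage (not run automatically):
#   force_update_master v2.12.4
```

Printing the command first makes it easy to check what would run before touching a working installation.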

Hi,
I just installed the latest cryosparc v2.14.2 last week. We have encountered the same error when we try to run motioncor. I wonder if there is any progress and known solution to this problem, as the cryosparc version is obviously not the issue here.
Thanks!

Hey @dotan,

Could you attach the logs from the job itself? Sometimes there is more information in the stdout of the job. You can find this by running the command cryosparcm joblog Px Jx, where Px is the project uid (e.g., "P12") and Jx is the job uid (e.g., "J234"). Thanks!
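
Job logs can contain many repeated lines, so a short summary is often easier to read and post. A minimal sketch, assuming the joblog output has already been saved to a file (the file name `joblog.txt` and the helper `summarize_log` are hypothetical):

```shell
#!/usr/bin/env bash

# Prints each distinct line of a log file once, prefixed by how many
# times it occurs, with the most frequent lines first.
summarize_log() {
    sort "$1" | uniq -c | sort -rn
}

# Usage (not run automatically):
#   cryosparcm joblog P12 J234 > joblog.txt
#   summarize_log joblog.txt
```

The summary loses the ordering of the log, so the final traceback is still worth posting in full.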

Here are the logs from the job.

IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File 
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 82, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/ctf_estimation/run.py", line 258, in cryosparc2_compute.jobs.ctf_estimation.run.run
AssertionError: Child process with PID 142813 has terminated unexpectedly!
========= main process now complete.
========= monitor process now complete.

Thank you.

Hi @dotan,

Is this the entire log that shows up when you run the command? Also, I see you sent a DM that suggests that all GPU-related jobs fail. Can you try running the blob picker, and posting any tracebacks/ error messages that show up?

Hi,
No, it is not. There are many repeated messages in the middle of the log file which I did not include. Here is the entire log file.

IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
Traceback (most recent call last):
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
IOError: [Errno 32] Broken pipe
========= sending heartbeat
========= sending heartbeat
========= sending heartbeat
**** handle exception rc
set status to failed
Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 82, in cryosparc2_compute.run.main
  File "cryosparc2_worker/cryosparc2_compute/jobs/motioncorrection/run_patch.py", line 349, in cryosparc2_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi
AssertionError: Child process with PID 136016 has terminated unexpectedly!
========= main process now complete.
========= monitor process now complete.

Below is the error message for the blob-picker job.

[CPU: 158.9 MB]  Traceback (most recent call last):
  File "cryosparc2_worker/cryosparc2_compute/run.py", line 69, in cryosparc2_compute.run.main
  File "cryosparc2_compute/jobs/jobregister.py", line 335, in get_run_function
    runmod = importlib.import_module(".."+modname, __name__)
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "cryosparc2_worker/cryosparc2_compute/jobs/template_picker_gpu/run.py", line 12, in init cryosparc2_compute.jobs.template_picker_gpu.run
  File "cryosparc2_compute/engine/__init__.py", line 8, in <module>
    from engine import *
  File "cryosparc2_worker/cryosparc2_compute/engine/engine.py", line 12, in init cryosparc2_compute.engine.engine
  File "cryosparc2_worker/cryosparc2_compute/engine/gfourier.py", line 6, in init cryosparc2_compute.engine.gfourier
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/fft.py", line 20, in <module>
    from . import misc
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/misc.py", line 25, in <module>
    from . import cublas
  File "/data1/software/cryosparc/cryosparc2_worker/deps/anaconda/lib/python2.7/site-packages/skcuda/cublas.py", line 56, in <module>
    raise OSError('cublas library not found')
OSError: cublas library not found

Thank you!
Dongyan

Hi @dotan,

Great, thanks for providing those error messages!
This looks like an issue that’s common among CUDA 10.1+ installations. Can you confirm whether you’re using this version of CUDA?

The current workaround is to install an earlier version of the CUDA Toolkit on your system and add the libraries to your CUDA path.

You can do this by:

  1. Download and install CUDA 9.2
  2. Once installed, edit the file cryosparc2_worker/config.sh
    Add the line export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-9.2/lib64
  3. Save the file and rerun the job to see if the error message goes away.
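
The steps above can be sketched as a small script. This is only an illustration: `has_cublas` and `add_cuda_lib_path` are hypothetical helper names, and the paths in the usage comment are the ones from the steps above, which may differ on your system.

```shell
#!/usr/bin/env bash

# Succeeds if DIR contains at least one libcublas shared library.
has_cublas() {
    local dir="$1"
    ls "$dir"/libcublas.so* >/dev/null 2>&1
}

# Appends the LD_LIBRARY_PATH export for DIR to CONFIG,
# unless exactly that line is already present.
add_cuda_lib_path() {
    local config="$1" dir="$2"
    local line="export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$dir"
    grep -qxF "$line" "$config" 2>/dev/null || printf '%s\n' "$line" >> "$config"
}

# Usage (not run automatically):
#   has_cublas /usr/local/cuda-9.2/lib64 && \
#       add_cuda_lib_path cryosparc2_worker/config.sh /usr/local/cuda-9.2/lib64
```

Checking for the library first avoids adding a path that would not resolve the "cublas library not found" error anyway.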

Thank you!
I am using a CUDA 10.2 installation. I tried to install CUDA 9.2 as you suggested, but the installation keeps failing.

ERROR: An NVIDIA kernel module ‘nvidia-drm’ appears to already be loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Please be sure to exit any programs that may be using the GPU(s) before attempting to upgrade your driver. If no GPU-based programs are running, you know that your kernel supports module unloading, and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module’s usage count, for which the simplest remedy is to reboot your computer.

I killed all the GPU-based programs but it still doesn’t work. I will have to wait till later when the campus reopens, so that I can access the server room to reboot the computer.

Thanks!