Child process with PID [xxxx] terminated unexpectedly with exit code 1

Hi, I am trying to run Patch Motion Correction on 9156 movies imported in .tif format. I am running cryoSPARC v4.4.1. My computer has 4 GPUs (NVIDIA GeForce RTX 4090). I distributed the work across all 4 GPUs, but I am getting the error message 'Child process with PID [xxxx] terminated unexpectedly with exit code 1'. I have read the previous related topics, but they did not help. I have pasted the event log below for reference. I look forward to your help.
License is valid.

Launching job on lane default target comino …

Running job on master node hostname comino

[CPU: 217.0 MB Avail: 502.84 GB]
Job J9 Started

[CPU: 217.0 MB Avail: 502.84 GB]
Master running v4.4.1, worker running v4.4.1

[CPU: 217.0 MB Avail: 502.84 GB]
Working in directory: /home/zhanglab_cwru/NYSCB/DATA/NCCAT_processed_DATA/ABC/CS-XXX/J9

[CPU: 217.0 MB Avail: 502.84 GB]
Running on lane default

[CPU: 217.0 MB Avail: 502.84 GB]
Resources allocated:

[CPU: 217.0 MB Avail: 502.84 GB]
Worker: comino

[CPU: 217.0 MB Avail: 502.84 GB]
CPU : [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]

[CPU: 217.0 MB Avail: 502.84 GB]
GPU : [0, 1, 2, 3]

[CPU: 217.0 MB Avail: 502.84 GB]
RAM : [0, 1, 2, 3, 4, 5, 6, 7]

[CPU: 217.0 MB Avail: 502.84 GB]
SSD : False

[CPU: 217.0 MB Avail: 502.84 GB]

[CPU: 217.0 MB Avail: 502.84 GB]
Importing job module for job type patch_motion_correction_multi…

[CPU: 241.1 MB Avail: 502.82 GB]
Job ready to run

[CPU: 241.1 MB Avail: 502.82 GB]


[CPU: 245.1 MB Avail: 502.82 GB]
Job will process this many movies: 9156

[CPU: 245.1 MB Avail: 502.82 GB]
Random seed: 66532266

[CPU: 245.1 MB Avail: 502.82 GB]
parent process is 896409

[CPU: 208.4 MB Avail: 502.76 GB]
Calling CUDA init from 896453

[CPU: 208.4 MB Avail: 502.75 GB]
Calling CUDA init from 896455

[CPU: 208.4 MB Avail: 502.75 GB]
Calling CUDA init from 896457

[CPU: 208.4 MB Avail: 502.75 GB]
Calling CUDA init from 896460

[CPU: 284.1 MB Avail: 502.81 GB]
Child process with PID 896453 terminated unexpectedly with exit code 1.

[CPU: 284.1 MB Avail: 502.81 GB]
Child process with PID 896455 terminated unexpectedly with exit code 1.

[CPU: 284.1 MB Avail: 502.81 GB]
Child process with PID 896457 terminated unexpectedly with exit code 1.

[CPU: 284.1 MB Avail: 502.81 GB]
Child process with PID 896460 terminated unexpectedly with exit code 1.

[CPU: 264.4 MB Avail: 502.82 GB]

[CPU: 264.4 MB Avail: 502.82 GB]
Compiling job outputs…

[CPU: 264.4 MB Avail: 502.82 GB]
Passing through outputs for output group micrographs from input group movies

[CPU: 264.4 MB Avail: 502.82 GB]
This job outputted results [‘micrograph_blob_non_dw’, ‘micrograph_thumbnail_blob_1x’, ‘micrograph_thumbnail_blob_2x’, ‘micrograph_blob’, ‘background_blob’, ‘rigid_motion’, ‘spline_motion’]

[CPU: 264.4 MB Avail: 502.82 GB]
Loaded output dset with 0 items

[CPU: 264.4 MB Avail: 502.82 GB]
Passthrough results [‘movie_blob’, ‘gain_ref_blob’, ‘mscope_params’]

[CPU: 264.4 MB Avail: 502.82 GB]
Loaded passthrough dset with 9156 items

[CPU: 264.4 MB Avail: 502.82 GB]
Intersection of output and passthrough has 0 items

[CPU: 264.4 MB Avail: 502.82 GB]
Passing through outputs for output group micrographs_incomplete from input group movies

[CPU: 264.4 MB Avail: 502.82 GB]
This job outputted results [‘micrograph_blob’]

[CPU: 264.4 MB Avail: 502.82 GB]
Loaded output dset with 9156 items

[CPU: 264.4 MB Avail: 502.82 GB]
Passthrough results [‘movie_blob’, ‘gain_ref_blob’, ‘mscope_params’]

[CPU: 264.4 MB Avail: 502.82 GB]
Loaded passthrough dset with 9156 items

[CPU: 264.4 MB Avail: 502.82 GB]
Intersection of output and passthrough has 9156 items

[CPU: 264.4 MB Avail: 502.82 GB]
Checking outputs for output group micrographs

[CPU: 264.4 MB Avail: 502.82 GB]
Checking outputs for output group micrographs_incomplete

[CPU: 264.4 MB Avail: 502.82 GB]
Updating job size…

[CPU: 264.4 MB Avail: 502.82 GB]
Exporting job and creating csg files…

[CPU: 264.4 MB Avail: 502.82 GB]


[CPU: 264.4 MB Avail: 502.82 GB]
Job complete. Total time 30.66s

@sabdulmohid Please can you post the output of these commands (run on the worker node)

nvidia-smi
cryosparcm job log PX J9

after substituting PX with the actual project UID.

I got the following output:
zhanglab_cwru@comino:/media/zhanglab_cwru/42d5822c-c931-421c-927b622f86610130/cryosparc/cryosparc_worker$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 545.23
zhanglab_cwru@comino:/media/zhanglab_cwru/42d5822c-c931-421c-927b-622f86610130/cryosparc/cryosparc_worker$ cryosparcm job log J9-G0 J9
Unknown cryoSPARC command job
I think I have a serious issue with the CUDA toolkit. Please advise.

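It is possible that the NVIDIA driver was recently updated on comino, and that comino needs to be rebooted so that the loaded kernel module matches the updated driver libraries. Until then, the old kernel module is still loaded, which is what the "Driver/library version mismatch" message from nvidia-smi indicates.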
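
If you want to confirm the mismatch before rebooting, here is a minimal sketch in Python. It assumes a Linux worker where the loaded kernel module reports its version in /proc/driver/nvidia/version, and where nvidia-smi prints an "NVML library version" line when it fails, as in the output above. The function names are my own and are not part of cryoSPARC or any NVIDIA tool.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional


def kernel_module_version(proc_text: str) -> Optional[str]:
    """Version of the loaded kernel module, parsed from the text of
    /proc/driver/nvidia/version (assumed to contain an 'NVRM version:' line)."""
    m = re.search(r"NVRM version:.*?\s(\d+\.\d+(?:\.\d+)*)", proc_text)
    return m.group(1) if m else None


def nvml_library_version(smi_text: str) -> Optional[str]:
    """Version of the user-space NVML library, parsed from the message
    nvidia-smi prints when it fails to initialize."""
    m = re.search(r"NVML library version:\s*(\d+(?:\.\d+)*)", smi_text)
    return m.group(1) if m else None


def versions_match(kernel: str, library: str) -> bool:
    """Compare only the components both strings have, because nvidia-smi
    may report a shorter version (e.g. '545.23') than the kernel module."""
    k, lib = kernel.split("."), library.split(".")
    n = min(len(k), len(lib))
    return k[:n] == lib[:n]


def report() -> str:
    """Read both versions on this host and describe what was found."""
    try:
        proc_text = Path("/proc/driver/nvidia/version").read_text()
    except OSError:
        return "Could not read /proc/driver/nvidia/version (is the module loaded?)"
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    except FileNotFoundError:
        return "nvidia-smi was not found on PATH"
    kernel = kernel_module_version(proc_text)
    library = nvml_library_version(result.stdout + result.stderr)
    if kernel is None:
        return "Could not parse the kernel module version"
    if library is None:
        # nvidia-smi only prints this line on failure, so this is the healthy case.
        return f"Kernel module {kernel}; nvidia-smi reported no NVML version error"
    if versions_match(kernel, library):
        return f"Kernel module {kernel} and NVML library {library} agree"
    return (f"Mismatch: kernel module {kernel}, NVML library {library}; "
            "a reboot (or reloading the module) is probably needed")
```

Running `print(report())` on the worker should show which of the two versions is stale. If the versions differ, a reboot loads the kernel module that belongs to the newly installed driver.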


Fantastic! It worked! Thank you so much. I just rebooted the system and the patch motion correction is running fine.