Deep picker train - terminated abnormally

Dear colleagues,

When I ran Deep Picker Train, the job stopped at the following stage:

[CPU: 52.75 GB Avail: 62.71 GB]
Preparing training…

[CPU: 52.75 GB Avail: 62.69 GB]
Calculating class weights…

[CPU: 52.75 GB Avail: 62.71 GB]
Positive weight: 4.489346380260383

[CPU: 52.75 GB Avail: 62.71 GB]
Negative weight: 0.562666907350292

[CPU: 11.0 MB Avail: 120.90 GB]
====== Job process terminated abnormally.
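The two weights logged above match the common "balanced" class-weighting scheme, w_c = N / (2 * N_c), under which the reciprocals of the positive and negative weights sum to exactly 2. Whether Deep Picker uses exactly this formula is an assumption, not something the log states. The sketch below uses that assumption to back out the implied fraction of positive training examples. The helper names are hypothetical and are not CryoSPARC functions.

```python
from typing import Tuple


def balanced_class_weights(n_positive: int, n_negative: int) -> Tuple[float, float]:
    """Return (positive_weight, negative_weight) using w_c = N / (2 * N_c).

    Assumption: this is the common "balanced" heuristic; the thread does not
    confirm that Deep Picker uses exactly this formula.
    """
    n_total = n_positive + n_negative
    return n_total / (2 * n_positive), n_total / (2 * n_negative)


def implied_positive_fraction(positive_weight: float) -> float:
    """Invert w_pos = N / (2 * N_pos) to recover N_pos / N."""
    return 1.0 / (2.0 * positive_weight)


if __name__ == "__main__":
    # Weights copied from the job output above.
    pos_w, neg_w = 4.489346380260383, 0.562666907350292
    # Under the balanced scheme, 1/w_pos + 1/w_neg equals 2.
    print(f"1/w_pos + 1/w_neg = {1 / pos_w + 1 / neg_w:.6f}")
    print(f"implied positive fraction = {implied_positive_fraction(pos_w):.3f}")
```

With the logged values this prints a sum of 2.000000 and an implied positive fraction of about 0.111. If the assumption holds, roughly 11% of the training examples are positives. This does not explain the crash by itself; it only suggests that the class-weight step completed normally before the job died.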

Any ideas about what went wrong and how to fix it?

Thank you in advance.

Kind regards,
Dmitry

Can you find additional information in the job log (Metadata|Log)?


Hello @wtempel,

Please find the log file here: [log]

Kind regards,
Dmitry

The link had expired by the time I tried to download the log. Could you please post the relevant excerpts of the log in this forum topic?


Dear @wtempel,

I marked that Deep Picker Train job as completed and started a new Deep Picker Train run using the model from the previous (unsuccessful) one.
Here is the more detailed error:

Kind regards,
Dmitry

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/deep_picker/run_deep_picker.py", line 275, in cryosparc_master.cryosparc_compute.jobs.deep_picker.run_deep_picker.run_deep_picker_train
  File "cryosparc_master/cryosparc_compute/jobs/deep_picker/train.py", line 56, in cryosparc_master.cryosparc_compute.jobs.deep_picker.train.train_picker
  File "cryosparc_master/cryosparc_compute/jobs/deep_picker/train.py", line 121, in cryosparc_master.cryosparc_compute.jobs.deep_picker.train.train_picker
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 793, in from_tensor_slices
    return TensorSliceDataset(tensors, name=name)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4477, in __init__
    element = structure.normalize_element(element)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/data/util/structure.py", line 125, in normalize_element
    ops.convert_to_tensor(t, name="component_%d" % i, dtype=dtype))
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/profiler/trace.py", line 183, in wrapped
    return func(*args, **kwargs)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1695, in convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/tensor_conversion_registry.py", line 48, in _default_conversion_function
    return constant_op.constant(value, dtype, name=name)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 267, in constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 279, in _constant_impl
    return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 304, in _constant_eager_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/tensorflow/python/framework/constant_op.py", line 102, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.
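The last line reports that TensorFlow could not copy the input from host memory to GPU 0 while creating the training dataset. Per the frames above, `tf.data.Dataset.from_tensor_slices` converts each input array in full into an eager tensor, so the whole training array has to be placed on the GPU in one allocation. "Dst tensor is not initialized" is commonly reported when that allocation fails for lack of free GPU memory, although this thread has not confirmed that as the cause here. A rough size check, assuming the training data is a single dense float32 array of square images (the count and size below are hypothetical placeholders, not values from J62, and the thread does not say whether the arrays are whole downsampled micrographs or smaller patches):

```python
def dense_array_bytes(n_items: int, height: int, width: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold n_items dense height x width arrays (float32 by default)."""
    return n_items * height * width * bytes_per_value


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; substitute the real
    # number of training images and their size for the job in question.
    n_images, image_px = 200_000, 64
    needed = dense_array_bytes(n_images, image_px, image_px)
    print(f"{n_images} images of {image_px}x{image_px} float32 ~ {to_gib(needed):.2f} GiB")
```

For the placeholder values this comes to about 3.05 GiB. The real figure could be compared with the free memory that `nvidia-smi` reports for the worker GPU, keeping in mind that other processes on the same card also consume memory.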

To help us tell whether the latest failure is caused by unsuitable inputs or by a problem with the software configuration, please post:

  1. the CryoSPARC version and patch
  2. worker info
  3. job uids for the jobs you mention
  4. all relevant snippets from both the Event Log and the job logs (Metadata|Log) of both problematic jobs. Tag each log snippet with its origin: job uid and log type (event log or job log)
  5. a file listing for the first job you mentioned:
    find /path/to/first/train/jobdir/ -ls
    

Hello @wtempel,

Here is the information you requested:

  1. CryoSPARC version: 4.3.1
  2. Worker info:
(base) [cmromao@titanios bin]$ ./cryosparcw env
export "CRYOSPARC_USE_GPU=true"
export "CRYOSPARC_CONDA_ENV=cryosparc_worker_env"
export "CRYOSPARC_DEVELOP=false"
export "CRYOSPARC_LICENSE_ID=df060e8c-7f77-11eb-9abc-132256ea4cfb"
export "CRYOSPARC_ROOT_DIR=/data2/cmromao/cryoSPARC/cryosparc_worker"
export "CRYOSPARC_PATH=/data2/cmromao/cryoSPARC/cryosparc_worker/bin"
export "CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.8"
export "PATH=/data2/cmromao/cryoSPARC/cryosparc_worker/bin:/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/condabin:/usr/local/EMAN2.91/bin:/usr/local/EMAN2.91/condabin:/data2/cmromao/cryoSPARC/cryosparc_master/bin:/usr/local/relion-3/bin:/usr/local/mpich-3.2.1/bin:/usr/local/cuda/bin:/usr/local/IMOD/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/motioncorr_v2.1/bin:/usr/local/Gctf_v1.06/bin:/usr/local/Gctf_v0.50/bin:/usr/local/ResMap:/usr/local/summovie_1.0.2/bin:/usr/local/unblur_1.0.2/bin:/usr/local/EMAN2.91/bin:/data2/cmromao/.local/bin:/data2/cmromao/bin"
export "LD_LIBRARY_PATH=/usr/local/relion-3/lib:/usr/local/cuda/lib:/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib:/usr/local/cuda-10.1/lib64:/usr/local/cuda-9.2/lib:/usr/local/cuda-9.2/lib64:/usr/local/cuda-9.1/lib:/usr/local/cuda-9.1/lib64:/usr/local/cuda-8.0/lib:/usr/local/cuda-8.0/lib64:/usr/local/cuda-7.5/lib:/usr/local/cuda-7.5/lib64:/usr/local/IMOD/lib:"
export "LD_PRELOAD=/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/libpython3.8.so"
export "PYTHONPATH=/data2/cmromao/cryoSPARC/cryosparc_worker"
export "PYTHONNOUSERSITE=true"
export "CONDA_SHLVL=1"
export "CONDA_PROMPT_MODIFIER=(cryosparc_worker_env)"
export "CONDA_EXE=/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/bin/conda"
export "CONDA_PREFIX=/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env"
export "CONDA_PYTHON_EXE=/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/bin/python"
export "CONDA_DEFAULT_ENV=cryosparc_worker_env"
export "NUMBA_CUDA_INCLUDE_PATH=/data2/cmromao/cryoSPARC/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include"
export "NUMBA_CUDA_USE_NVIDIA_BINDING=1"
(base) [cmromao@titanios bin]$
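A detail in this output that may be worth ruling out: `CRYOSPARC_CUDA_PATH` points to `/usr/local/cuda-11.8`, while `LD_LIBRARY_PATH` also lists library directories from older CUDA installations (10.1, 9.2, 9.1, 8.0, 7.5) plus `/usr/local/cuda`, which may or may not be a symlink to 11.8. Nothing in this thread shows that these entries are involved in the failure. The sketch below parses `export "KEY=VALUE"` lines in the format printed by `cryosparcw env` and lists CUDA-related `LD_LIBRARY_PATH` entries that lie outside the configured toolkit. The function names are hypothetical helpers, not part of CryoSPARC.

```python
import re
from typing import Dict, List

# Matches lines such as: export "CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.8"
EXPORT_RE = re.compile(r'^export "([A-Za-z_][A-Za-z0-9_]*)=(.*)"$')


def parse_env_exports(text: str) -> Dict[str, str]:
    """Parse export "KEY=VALUE" lines; other lines (prompts, blanks) are ignored."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        m = EXPORT_RE.match(line.strip())
        if m:
            env[m.group(1)] = m.group(2)
    return env


def foreign_cuda_lib_dirs(env: Dict[str, str]) -> List[str]:
    """Return LD_LIBRARY_PATH entries mentioning 'cuda' outside CRYOSPARC_CUDA_PATH.

    Assumption: symlinks are not resolved, so /usr/local/cuda is reported
    even if it happens to point at the configured toolkit.
    """
    cuda_root = env.get("CRYOSPARC_CUDA_PATH", "").rstrip("/")
    entries = [p for p in env.get("LD_LIBRARY_PATH", "").split(":") if p]

    def under_root(path: str) -> bool:
        return bool(cuda_root) and (path == cuda_root or path.startswith(cuda_root + "/"))

    return [p for p in entries if "cuda" in p.lower() and not under_root(p)]


if __name__ == "__main__":
    sample = '''
    export "CRYOSPARC_CUDA_PATH=/usr/local/cuda-11.8"
    export "LD_LIBRARY_PATH=/usr/local/relion-3/lib:/usr/local/cuda/lib64:/usr/local/cuda-10.1/lib64:/usr/local/IMOD/lib:"
    '''
    for entry in foreign_cuda_lib_dirs(parse_env_exports(sample)):
        print(entry)
```

On the shortened sample this reports `/usr/local/cuda/lib64` and `/usr/local/cuda-10.1/lib64`; pasting the full `cryosparcw env` output instead would list every older CUDA directory shown above.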
  3. J62
  4. See files:
    [files_transfer - Google Drive]
  5. File listing:
(base) [cmromao@titanios J62]$ find /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62 -ls
15219505677    0 drwxrwxr-x   3 cmromao  cmromao       137 Oct  6 13:04 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62
3600487046    0 drwxrwxr-x   2 cmromao  cmromao        10 Oct  2 09:33 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62/gridfs_data
15219505666   12 -rw-rw-r--   1 cmromao  cmromao     11479 Oct  6 13:04 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62/events.bson
15219505678   48 -rw-rw-r--   1 cmromao  cmromao     45291 Oct  6 13:04 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62/job.json
15219505679   36 -rw-rw-r--   1 cmromao  cmromao     35839 Oct  2 10:50 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62/job.log
16899947265    4 -rw-rw-r--   1 cmromao  cmromao      3328 Oct  6 13:04 /data2/cmromao/cryosparc/complex_yloc_INL_July2023/CS-complex-yloc-inl-july2023/J62/J62_passthrough_micrographs.cs
(base) [cmromao@titanios J62]$

Thank you!

Regards,
Dmitry

Thanks for posting this information.

I do not see the model file you mentioned earlier.

Unfortunately, I cannot yet tell what the problem may be. Could you please collect job reports for the two jobs, upload them to a shared drive, and send me the link via private forum message?