Issue in Deep Picker Train

Our Deep Picker Train jobs are failing without producing much useful error output in the log files.

The job ends with “====== Job process terminated abnormally.”

The output leading up to this is:

[CPU: 656.0 MB Avail: 229.23 GB]
Splitting micrographs…
[CPU: 656.0 MB Avail: 229.23 GB]
Splitting micrographs done in 1.058 seconds.
[CPU: 656.0 MB Avail: 229.23 GB]
Augmenting data…
[CPU: 656.0 MB Avail: 229.23 GB]
50869/179100 micrographs augmented.

We are currently using 1 parallel thread, 1 GPU, and a particle diameter of 150 Å. Desired pixels per angstrom is 4.

Do you have any suggestions for how we can try to get this training job to work?

Could you please post the final few lines of the job log? You can find it under Metadata|Log.
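If the web interface is inconvenient, the same content should also be in the job.log file inside the job directory. Below is a minimal sketch for printing its last few lines from a shell on the CryoSPARC host; the path is hypothetical and must be replaced with your own project and job folder.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file without holding the whole file in memory."""
    with open(path, "r", errors="replace") as fh:
        # A deque with maxlen keeps only the most recent n lines while iterating.
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


if __name__ == "__main__":
    # Hypothetical location: substitute your own project directory and job ID.
    log_path = Path("/path/to/project/J124/job.log")
    if log_path.exists():
        for line in tail(log_path, n=20):
            print(line)
```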

@wtempel ,

It appears to start with:

===========================================================================
========= monitor process now starting main process at 2024-04-18 19:01:55.637027
MAINPROCESS PID 15149
========= monitor process now waiting for main process
MAIN PID 15149
deep_picker.run_deep_picker cryosparc_compute.jobs.jobregister
========= sending heartbeat at 2024-04-18 19:02:09.759389
========= sending heartbeat at 2024-04-18 19:02:19.776843

========= sending heartbeat at 2024-04-18 20:37:11.690971
========= sending heartbeat at 2024-04-18 20:37:21.795241
========= sending heartbeat at 2024-04-18 20:37:31.891942
========= main process now complete at 2024-04-18 20:37:37.273374.
========= monitor process now complete at 2024-04-18 20:37:37.347531.


So, there are just a lot of ‘heartbeat’ lines recorded and not much else.
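When a log is dominated by heartbeat lines, it can help to filter them out and check how long the main process actually ran. The sketch below assumes the line formats shown in the excerpt above (the "starting main process" and "main process now complete" markers and their timestamp format); the function name `summarize_log` is hypothetical and not part of CryoSPARC.

```python
import re
from datetime import datetime

# Matches timestamps in the format seen in the log, e.g. "2024-04-18 19:01:55.637027".
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+")


def summarize_log(lines):
    """Drop heartbeat and blank lines, and measure main-process run time.

    Returns (remaining_lines, elapsed_seconds), where elapsed_seconds is None
    if either the start or the completion marker is missing.
    """
    kept = [ln for ln in lines if "sending heartbeat" not in ln and ln.strip()]
    start = end = None
    for ln in kept:
        m = TS_RE.search(ln)
        if m is None:
            continue
        ts = datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S.%f")
        if "starting main process" in ln:
            start = ts
        elif "main process now complete" in ln:
            end = ts
    if start is None or end is None:
        return kept, None
    return kept, (end - start).total_seconds()
```

For example, `kept, elapsed = summarize_log(open("job.log").read().splitlines())` prints nothing itself but leaves only the non-heartbeat lines in `kept`, which makes any stray error message easier to spot.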

The J124 job folder itself contains only events.bson, an empty gridfs_data folder, job.json, and job.log (matching the above).

I cannot find any error output or traceback from a failing Python script; as far as I can tell, the job ends during the micrograph augmentation step.

Thanks @larsonmattr for providing the details. Unfortunately, we do not know of a solution. You may want to try training with fewer exposures. How many exposures were in your Micrographs input?