@wtempel The job hangs at 0% on the progress bar for about 20-30 minutes, with GPU memory allocated but no GPU utilization and no notable system CPU or RAM usage, and then fails with the following output:
drichman@hawk:/data/liuchuan/cryosparc_projects/CS-bill-dnab/J172/test1_DER_2023-06-08$ /programs/x86_64-linux/deepemhancer/20220530_cu10/bin/deepemhancer -i /data/liuchuan/cryosparc_projects/CS-bill-dnab/J163/J163_005_volume_map_half_A.mrc -i2 /data/liuchuan/cryosparc_projects/CS-bill-dnab/J163/J163_005_volume_map_half_B.mrc -o /data/liuchuan/cryosparc_projects/CS-bill-dnab/J172/J172_map_sharp.mrc -g 0 --deepLearningModelPath /home/exx/.local/share/deepEMhancerModels/production_checkpoints -p tightTarget
updating environment to select gpu: [0]
Using TensorFlow backend.
loading model /home/exx/.local/share/deepEMhancerModels/production_checkpoints/deepEMhancer_tightTarget.hd5 … DONE!
Automatic radial noise detected beyond 86.60254037844386 % of volume side
DONE!. Shape at 1 A/voxel after padding-> (352, 352, 352)
Neural net inference
0%| | 0/361 [00:00<?, ?it/s]2023-06-08 12:19:51.912814: E tensorflow/stream_executor/cuda/cuda_blas.cc:428] failed to run cuBLAS routine: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/bin/deepemhancer", line 11, in <module>
    sys.exit(commanLineFun())
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/deepEMhancer/exeDeepEMhancer.py", line 80, in commanLineFun
    main( ** parseArgs() )
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/deepEMhancer/exeDeepEMhancer.py", line 73, in main
    voxel_size=boxSize, apply_postprocess_cleaning=cleaningStrengh)
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/deepEMhancer/applyProcessVol/processVol.py", line 186, in predict
    batch_y_pred= self.model.predict_on_batch(np.expand_dims(batch_x, axis=-1))
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/keras/engine/training.py", line 1274, in predict_on_batch
    outputs = self.predict_function(ins)
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2715, in __call__
    return self._call(inputs)
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
    fetched = self._callable_fn(*array_vals)
  File "/programs/x86_64-linux/deepemhancer/20220530_cu10/miniconda3/envs/deepEMhancer_env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
(0) Internal: Blas SGEMM launch failed : m=2097152, n=1, k=8
[[{{node conv3d_21/convolution}}]]
(1) Internal: Blas SGEMM launch failed : m=2097152, n=1, k=8
[[{{node conv3d_21/convolution}}]]
[[activation_10/Identity/_609]]
0 successful operations.
0 derived errors ignored.
0%| | 0/361 [15:35<?, ?it/s]
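
For what it's worth, here is a quick sanity check on the dimensions in the SGEMM message. This is only a sketch: reading m as (cubes per batch) x (voxels per cube), and the cube sides of 64 and 128 voxels, are my own assumptions and not something the log states. The helper name `cubes_in_m` is made up for illustration.

```python
# Dimensions copied from the "Blas SGEMM launch failed" line in the log above.
m, n, k = 2097152, 1, 8


def cubes_in_m(m, cube_side):
    """Return how many cubes of cube_side**3 voxels fit exactly into m,
    or None if m is not a whole multiple of the cube volume."""
    voxels = cube_side ** 3
    return m // voxels if m % voxels == 0 else None


# ASSUMPTION: candidate cube sides are illustrative guesses, not read from the log.
for side in (64, 128):
    print(f"cube side {side}: {cubes_in_m(m, side)} cube(s) per batch")

# Size of one m x k float32 matrix (4 bytes per element), in MiB.
bytes_input = m * k * 4
print(f"m x k float32 operand: {bytes_input / 2**20:.0f} MiB")
```

Under those assumptions, m works out to either 8 cubes of 64^3 voxels or 1 cube of 128^3 voxels, and the m x k operand is about 64 MiB in float32, so the matrix in the failing call is not large by itself.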