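Hi,
I am running the Extensive Validation workflow on CryoSPARC v4.7.1 through a Slurm cluster lane and hit an error in the Patch Motion Correction job (J85): the child process terminates unexpectedly with exit code 1, and the job log shows a SIGSEGV inside libcuda.so.1.
The event log and the job log are pasted below for review.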
License is valid.
Launching job on lane g4-singlegpu-queue target g4-singlegpu-queue ...
Launching job on cluster g4-singlegpu-queue
====================== Cluster submission script: ========================
==========================================================================
#!/usr/bin/env bash
#SBATCH --job-name cryosparc_P1_J85
#SBATCH --cpus-per-task=6
#SBATCH -N 1
#SBATCH --mem=16G
#SBATCH -o /efshome/apps/cryosparc/projects/CS-pcs-test/J85/slurm-%j.out
#SBATCH -e /efshome/apps/cryosparc/projects/CS-pcs-test/J85/slurm-%j.err
#SBATCH --exclusive --partition=g4-singlegpu
#SBATCH --gres=gpu:1
#SBATCH --constraint="g4dn-4xlarge-node"
export PATH=/opt/aws/pcs/scheduler/slurm-25.05/bin:$PATH
export LD_LIBRARY_PATH=/lib64:/usr/lib64
unset CUDA_HOME
unset CUDA_PATH
unset CUDA_ROOT
unset CUDA_VISIBLE_DEVICES
export PYTHONIOENCODING=UTF-8
export PYTHONUTF8=1
export CRYOSPARC_CACHE_DIR=/scratch
export NUMBA_CUDA_USE_NVIDIA_BINDING=0
export NUMBA_CUDA_DRIVER=/lib64/libcuda.so.1
export CUDA_DEVICE_ORDER=PCI_BUS_ID
if [ -n "$SLURM_JOB_GPUS" ]; then
export CUDA_VISIBLE_DEVICES=$SLURM_JOB_GPUS
elif [ -n "$SLURM_STEP_GPUS" ]; then
export CUDA_VISIBLE_DEVICES=$SLURM_STEP_GPUS
fi
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
echo "Running on host: $(hostname)"
nvidia-smi || echo "nvidia-smi failed"
ldconfig -p | grep libcuda || echo "libcuda not found"
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/bin/cryosparcw run --project P1 --job J85 --master_hostname 10.201.50.54 --master_command_core_port 39002 > /efshome/apps/cryosparc/projects/CS-pcs-test/J85/job.log 2>&1
==========================================================================
==========================================================================
-------- Submission command:
/opt/aws/pcs/scheduler/slurm-25.05/bin/sbatch /efshome/apps/cryosparc/projects/CS-pcs-test/J85/queue_sub_script.sh
-------- Cluster Job ID:
127
-------- Queued on cluster at 2026-03-06 09:44:47.677438
-------- Cluster job status at 2026-03-06 09:44:48.478793 (0 retries)
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
127 g4-single cryospar cryospar R 0:01 1 g4dn-4xlarge-node-2
[CPU: 91.3 MB]
Job J85 Started
[CPU: 91.4 MB]
Master running v4.7.1, worker running v4.7.1
[CPU: 91.6 MB]
Working in directory: /efshome/apps/cryosparc/projects/CS-pcs-test/J85
[CPU: 91.6 MB]
Running on lane g4-singlegpu-queue
[CPU: 91.6 MB]
Resources allocated:
[CPU: 91.6 MB]
Worker: g4-singlegpu-queue
[CPU: 91.6 MB]
CPU : [0, 1, 2, 3, 4, 5]
[CPU: 91.6 MB]
GPU : [0]
[CPU: 91.6 MB]
RAM : [0, 1]
[CPU: 91.6 MB]
SSD : False
[CPU: 91.6 MB]
--------------------------------------------------------------
[CPU: 91.6 MB]
Importing job module for job type patch_motion_correction_multi...
[CPU: 251.3 MB]
Job ready to run
[CPU: 251.3 MB]
***************************************************************
[CPU: 251.3 MB]
Transparent hugepages are enabled. You may encounter stalls or performance problems with CryoSPARC jobs.
[CPU: 251.9 MB]
Job will process this many movies: 20
[CPU: 251.9 MB]
Job will output denoiser training data for this many movies: 20
[CPU: 251.9 MB]
Random seed: 1759570088
[CPU: 251.9 MB]
parent process is 17659
[CPU: 164.2 MB]
Calling CUDA init from 17685
[CPU: 309.1 MB]
-- 0.0: processing 1 of 20: J84/imported/005485533045405691539_14sep05c_00024sq_00003hl_00002es.frames.tif
loading /efshome/apps/cryosparc/projects/CS-pcs-test/J84/imported/005485533045405691539_14sep05c_00024sq_00003hl_00002es.frames.tif
Loading raw movie data from J84/imported/005485533045405691539_14sep05c_00024sq_00003hl_00002es.frames.tif ...
Done in 1.72s
Loading gain data from J84/imported/norm-amibox05-0.mrc ...
Done in 0.07s
Processing ...
[CPU: 89.9 MB]
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade
[CPU: 252.0 MB]
Child process with PID 17685 terminated unexpectedly with exit code 1.
[CPU: 252.0 MB]
['uid', 'movie_blob/path', 'movie_blob/shape', 'movie_blob/psize_A', 'movie_blob/is_gain_corrected', 'movie_blob/format', 'movie_blob/has_defect_file', 'movie_blob/import_sig', 'micrograph_blob/path', 'micrograph_blob/idx', 'micrograph_blob/shape', 'micrograph_blob/psize_A', 'micrograph_blob/format', 'micrograph_blob/is_background_subtracted', 'micrograph_blob/vmin', 'micrograph_blob/vmax', 'micrograph_blob/import_sig', 'micrograph_blob_non_dw/path', 'micrograph_blob_non_dw/idx', 'micrograph_blob_non_dw/shape', 'micrograph_blob_non_dw/psize_A', 'micrograph_blob_non_dw/format', 'micrograph_blob_non_dw/is_background_subtracted', 'micrograph_blob_non_dw/vmin', 'micrograph_blob_non_dw/vmax', 'micrograph_blob_non_dw/import_sig', 'micrograph_blob_non_dw_AB/path', 'micrograph_blob_non_dw_AB/idx', 'micrograph_blob_non_dw_AB/shape', 'micrograph_blob_non_dw_AB/psize_A', 'micrograph_blob_non_dw_AB/format', 'micrograph_blob_non_dw_AB/is_background_subtracted', 'micrograph_blob_non_dw_AB/vmin', 'micrograph_blob_non_dw_AB/vmax', 'micrograph_blob_non_dw_AB/import_sig', 'micrograph_thumbnail_blob_1x/path', 'micrograph_thumbnail_blob_1x/idx', 'micrograph_thumbnail_blob_1x/shape', 'micrograph_thumbnail_blob_1x/format', 'micrograph_thumbnail_blob_1x/binfactor', 'micrograph_thumbnail_blob_1x/micrograph_path', 'micrograph_thumbnail_blob_1x/vmin', 'micrograph_thumbnail_blob_1x/vmax', 'micrograph_thumbnail_blob_2x/path', 'micrograph_thumbnail_blob_2x/idx', 'micrograph_thumbnail_blob_2x/shape', 'micrograph_thumbnail_blob_2x/format', 'micrograph_thumbnail_blob_2x/binfactor', 'micrograph_thumbnail_blob_2x/micrograph_path', 'micrograph_thumbnail_blob_2x/vmin', 'micrograph_thumbnail_blob_2x/vmax', 'background_blob/path', 'background_blob/idx', 'background_blob/binfactor', 'background_blob/shape', 'background_blob/psize_A', 'rigid_motion/type', 'rigid_motion/path', 'rigid_motion/idx', 'rigid_motion/frame_start', 'rigid_motion/frame_end', 'rigid_motion/zero_shift_frame', 'rigid_motion/psize_A', 
'spline_motion/type', 'spline_motion/path', 'spline_motion/idx', 'spline_motion/frame_start', 'spline_motion/frame_end', 'spline_motion/zero_shift_frame', 'spline_motion/psize_A']
[CPU: 252.2 MB]
--------------------------------------------------------------
[CPU: 252.2 MB]
Compiling job outputs...
[CPU: 252.2 MB]
Passing through outputs for output group micrographs from input group movies
[CPU: 252.2 MB]
This job outputted results ['micrograph_blob_non_dw', 'micrograph_blob_non_dw_AB', 'micrograph_thumbnail_blob_1x', 'micrograph_thumbnail_blob_2x', 'movie_blob', 'micrograph_blob', 'background_blob', 'rigid_motion', 'spline_motion']
[CPU: 252.2 MB]
Loaded output dset with 0 items
[CPU: 252.2 MB]
Passthrough results ['gain_ref_blob', 'mscope_params']
[CPU: 252.2 MB]
Loaded passthrough dset with 20 items
[CPU: 252.2 MB]
Intersection of output and passthrough has 0 items
[CPU: 252.2 MB]
Output dataset contains: ['mscope_params', 'gain_ref_blob']
[CPU: 252.2 MB]
Outputting passthrough result gain_ref_blob
[CPU: 252.2 MB]
Outputting passthrough result mscope_params
[CPU: 252.2 MB]
Passing through outputs for output group micrographs_incomplete from input group movies
[CPU: 252.2 MB]
This job outputted results ['micrograph_blob']
[CPU: 252.2 MB]
Loaded output dset with 20 items
[CPU: 252.2 MB]
Passthrough results ['movie_blob', 'gain_ref_blob', 'mscope_params']
[CPU: 252.2 MB]
Loaded passthrough dset with 20 items
[CPU: 252.3 MB]
Intersection of output and passthrough has 20 items
[CPU: 252.3 MB]
Output dataset contains: ['movie_blob', 'mscope_params', 'gain_ref_blob']
[CPU: 252.3 MB]
Outputting passthrough result movie_blob
[CPU: 252.3 MB]
Outputting passthrough result gain_ref_blob
[CPU: 252.3 MB]
Outputting passthrough result mscope_params
[CPU: 252.3 MB]
Checking outputs for output group micrographs
[CPU: 252.3 MB]
Checking outputs for output group micrographs_incomplete
[CPU: 252.5 MB]
Updating job size...
[CPU: 252.5 MB]
Exporting job and creating csg files...
[CPU: 252.5 MB]
***************************************************************
[CPU: 252.5 MB]
Job complete. Total time 30.45s
JOB LOG
================= CRYOSPARCW ======= 2026-03-06 09:44:52.046289 =========
Project P1 Job J85
Master 10.201.50.54 Port 39002
MAIN PROCESS PID 17659
========= now starting main process at 2026-03-06 09:44:52.046804
motioncorrection.run_patch cryosparc_compute.jobs.jobregister
MONITOR PROCESS PID 17661
========= monitor process now waiting for main process
========= sending heartbeat at 2026-03-06 09:44:57.596045
Transparent hugepages setting: [always] madvise never
Running job on hostname %s g4-singlegpu-queue
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': 'g4-singlegpu-queue', 'lane': 'g4-singlegpu-queue', 'lane_type': 'cluster', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3, 4, 5], 'GPU': [0], 'RAM': [0, 1]}, 'target': {'cache_path': '/scratch', 'cache_quota_mb': 80000, 'cache_reserve_mb': 10000, 'custom_var_names':
, 'custom_vars': {}, 'desc': None, 'hostname': 'g4-singlegpu-queue', 'lane': 'g4-singlegpu-queue', 'name': 'g4-singlegpu-queue', 'qdel_cmd_tpl': '/opt/aws/pcs/scheduler/slurm-25.05/bin/scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': '/opt/aws/pcs/scheduler/slurm-25.05/bin/sinfo', 'qstat_cmd_tpl': '/opt/aws/pcs/scheduler/slurm-25.05/bin/squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': '/opt/aws/pcs/scheduler/slurm-25.05/bin/sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n#SBATCH --job-name cryosparc_{{ project_uid }}_{{ job_uid }}\n#SBATCH --cpus-per-task={{ num_cpu }}\n#SBATCH -N 1\n#SBATCH --mem={{ (ram_gb)|int }}G\n#SBATCH -o {{ job_dir_abs }}/slurm-%j.out\n#SBATCH -e {{ job_dir_abs }}/slurm-%j.err\n#SBATCH --exclusive --partition=g4-singlegpu\n#SBATCH --gres=gpu:1\n#SBATCH --constraint="g4dn-4xlarge-node"\n\nexport PATH=/opt/aws/pcs/scheduler/slurm-25.05/bin:$PATH\nexport LD_LIBRARY_PATH=/lib64:/usr/lib64\nunset CUDA_HOME\nunset CUDA_PATH\nunset CUDA_ROOT\nunset CUDA_VISIBLE_DEVICES\nexport PYTHONIOENCODING=UTF-8\nexport PYTHONUTF8=1\nexport CRYOSPARC_CACHE_DIR=/scratch\nexport NUMBA_CUDA_USE_NVIDIA_BINDING=0\nexport NUMBA_CUDA_DRIVER=/lib64/libcuda.so.1\nexport CUDA_DEVICE_ORDER=PCI_BUS_ID\n\nif [ -n "$SLURM_JOB_GPUS" ]; then\n export CUDA_VISIBLE_DEVICES=$SLURM_JOB_GPUS\nelif [ -n "$SLURM_STEP_GPUS" ]; then\n export CUDA_VISIBLE_DEVICES=$SLURM_STEP_GPUS\nfi\n\necho "CUDA_VISIBLE_DEVICES=(hostname)"\nnvidia-smi || echo "nvidia-smi failed"\nldconfig -p | grep libcuda || echo "libcuda not found"\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'g4-singlegpu-queue', 'tpl_vars': ['job_uid', 'run_cmd', 'command', 'ram_gb', 'job_dir_abs', 'cluster_job_id', 'project_uid', 'num_cpu'], 'type': 'cluster', 'worker_bin_path': '/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/bin/cryosparcw'}}
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade
Received SIGSEGV (addr=0000000000000000)
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/ioengine/core.so(traceback_signal_handler+0x113)[0x7f9fc3cb2a03]
/lib64/libc.so.6(+0x3fc30)[0x7f9fcc63fc30]
/usr/lib64/libcuda.so.1(+0x31a718)[0x7f9f5a31a718]
/usr/lib64/libcuda.so.1(cuCtxGetDevice_v2+0x20)[0x7f9f5a30dae0]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/../../libffi.so.8(+0x6a4a)[0x7f9fcbd32a4a]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/../../libffi.so.8(+0x5fea)[0x7f9fcbd31fea]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x12461)[0x7f9fcbb06461]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/lib-dynload/_ctypes.cpython-310-x86_64-linux-gnu.so(+0x866e)[0x7f9fcbafc66e]
python(PyObject_Call+0x207)[0x55e65113d067]
python(_PyEval_EvalFrameDefault+0x2d83)[0x55e6511232b3]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
python(_PyEval_EvalFrameDefault+0x4c12)[0x55e651125142]
python(+0x150804)[0x55e65113c804]
python(_PyEval_EvalFrameDefault+0x28ea)[0x55e651122e1a]
python(+0x150582)[0x55e65113c582]
python(_PyEval_EvalFrameDefault+0x4c12)[0x55e651125142]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/gpu/gpucore.cpython-310-x86_64-linux-gnu.so(+0x1c33c)[0x7f9fa40c233c]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/gpu/gpucore.cpython-310-x86_64-linux-gnu.so(+0x4048c)[0x7f9fa40e648c]
python(+0x15091e)[0x55e65113c91e]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/patchmotion.cpython-310-x86_64-linux-gnu.so(+0x498c0)[0x7f9f9829f8c0]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/run.cpython-310-x86_64-linux-gnu.so(+0xd224)[0x7f9fcc845224]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.cpython-310-x86_64-linux-gnu.so(+0x11e9e)[0x7f9fc2232e9e]
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/run_patch.cpython-310-x86_64-linux-gnu.so(+0x5d1ce)[0x7f9fc227e1ce]
python(_PyEval_EvalFrameDefault+0x72c)[0x55e651120c5c]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
python(_PyEval_EvalFrameDefault+0x72c)[0x55e651120c5c]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
python(_PyEval_EvalFrameDefault+0x2d83)[0x55e6511232b3]
python(+0x150804)[0x55e65113c804]
python(_PyEval_EvalFrameDefault+0x2d83)[0x55e6511232b3]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
python(_PyEval_EvalFrameDefault+0x4c12)[0x55e651125142]
python(_PyFunction_Vectorcall+0x6c)[0x55e651130a2c]
python(_PyEval_EvalFrameDefault+0x72c)[0x55e651120c5c]
python(+0x150804)[0x55e65113c804]
python(+0x228372)[0x55e651214372]
python(+0x228324)[0x55e651214324]
/lib64/libc.so.6(+0x8b2ea)[0x7f9fcc68b2ea]
/lib64/libc.so.6(+0x1103d0)[0x7f9fcc7103d0]
rax 0000000000000000 rbx 00007f9f8e7ad810 rcx 0000000000000001 rdx 00007f9fcc7fcb00
rsi 000055e654e17090 rdi 0000000000000140 rbp 00007f9fa6fface0 rsp 00007f9fa6ffac10
r8 00007f9f9420fbb0 r9 000055e654b8c350 r10 0000000000000000 r11 00007f9f5a30dac0
r12 000055e654e17090 r13 0000000000000000 r14 00007f9fa6ffadc0 r15 00007f9fa6ffae28
c0 75 d0 48 8b bd 60 ff ff ff 48 8d b5 70 ff ff ff ba 08 00 00 00 e8 1d 28 f8 ff 85 c0
75 b4 48 8b 85 70 ff ff ff 48 8b 40 10 eb 14 0f 1f 40 00 41 83 3c 24 01 0f 84 85 01 00
00 49 8b 44 24 10
→ 8b 00 89 03 48 81 c4 b8 00 00 00 31 c0 5b 41 5c 41 5d 5d c3 0f 1f 40 00 48 8b bd
40 ff ff ff 48 8d b5 38 ff ff ff e8 7d f8 d9 00 85 c0 0f 85 49 ff ff ff 48 8b 85 38 ff
ff ff 66 48 0f 6e c3 31
========= sending heartbeat at 2026-03-06 09:45:07.610751
========= sending heartbeat at 2026-03-06 09:45:17.626383
========= sending heartbeat at 2026-03-06 09:45:27.641165
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3474: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
========= sending heartbeat at 2026-03-06 09:45:37.656494
========= heartbeat failed at 2026-03-06 09:45:37.661973:
========= sending heartbeat at 2026-03-06 09:45:47.672036
========= heartbeat failed at 2026-03-06 09:45:47.677318:
========= sending heartbeat at 2026-03-06 09:45:57.687382
========= heartbeat failed at 2026-03-06 09:45:57.692712:
************* Connection to cryosparc command lost. Heartbeat failed 3 consecutive times at 2026-03-06 09:45:57.692761.
/efshome/apps/cryosparc/apps/v4.7.1_251124/cryosparc_worker/bin/cryosparcw: line 151: 17659 Killed python -c "import cryosparc_compute.run as run; run.run()" "$@"
Versions in use:
CUDA: 12.4
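
In case it helps whoever reads the native backtrace in the job log, here is a minimal sketch of a Bash helper that lists the shared objects appearing in it, innermost frame first. The name `summarize_backtrace` is hypothetical (it is not a CryoSPARC tool), and the sketch assumes every frame is printed as `object(symbol+offset)[address]` with no spaces, as in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of CryoSPARC: reads a log on stdin and
# prints each shared object named in a glibc-style backtrace once,
# in order of first appearance (innermost frame first).
# Assumed frame format: /path/to/lib.so(symbol+0x20)[0x7f...]
summarize_backtrace() {
    # 1. keep only lines that look like backtrace frames
    # 2. strip everything from the first "(" to the end of the line
    # 3. drop repeated objects while preserving order
    grep -E '^[^ ]+\(.*\)\[0x[0-9a-f]+\]$' \
        | sed -E 's/\(.*$//' \
        | awk '!seen[$0]++'
}
```

It can be run as `summarize_backtrace < job.log` from the job directory. On the log above, the first entries after the signal handler's own `core.so` should be `/lib64/libc.so.6` and `/usr/lib64/libcuda.so.1`, followed by the `libffi` and `_ctypes` objects that sit between Python and the driver call.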
