Two or more jobs requiring GPU can't run at the same time after updating CUDA driver and CryoSPARC

CryoSPARC instance information
master-worker

cryosparcm status
----------------------------------------------------------------------------
CryoSPARC System master node installed at
/home/sandip/softwares/cryosparc/cryosparc_master
Current cryoSPARC version: v4.3.0
----------------------------------------------------------------------------

CryoSPARC process status:

app                              RUNNING   pid 167946, uptime 20:38:06
app_api                          RUNNING   pid 167974, uptime 20:38:05
app_api_dev                      STOPPED   Not started
app_legacy                       STOPPED   Not started
app_legacy_dev                   STOPPED   Not started
command_core                     RUNNING   pid 167782, uptime 20:38:19
command_rtp                      RUNNING   pid 167880, uptime 20:38:11
command_vis                      RUNNING   pid 167847, uptime 20:38:12
database                         RUNNING   pid 167664, uptime 20:38:22

----------------------------------------------------------------------------
License is valid
----------------------------------------------------------------------------

global config variables:
export CRYOSPARC_LICENSE_ID=""
export CRYOSPARC_MASTER_HOSTNAME="basak-cryoem1.sbs.ntu.edu.sg"
export CRYOSPARC_DB_PATH="/home/sandip/softwares/cryosparc/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DB_CONNECTION_TIMEOUT_MS=20000
export CRYOSPARC_INSECURE=false
export CRYOSPARC_DB_ENABLE_AUTH=true
export CRYOSPARC_CLUSTER_JOB_MONITOR_INTERVAL=10
export CRYOSPARC_CLUSTER_JOB_MONITOR_MAX_RETRIES=1000000
export CRYOSPARC_PROJECT_DIR_PREFIX='CS-'
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_CLICK_WRAP=true
uname -a && free -g
Linux cryoem1 5.14.0-344.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 09:26:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
               total        used        free      shared  buff/cache   available
Mem:             503          13          23           0         469         489
Swap:            127           3         124

CryoSPARC worker environment

env | grep PATH
CRYOSPARC_PATH=/home/sandip/softwares/cryosparc/cryosparc_worker/bin
MANPATH=/usr/share/man::/opt/pbs1/share/man:/opt/pbs1/share/man
__MODULES_SHARE_MANPATH=:1
PYTHONPATH=/home/sandip/softwares/cryosparc/cryosparc_worker
CRYOSPARC_CUDA_PATH=/opt/cuda-11.8
MODULES_RUN_QUARANTINE=LD_LIBRARY_PATH LD_PRELOAD
LD_LIBRARY_PATH=/opt/cuda-11.8/lib64:/home/sandip/softwares/cryosparc/cryosparc_worker/deps/external/cudnn/lib:/usr/local/cuda-12.2/lib64:/usr/lib64/
PATH=/opt/cuda-11.8/bin:/home/sandip/softwares/cryosparc/cryosparc_worker/bin:/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/condabin:/home/sandip/softwares/cryosparc/cryosparc_master/bin:/usr/local/cuda-12.2/bin:/home/sandip/.local/bin:/home/sandip/bin:/usr/share/Modules/bin:/usr/condabin:/usr/local/IMOD/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/IMOD/pythonLink:/opt/pbs1/bin:/opt/pbs1/bin
MODULEPATH=/etc/scl/modulefiles:/usr/share/Modules/modulefiles:/etc/modulefiles:/usr/share/modulefiles
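
The `PATH` and `LD_LIBRARY_PATH` values above each name two CUDA toolkit directories (`/opt/cuda-11.8` and `/usr/local/cuda-12.2`). A minimal Python sketch for listing which toolkit versions a path-style variable refers to, in search order, might look like the following. The function name `cuda_versions_in_path` is hypothetical, and the sketch assumes toolkit directories are named `cuda-<major>.<minor>` as they are on this machine.

```python
import os
import re

# Assumption: toolkit directories follow the "cuda-<major>.<minor>" naming seen above.
CUDA_DIR = re.compile(r"cuda-(\d+\.\d+)")


def cuda_versions_in_path(value):
    """Return CUDA toolkit versions named in a colon-separated path, in search order."""
    versions = []
    for entry in value.split(":"):
        match = CUDA_DIR.search(entry)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))
    return versions


if __name__ == "__main__":
    for name in ("PATH", "LD_LIBRARY_PATH"):
        found = cuda_versions_in_path(os.environ.get(name, ""))
        note = "  <- more than one toolkit on this path" if len(found) > 1 else ""
        print(f"{name}: {found}{note}")
```

Run in the same shell that launches the worker, this only reports what the variables contain; it does not show which library a process actually loads.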

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

which nvcc
/opt/cuda-11.8/bin/nvcc

python -c "import pycuda.driver; print(pycuda.driver.get_version())"
(11, 8, 0)

/sbin/ldconfig -p | grep -i cuda
        libpcsamplingutil.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libpcsamplingutil.so
        libnvrtc.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so.12
        libnvrtc.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so.12
        libnvrtc.so.11.2 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvrtc.so.11.2
        libnvrtc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc.so
        libnvrtc.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvrtc.so
        libnvrtc.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so
        libnvrtc-builtins.so.12.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2
        libnvrtc-builtins.so.12.2 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so.12.2
        libnvrtc-builtins.so.11.8 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvrtc-builtins.so.11.8
        libnvrtc-builtins.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvrtc-builtins.so
        libnvrtc-builtins.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvrtc-builtins.so
        libnvrtc-builtins.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvrtc-builtins.so
        libnvperf_target.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_target.so
        libnvperf_host.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvperf_host.so
        libnvjpeg.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so.12
        libnvjpeg.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvjpeg.so.12
        libnvjpeg.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvjpeg.so.11
        libnvjpeg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvjpeg.so
        libnvjpeg.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvjpeg.so
        libnvjpeg.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvjpeg.so
        libnvblas.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so.12
        libnvblas.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvblas.so.12
        libnvblas.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvblas.so.11
        libnvblas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvblas.so
        libnvblas.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvblas.so
        libnvblas.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvblas.so
        libnvToolsExt.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so.1
        libnvToolsExt.so.1 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvToolsExt.so.1
        libnvToolsExt.so.1 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvToolsExt.so.1
        libnvToolsExt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvToolsExt.so
        libnvToolsExt.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnvToolsExt.so
        libnvToolsExt.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvToolsExt.so
        libnvJitLink.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvJitLink.so.12
        libnvJitLink.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvJitLink.so.12
        libnvJitLink.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnvJitLink.so
        libnvJitLink.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnvJitLink.so
        libnpps.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so.12
        libnpps.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnpps.so.12
        libnpps.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnpps.so.11
        libnpps.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnpps.so
        libnpps.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnpps.so
        libnpps.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnpps.so
        libnppitc.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so.12
        libnppitc.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppitc.so.12
        libnppitc.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppitc.so.11
        libnppitc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppitc.so
        libnppitc.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppitc.so
        libnppitc.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppitc.so
        libnppisu.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so.12
        libnppisu.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppisu.so.12
        libnppisu.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppisu.so.11
        libnppisu.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppisu.so
        libnppisu.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppisu.so
        libnppisu.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppisu.so
        libnppist.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so.12
        libnppist.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppist.so.12
        libnppist.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppist.so.11
        libnppist.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppist.so
        libnppist.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppist.so
        libnppist.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppist.so
        libnppim.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so.12
        libnppim.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppim.so.12
        libnppim.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppim.so.11
        libnppim.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppim.so
        libnppim.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppim.so
        libnppim.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppim.so
        libnppig.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so.12
        libnppig.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppig.so.12
        libnppig.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppig.so.11
        libnppig.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppig.so
        libnppig.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppig.so
        libnppig.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppig.so
        libnppif.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so.12
        libnppif.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppif.so.12
        libnppif.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppif.so.11
        libnppif.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppif.so
        libnppif.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppif.so
        libnppif.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppif.so
        libnppidei.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so.12
        libnppidei.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppidei.so.12
        libnppidei.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppidei.so.11
        libnppidei.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppidei.so
        libnppidei.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppidei.so
        libnppidei.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppidei.so
        libnppicc.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so.12
        libnppicc.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppicc.so.12
        libnppicc.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppicc.so.11
        libnppicc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppicc.so
        libnppicc.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppicc.so
        libnppicc.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppicc.so
        libnppial.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so.12
        libnppial.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppial.so.12
        libnppial.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppial.so.11
        libnppial.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppial.so
        libnppial.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppial.so
        libnppial.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppial.so
        libnppc.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so.12
        libnppc.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppc.so.12
        libnppc.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppc.so.11
        libnppc.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libnppc.so
        libnppc.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libnppc.so
        libnppc.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libnppc.so
        libicudata.so.67 (libc6,x86-64) => /lib64/libicudata.so.67
        libicudata.so (libc6,x86-64) => /lib64/libicudata.so
        libgstcuda-1.0.so.0 (libc6,x86-64) => /lib64/libgstcuda-1.0.so.0
        libcusparse.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so.12
        libcusparse.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so.12
        libcusparse.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusparse.so.11
        libcusparse.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusparse.so
        libcusparse.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusparse.so
        libcusparse.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusparse.so
        libcusolverMg.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so.11
        libcusolverMg.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusolverMg.so.11
        libcusolverMg.so.11 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusolverMg.so.11
        libcusolverMg.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolverMg.so
        libcusolverMg.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusolverMg.so
        libcusolverMg.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusolverMg.so
        libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
        libcusolver.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusolver.so.11
        libcusolver.so.11 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so.11
        libcusolver.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so
        libcusolver.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcusolver.so
        libcusolver.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcusolver.so
        libcurand.so.10 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so.10
        libcurand.so.10 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcurand.so.10
        libcurand.so.10 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcurand.so.10
        libcurand.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcurand.so
        libcurand.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcurand.so
        libcurand.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcurand.so
        libcupti.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so.12
        libcupti.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcupti.so
        libcuinj64.so.12.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so.12.2
        libcuinj64.so.12.2 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcuinj64.so.12.2
        libcuinj64.so.11.8 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcuinj64.so.11.8
        libcuinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcuinj64.so
        libcuinj64.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcuinj64.so
        libcuinj64.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcuinj64.so
        libcufile_rdma.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so.1
        libcufile_rdma.so.1 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufile_rdma.so.1
        libcufile_rdma.so.1 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufile_rdma.so.1
        libcufile_rdma.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile_rdma.so
        libcufile_rdma.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufile_rdma.so
        libcufile_rdma.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufile_rdma.so
        libcufile.so.0 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so.0
        libcufile.so.0 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufile.so.0
        libcufile.so.0 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufile.so.0
        libcufile.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufile.so
        libcufile.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufile.so
        libcufile.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufile.so
        libcufftw.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so.11
        libcufftw.so.11 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufftw.so.11
        libcufftw.so.10 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufftw.so.10
        libcufftw.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufftw.so
        libcufftw.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufftw.so
        libcufftw.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufftw.so
        libcufft.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so.11
        libcufft.so.11 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufft.so.11
        libcufft.so.10 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufft.so.10
        libcufft.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcufft.so
        libcufft.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcufft.so
        libcufft.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcufft.so
        libcudart.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.12
        libcudart.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcudart.so.12
        libcudart.so.11.0 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcudart.so.11.0
        libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
        libcudart.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcudart.so
        libcudart.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcudart.so
        libcudadebugger.so.1 (libc6,x86-64) => /lib64/libcudadebugger.so.1
        libcuda.so.1 (libc6,x86-64) => /lib64/libcuda.so.1
        libcuda.so (libc6,x86-64) => /lib64/libcuda.so
        libcublasLt.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so.12
        libcublasLt.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so.12
        libcublasLt.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so.11
        libcublasLt.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublasLt.so
        libcublasLt.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcublasLt.so
        libcublasLt.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcublasLt.so
        libcublas.so.12 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12
        libcublas.so.12 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcublas.so.12
        libcublas.so.11 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcublas.so.11
        libcublas.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcublas.so
        libcublas.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libcublas.so
        libcublas.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libcublas.so
        libcheckpoint.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcheckpoint.so
        libaccinj64.so.12.2 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so.12.2
        libaccinj64.so.12.2 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libaccinj64.so.12.2
        libaccinj64.so.11.8 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libaccinj64.so.11.8
        libaccinj64.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libaccinj64.so
        libaccinj64.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libaccinj64.so
        libaccinj64.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libaccinj64.so
        libOpenCL.so.1 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so.1
        libOpenCL.so.1 (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libOpenCL.so.1
        libOpenCL.so.1 (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libOpenCL.so.1
        libOpenCL.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libOpenCL.so
        libOpenCL.so (libc6,x86-64) => /opt/cuda-11.8/targets/x86_64-linux/lib/libOpenCL.so
        libOpenCL.so (libc6,x86-64) => /opt/cuda-12.2/targets/x86_64-linux/lib/libOpenCL.so
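
In the `ldconfig -p` listing above, several library names (for example `libcusolver.so.11`) are provided from more than one directory. A short Python sketch for condensing such a listing into only the names that have multiple providers is shown below. The helper names `providers_by_soname` and `ambiguous_sonames` are hypothetical, and the parser assumes the `name (flags) => path` line layout seen above.

```python
import re
from collections import defaultdict

# One cache entry looks like:
#   libcufft.so.11 (libc6,x86-64) => /path/to/libcufft.so.11
ENTRY = re.compile(r"^\s*(\S+)\s+\(.*?\)\s+=>\s+(\S+)\s*$")


def providers_by_soname(ldconfig_text):
    """Map each library name to the paths that provide it, in listed order."""
    providers = defaultdict(list)
    for line in ldconfig_text.splitlines():
        match = ENTRY.match(line)
        if match:
            providers[match.group(1)].append(match.group(2))
    return dict(providers)


def ambiguous_sonames(ldconfig_text):
    """Return only the names that are provided from more than one path."""
    return {
        name: paths
        for name, paths in providers_by_soname(ldconfig_text).items()
        if len(paths) > 1
    }
```

The text can be obtained with `subprocess.run(["/sbin/ldconfig", "-p"], capture_output=True, text=True).stdout` and filtered for `cuda` before being passed in.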


uname -a
Linux cryoem1 5.14.0-344.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jul 24 09:26:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

 nvidia-smi
Thu Aug 10 21:51:13 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A5000               On  | 00000000:4F:00.0 Off |                    0 |
| 30%   57C    P2             205W / 230W |   2208MiB / 23028MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000               On  | 00000000:52:00.0 Off |                    0 |
| 30%   33C    P8              12W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA RTX A5000               On  | 00000000:56:00.0 Off |                    0 |
| 30%   33C    P8              14W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA RTX A5000               On  | 00000000:57:00.0 Off |                    0 |
| 30%   33C    P8              17W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA RTX A5000               On  | 00000000:CE:00.0 Off |                    0 |
| 30%   33C    P8              17W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA RTX A5000               On  | 00000000:D1:00.0 Off |                    0 |
| 30%   34C    P8              18W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA RTX A5000               On  | 00000000:D5:00.0 Off |                    0 |
| 30%   37C    P8              15W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA RTX A5000               On  | 00000000:D6:00.0 Off |                    0 |
| 30%   37C    P8              17W / 230W |      5MiB / 23028MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4659      G   /usr/libexec/Xorg                            10MiB |
|    0   N/A  N/A      4685      G   /usr/bin/gnome-shell                          4MiB |
|    0   N/A  N/A    405166      C   python                                     2178MiB |
+---------------------------------------------------------------------------------------+
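
The process table above shows one compute (`C`) process on GPU 0 and only graphics (`G`) processes otherwise. When checking which jobs hold which GPU while two jobs are started together, a small Python sketch like the one below can turn a pasted `nvidia-smi` process table into a per-GPU list of compute PIDs. The function names are hypothetical, and the regular expression assumes the table layout printed by this driver version; other versions may format the table differently.

```python
import re

# Matches rows such as:
# |    0   N/A  N/A    405166      C   python                        2178MiB |
ROW = re.compile(
    r"^\|\s+(\d+)\s+\S+\s+\S+\s+(\d+)\s+([CG+]+)\s+(.+?)\s+(\d+)MiB\s+\|$"
)


def parse_processes(smi_text):
    """Return one dict per row of the nvidia-smi 'Processes' table."""
    rows = []
    for line in smi_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, kind, name, mem = match.groups()
            rows.append({"gpu": int(gpu), "pid": int(pid), "type": kind,
                         "name": name, "mem_mib": int(mem)})
    return rows


def compute_pids_by_gpu(smi_text):
    """Group the PIDs of compute-type processes by GPU index."""
    by_gpu = {}
    for row in parse_processes(smi_text):
        if "C" in row["type"]:
            by_gpu.setdefault(row["gpu"], []).append(row["pid"])
    return by_gpu
```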


./bin/cryosparcw gpulist
  Detected 8 CUDA devices.

   id           pci-bus  name
   ---------------------------------------------------------------
       0      0000:4F:00.0  NVIDIA RTX A5000
       1      0000:52:00.0  NVIDIA RTX A5000
       2      0000:56:00.0  NVIDIA RTX A5000
       3      0000:57:00.0  NVIDIA RTX A5000
       4      0000:CE:00.0  NVIDIA RTX A5000
       5      0000:D1:00.0  NVIDIA RTX A5000
       6      0000:D5:00.0  NVIDIA RTX A5000
       7      0000:D6:00.0  NVIDIA RTX A5000
   ---------------------------------------------------------------
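
In the two listings above, `nvidia-smi` and `cryosparcw gpulist` assign indices 0 through 7 to the same PCI bus IDs. A Python sketch for making that comparison mechanically, for instance after a driver change, is given below. The helper names are hypothetical, and the parser assumes each device line starts with its index and contains a `domain:bus:device.function` ID as in the outputs shown.

```python
import re

BUS_ID = re.compile(
    r"([0-9A-Fa-f]{4,8}):([0-9A-Fa-f]{2}):([0-9A-Fa-f]{2})\.([0-9A-Fa-f])"
)


def normalise(bus_match):
    """Render a bus ID with a 4-digit domain so both tools compare equal."""
    domain, bus, dev, fn = bus_match.groups()
    return f"{int(domain, 16):04x}:{bus.lower()}:{dev.lower()}.{fn.lower()}"


def index_to_bus(text):
    """Map device index -> normalised PCI bus ID for lines that carry both."""
    mapping = {}
    for line in text.splitlines():
        m_idx = re.match(r"^[|\s]*(\d+)\s", line)
        m_bus = BUS_ID.search(line)
        if m_idx and m_bus:
            mapping[int(m_idx.group(1))] = normalise(m_bus)
    return mapping


def order_mismatches(smi_text, gpulist_text):
    """Return {index: (nvidia-smi bus, gpulist bus)} where the two disagree."""
    smi, cs = index_to_bus(smi_text), index_to_bus(gpulist_text)
    return {i: (smi.get(i), cs.get(i))
            for i in sorted(set(smi) | set(cs))
            if smi.get(i) != cs.get(i)}
```

An empty result means both tools enumerate the devices in the same order.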

Issue
Type of Issue : Multiple jobs requiring GPU are not running at the same time

When I run multiple GPU jobs at the same time, only one of them runs and the other fails with an error. A single job on its own finishes without problems, whether it uses one GPU or several, but if I start another GPU job while the first is still running, one of the two fails.
CryoSPARC used to run smoothly; I recently updated the CUDA driver and CryoSPARC.

Please see the following error from a 2D Classification job:

[CPU:  14.27 GB]
Traceback (most recent call last):
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2061, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1028, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 89, in cryosparc_compute.engine.engine.EngineThread.load_image_data_gpu
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 311, in cryosparc_compute.engine.cuda_core.EngineBaseThread.toc
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 307, in cryosparc_compute.engine.cuda_core.EngineBaseThread.wait
pycuda._driver.LaunchError: cuStreamSynchronize failed: unspecified launch failure

The following error is from a Patch Motion job:

[CPU:  287.5 MB]
Error occurred while processing J3/imported/012324149470467535379_14sep05c_c_00003gr_00014sq_00011hl_00004es.frames.tif
Traceback (most recent call last):
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/jobs/pipeline.py", line 60, in exec
    return self.process(item)
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 177, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 180, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/run_patch.py", line 182, in cryosparc_compute.jobs.motioncorrection.run_patch.run_patch_motion_correction_multi.motionworker.process
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 255, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/jobs/motioncorrection/patchmotion.py", line 257, in cryosparc_compute.jobs.motioncorrection.patchmotion.unbend_motion_correction
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 293, in cryosparc_compute.engine.cuda_core.EngineBaseThread.__init__
pycuda._driver.LaunchError: cuStreamCreate failed: unspecified launch failure

The same kind of error occurs with every GPU job when jobs are run simultaneously.
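
One way to check whether the failure is specific to CryoSPARC jobs or also affects any two processes that use the GPUs at the same time is to start two small pycuda probes concurrently. The Python sketch below is only an illustration: `PROBE`, `probe_command`, and `run_concurrently` are hypothetical names, the probe merely creates a context and a stream (the two calls that fail in the tracebacks above) and does not reproduce a real workload, and it has to be run with an interpreter that has pycuda installed, such as the worker environment's Python.

```python
import subprocess
import sys

# Child-process probe: create a CUDA context on one device, create a stream,
# synchronise it, and release the context. Needs pycuda in the child interpreter.
PROBE = """
import sys
import pycuda.driver as cuda
cuda.init()
ctx = cuda.Device(int(sys.argv[1])).make_context()
try:
    cuda.Stream().synchronize()
finally:
    ctx.pop()
print("ok")
"""


def probe_command(device_id, python=sys.executable):
    """Build the command line that runs the probe on one GPU index."""
    return [python, "-c", PROBE, str(device_id)]


def run_concurrently(commands, timeout=120):
    """Start all commands at once, then collect (returncode, stdout, stderr)."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, text=True)
        for cmd in commands
    ]
    results = []
    for proc in procs:
        out, err = proc.communicate(timeout=timeout)
        results.append((proc.returncode, out.strip(), err.strip()))
    return results
```

For example, `run_concurrently([probe_command(0), probe_command(1)])` starts one probe on GPU 0 and one on GPU 1 together; a non-zero return code or a traceback in the third tuple element would indicate that the probe itself failed.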

Unfortunately, the job.log file is empty.

Any suggestions for resolving this issue would be very helpful.

Thanks in advance.

Thanks @nikydna for posting the diagnostic datapoints.
Could you please describe the steps you took when “updating cuda driver and cryosparc”?
Which commands did you run, inside and outside of CryoSPARC, and in what sequence?

Thank you for your reply.

Here are the steps that I followed.

wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-rhel9-12-2-local-12.2.1_535.86.10-1.x86_64.rpm
sudo rpm -i cuda-repo-rhel9-12-2-local-12.2.1_535.86.10-1.x86_64.rpm
sudo dnf clean all
sudo dnf -y module install nvidia-driver:latest-dkms
sudo dnf -y install cuda

systemctl status nvidia-persistenced
sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d
sudo sed -i 's/SUBSYSTEM!="memory",.*GOTO="memory_hotplug_end"/SUBSYSTEM=="*", GOTO="memory_hotplug_end"/' /etc/udev/rules.d/40-redhat.rules
reboot
sudo dnf install freeglut-devel libX11-devel libXi-devel libXmu-devel make mesa-libGLU-devel freeimage-devel libglfw3-devel

Checked whether the precompiled deviceQuery and bandwidthTest binaries were working.

chmod +x cuda_11.8.0_520.61.05_linux.run

./cuda_11.8.0_520.61.05_linux.run --silent --toolkit --toolkitpath=/opt/cuda-11.8

module load cuda/11.8

cryosparcm update
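Because the steps above install a 12.2-series driver alongside an 11.8 toolkit, it can help to record which versions are actually visible in the shell that launches CryoSPARC. The following is a minimal sketch, not part of the original procedure; the function name `report_cuda_versions` is hypothetical, and it only relies on the standard `nvidia-smi` and `nvcc` tools being on the PATH (it reports when they are not).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the NVIDIA driver version and the CUDA toolkit
# version visible in the current shell environment.
report_cuda_versions() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # One driver version per GPU; collapse duplicates into a single line.
    echo "driver: $(nvidia-smi --query-gpu=driver_version --format=csv,noheader | sort -u | paste -sd, -)"
  else
    echo "driver: nvidia-smi not found"
  fi
  if command -v nvcc >/dev/null 2>&1; then
    # nvcc prints a line such as "Cuda compilation tools, release 11.8, V11.8.89".
    echo "toolkit: $(nvcc --version | sed -n 's/.*release \([0-9.]*\),.*/\1/p')"
  else
    echo "toolkit: nvcc not found"
  fi
}

report_cuda_versions
```

Running it once before and once after `module load cuda/11.8` shows whether the module changes which toolkit is picked up.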

Apart from this, I did a fresh installation of CryoSPARC today, and I am still facing the same issue.

FYI, RELION jobs are running well (although RELION is compiled with a different version of CUDA).

I don't know what else to try.

Now none of the jobs are able to run.

Please see the job.log file from a 2D classification job with 2 GPUs on the tutorial dataset:

================= CRYOSPARCW ======= 2023-08-12 03:18:17.329811 =========
Project P18 Job J79
Master cryoem1.sbs.ntu.edu.sg Port 39002

========= monitor process now starting main process at 2023-08-12 03:18:17.329841
MAINPROCESS PID 91694
========= monitor process now waiting for main process
MAIN PID 91694
class2D.run cryosparc_compute.jobs.jobregister
gpufft: creating new cufft plan (plan id 0 pid 91694)
gpu_id 2
ndims 2
dims 448 448 0
inembed 448 448 0
istride 1
idist 200704
onembed 448 448 0
ostride 1
odist 200704
batch 500
type C2C
wkspc automatic
Python traceback:

gpufft: creating new cufft plan (plan id 1 pid 91694)
gpu_id 2
ndims 2
dims 448 448 0
inembed 448 448 0
istride 1
idist 200704
onembed 448 448 0
ostride 1
odist 200704
batch 500
type C2C
wkspc automatic
Python traceback:

gpufft: creating new cufft plan (plan id 2 pid 91694)
gpu_id 3
ndims 2
dims 448 448 0
inembed 448 448 0
istride 1
idist 200704
onembed 448 448 0
ostride 1
odist 200704
batch 500
type C2C
wkspc automatic
Python traceback:

gpufft: creating new cufft plan (plan id 3 pid 91694)
gpu_id 3
ndims 2
dims 448 448 0
inembed 448 448 0
istride 1
idist 200704
onembed 448 448 0
ostride 1
odist 200704
batch 500
type C2C
wkspc automatic
Python traceback:

gpufft: creating new cufft plan (plan id 4 pid 91694)
gpu_id 4
ndims 2
dims 448 448 0
inembed 448 448 0
istride 1
idist 200704
onembed 448 448 0
ostride 1
odist 200704
batch 500
type C2C
wkspc automatic
Python traceback:


Running job J79 of type class_2D
Running job on hostname %s cryoem1.sbs.ntu.edu.sg
Allocated Resources : {'fixed': {'SSD': True}, 'hostname': 'cryoem1', 'lane': 'default', 'lane_type': 'node', 'license': True, 'licenses_acquired': 3, 'slots': {'CPU': [4, 5], 'GPU': [2, 3, 4], 'RAM': [1, 2, 6]}, 'target': {'cache_path': '/scratch/', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 1, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 2, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 3, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 4, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 5, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 6, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}, {'id': 7, 'mem': 23824629760, 'name': 'NVIDIA RTX A5000'}], 'hostname': 'basak-cryoem1.sbs.ntu.edu.sg', 'lane': 'default', 'monitor_port': None, 'name': 'basak-cryoem1.sbs.ntu.edu.sg', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], 'GPU': [0, 1, 2, 3, 4, 5, 6, 7], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'cryoem1', 'title': 'Worker node cryoem1', 'type': 'node', 'worker_bin_path': 'softwares/cryosparc/cryosparc_worker/bin/cryosparcw'}}
HOST ALLOCATION FUNCTION: using cudrv.pagelocked_empty
========= sending heartbeat at 2023-08-12 03:18:30.327001
========= sending heartbeat at 2023-08-12 03:18:40.338501
========= sending heartbeat at 2023-08-12 03:18:50.352124
========= sending heartbeat at 2023-08-12 03:19:00.365850
========= sending heartbeat at 2023-08-12 03:19:10.379715
========= sending heartbeat at 2023-08-12 03:19:20.393306
========= sending heartbeat at 2023-08-12 03:19:30.407083
========= sending heartbeat at 2023-08-12 03:19:40.420711
========= sending heartbeat at 2023-08-12 03:19:50.433681
========= sending heartbeat at 2023-08-12 03:20:00.447085
========= sending heartbeat at 2023-08-12 03:20:10.460626
========= sending heartbeat at 2023-08-12 03:20:20.474109
========= sending heartbeat at 2023-08-12 03:20:30.487928
========= sending heartbeat at 2023-08-12 03:20:40.501359
========= sending heartbeat at 2023-08-12 03:20:50.514773
========= sending heartbeat at 2023-08-12 03:21:00.528120
========= sending heartbeat at 2023-08-12 03:21:10.541783
========= sending heartbeat at 2023-08-12 03:21:20.555386
========= sending heartbeat at 2023-08-12 03:21:30.569106
========= sending heartbeat at 2023-08-12 03:21:40.582696
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/plotutil.py:1512: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
fig = plt.figure(figsize=figsize)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:976: RuntimeWarning: invalid value encountered in double_scalars
x = (thresh - fa) * (b-a) / (fb - fa) + a
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
========= sending heartbeat at 2023-08-12 03:21:50.596621
========= sending heartbeat at 2023-08-12 03:22:00.610250
========= sending heartbeat at 2023-08-12 03:22:10.624132
========= sending heartbeat at 2023-08-12 03:22:20.638137
========= sending heartbeat at 2023-08-12 03:22:30.651964
========= sending heartbeat at 2023-08-12 03:22:40.665759
========= sending heartbeat at 2023-08-12 03:22:50.679446
========= sending heartbeat at 2023-08-12 03:23:00.693083
========= sending heartbeat at 2023-08-12 03:23:10.706893
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
**custom thread exception hook caught something
**** handle exception rc
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
**custom thread exception hook caught something
**** handle exception rc
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
**custom thread exception hook caught something
**** handle exception rc
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
set status to failed
set status to failed
set status to failed
**custom thread exception hook caught something
**** handle exception rc
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
**custom thread exception hook caught something
**** handle exception rc
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:976: RuntimeWarning: invalid value encountered in double_scalars
x = (thresh - fa) * (b-a) / (fb - fa) + a
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:976: RuntimeWarning: invalid value encountered in double_scalars
x = (thresh - fa) * (b-a) / (fb - fa) + a
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:976: RuntimeWarning: invalid value encountered in double_scalars
x = (thresh - fa) * (b-a) / (fb - fa) + a
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: divide by zero encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/util/logsumexp.py:40: RuntimeWarning: divide by zero encountered in log
return n.log(wa * n.exp(a - vmax) + wb * n.exp(b - vmax) ) + vmax
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3372: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numpy/core/_methods.py:170: RuntimeWarning: invalid value encountered in true_divide
ret = ret.dtype.type(ret / rcount)
/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/sigproc.py:894: RuntimeWarning: invalid value encountered in true_divide
frc[k, :copylen] = (AB / n.sqrt(AA*BB))[:copylen]
Traceback (most recent call last):
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2118, in run_with_except_hook
    run_old(*args, **kw)
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 131, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 132, in cryosparc_compute.engine.cuda_core.GPUThread.run
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1049, in cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 209, in cryosparc_compute.engine.engine.EngineThread.setup_current_noise
  File "cryosparc_master/cryosparc_compute/engine/cuda_core.py", line 402, in cryosparc_compute.engine.cuda_core.EngineBaseThread.download
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 263, in set_async
    return self.set(ary, async_=True, stream=stream)
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 260, in set
    _memcpy_discontig(self, ary, async_=async_, stream=stream)
  File "/home/sandip/softwares/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/pycuda/gpuarray.py", line 1313, in _memcpy_discontig
    drv.memcpy_htod_async(dst.gpudata, src, stream=stream)
pycuda._driver.LaunchError: cuMemcpyHtoDAsync failed: unspecified launch failure
set status to failed
set status to failed
========= main process now complete at 2023-08-12 03:23:16.205457.
========= monitor process now complete at 2023-08-12 03:23:16.207999.
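A log this long is easier to read once the repeated CUDA failures are tallied. The sketch below is an illustration rather than a CryoSPARC tool: the function name `summarize_cuda_failures` and the sample file are hypothetical, and the pattern assumes failure lines of the form `cuXxx failed: message`, as seen in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often each CUDA driver call failed in a job log.
summarize_cuda_failures() {
  local log_file="$1"
  # Extract "cuSomething failed: reason" fragments, then tally identical ones.
  grep -oE 'cu[A-Za-z]+ failed: [a-z ]+' "$log_file" | sort | uniq -c | sort -rn
}

# Demo on a few lines copied from the log above (stand-in for a real job.log).
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFreeHost failed: unspecified launch failure
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: unspecified launch failure
cuMemFree failed: unspecified launch failure
pycuda._driver.LaunchError: cuMemcpyHtoDAsync failed: unspecified launch failure
EOF
summarize_cuda_failures "$sample_log"
rm -f "$sample_log"
```

If every failing call reports the same reason, as here, the individual clean-up warnings are most likely fallout from one earlier fault, not separate problems.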

So, I managed to solve this issue.

First, I did a fresh install of CryoSPARC 4.2.1.

I removed the previous worker configuration using

cryosparcm cli 'remove_scheduler_target_node("cryoem1")'

Then I reconnected the worker from scratch and updated CryoSPARC with

cryosparcm update

and tested the installation using

cryosparcm test install

Everything was okay.

I also installed 3D Flex because the GPU PyTorch test was failing.

cd cryosparc_worker && ./bin/cryosparcw forcedeps && ./bin/cryosparcw install-3dflex

Then I tried to run the extended validation, where the 2D classification job failed, and job.log suggested there could be memory allocation issues. I also ran the TensorFlow test, and it failed.

After that, I added export CRYOSPARC_NO_PAGELOCK=true to config.sh in cryosparc_worker and restarted CryoSPARC, and everything is running smoothly now.
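For anyone applying the same change, here is a minimal sketch of adding the setting without duplicating it on repeated runs. The function name `set_no_pagelock` is hypothetical, and the demo writes to a temporary stand-in file, not a real config.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the setting to a worker config.sh only if the
# exact line is not already present.
set_no_pagelock() {
  local config_file="$1"
  local setting='export CRYOSPARC_NO_PAGELOCK=true'
  if grep -qxF "$setting" "$config_file"; then
    echo "already set in $config_file"
  else
    printf '%s\n' "$setting" >> "$config_file"
    echo "added to $config_file"
  fi
}

# Demo against a temporary stand-in for cryosparc_worker/config.sh.
demo_config="$(mktemp)"
echo 'export CRYOSPARC_LICENSE_ID=""' > "$demo_config"
set_no_pagelock "$demo_config"   # first call appends the line
set_no_pagelock "$demo_config"   # second call leaves the file unchanged
cat "$demo_config"
rm -f "$demo_config"
```

On a real system the argument would be the worker's own config.sh (under cryosparc_worker in this thread), followed by a restart such as `cryosparcm restart` so the change takes effect.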
