================= CRYOSPARCW ======= 2023-12-04 13:00:03.524000 =========
Project P11 Job J25
Master ip-XXXXX Port 39002
===========================================================================
========= monitor process now starting main process at 2023-12-04 13:00:03.524035
MAINPROCESS PID 13285
MAIN PID 13285
refine.newrun cryosparc_compute.jobs.jobregister
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cryosparc_master/cryosparc_compute/run.py", line 189, in cryosparc_master.cryosparc_compute.run.run
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/cryosparc_compute/get_gpu_info.py", line 30, in get_driver_version
return get_version()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 3318, in get_version
return driver.get_version()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 465, in get_version
version = driver.cuDriverGetVersion()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 296, in __getattr__
self.ensure_initialized()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
self.cuInit(0)
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 352, in safe_cuda_api_call
return self._check_cuda_python_error(fname, libfn(*args))
File "cuda/cuda.pyx", line 11325, in cuda.cuda.cuInit
File "cuda/ccuda.pyx", line 17, in cuda.ccuda.cuInit
File "cuda/_cuda/ccuda.pyx", line 2353, in cuda._cuda.ccuda._cuInit
RuntimeError: Function "cuInit" not found
***************************************************************
Running job J25 of type homo_refine_new
Running job on hostname %s g5-singlegpu-queue
Allocated Resources : {'fixed': {'SSD': True}, 'hostname': 'g5-singlegpu-queue', 'lane': 'g5-singlegpu-queue', 'lane_type': 'cluster', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3], 'GPU': [0], 'RAM': [0, 1, 2]}, 'target': {'cache_path': '/scratch', 'cache_quota_mb': 1000000, 'cache_reserve_mb': 10000, 'custom_var_names': [], 'custom_vars': {}, 'desc': None, 'hostname': 'g5-singlegpu-queue', 'lane': 'g5-singlegpu-queue', 'name': 'g5-singlegpu-queue', 'qdel_cmd_tpl': 'scancel {{ cluster_job_id }}', 'qinfo_cmd_tpl': 'sinfo', 'qstat_cmd_tpl': 'squeue -j {{ cluster_job_id }}', 'qstat_code_cmd_tpl': None, 'qsub_cmd_tpl': 'sbatch {{ script_path_abs }}', 'script_tpl': '#!/usr/bin/env bash\n#### cryoSPARC cluster submission script template for SLURM\n## Available variables:\n## {{ run_cmd }} - the complete command string to run the job\n## {{ num_cpu }} - the number of CPUs needed\n## {{ num_gpu }} - the number of GPUs needed.\n## Note: the code will use this many GPUs starting from dev id 0\n## the cluster scheduler or this script have the responsibility\n## of setting CUDA_VISIBLE_DEVICES so that the job code ends up\n## using the correct cluster-allocated GPUs.\n## {{ ram_gb }} - the amount of RAM needed in GB\n## {{ job_dir_abs }} - absolute path to the job directory\n## {{ project_dir_abs }} - absolute path to the project dir\n## {{ job_log_path_abs }} - absolute path to the log file for the job\n## {{ worker_bin_path }} - absolute path to the cryosparc worker command\n## {{ run_args }} - arguments to be passed to cryosparcw run\n## {{ project_uid }} - uid of the project\n## {{ job_uid }} - uid of the job\n## {{ job_creator }} - name of the user that created the job (may contain spaces)\n## {{ cryosparc_username }} - cryosparc username of the user that created the job (usually an email)\n##\n## What follows is a simple SLURM script:\n\n#SBATCH --job-name {{ cryosparc_username }}_{{ project_uid }}_{{ job_uid }}\n#SBATCH -n {{ num_cpu }}\n#SBATCH -N 1\n#SBATCH --constraint="g5.2xlarge|g5.4xlarge|g5.8xlarge"\n#SBATCH --gres=gpu:{{ num_gpu }}\n##SBATCH --mem={{ (ram_gb)|int }}G\n#SBATCH -o {{ job_dir_abs }}/slurm-%j.out\n#SBATCH -e {{ job_dir_abs }}/slurm-%j.err\n#SBATCH --exclusive --partition=g5-singlegpu\n\n{{ run_cmd }}\n', 'send_cmd_tpl': '{{ command }}', 'title': 'g5-singlegpu-queue', 'tpl_vars': ['command', 'cluster_job_id', 'job_log_path_abs', 'job_dir_abs', 'run_cmd', 'run_args', 'cryosparc_username', 'job_uid', 'worker_bin_path', 'project_dir_abs', 'job_creator', 'num_cpu', 'project_uid', 'ram_gb', 'num_gpu'], 'type': 'cluster', 'worker_bin_path': '/wekahome/apps/cryosparc/current/cryosparc_worker/bin/cryosparcw'}}
**** handle exception rc
/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/cryosparc_compute/jobs/motioncorrection/mic_utils.py:95: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
@jit(nogil=True)
/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/cryosparc_compute/micrographs.py:563: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def contrast_normalization(arr_bin, tile_size = 128):
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/refine/newrun.py", line 359, in cryosparc_master.cryosparc_compute.jobs.refine.newrun.run_homo_refine
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/cryosparc_compute/alignment.py", line 112, in align_symmetry
gpucore.initialize([cuda_dev])
File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 47, in cryosparc_master.cryosparc_compute.gpu.gpucore.initialize
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
return _runtime.get_or_create_context(devnum)
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 144, in get_or_create_context
return self._activate_context_for(devnum)
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 176, in _activate_context_for
gpu = self.gpus[devnum]
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 40, in __getitem__
return self.lst[devnum]
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/devices.py", line 26, in __getattr__
numdev = driver.get_device_count()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 425, in get_device_count
return self.cuDeviceGetCount()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 296, in __getattr__
self.ensure_initialized()
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
self.cuInit(0)
File "/wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/site-packages/numba/cuda/cudadrv/driver.py", line 352, in safe_cuda_api_call
return self._check_cuda_python_error(fname, libfn(*args))
File "cuda/cuda.pyx", line 11325, in cuda.cuda.cuInit
File "cuda/ccuda.pyx", line 17, in cuda.ccuda.cuInit
File "cuda/_cuda/ccuda.pyx", line 2353, in cuda._cuda.ccuda._cuInit
RuntimeError: Function "cuInit" not found
set status to failed
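[Note: the RuntimeError: Function "cuInit" not found above is raised by the cuda-python bindings when the library they load as the CUDA driver does not export the cuInit symbol. A minimal standalone check (our sketch, assuming the driver is installed as libcuda.so.1 on the linker's default search path) can be run directly on the GPU node:

import ctypes

# Load the CUDA driver library the same way the dynamic linker resolves it.
lib = ctypes.CDLL("libcuda.so.1")

# ctypes looks up symbols lazily via dlsym, so hasattr() reports whether the
# loaded library actually exports cuInit.
print("cuInit exported:", hasattr(lib, "cuInit"))

If this prints False, or CDLL fails to load the library at all, whatever the linker resolves as libcuda.so.1 is not a complete NVIDIA driver library.]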
@wtempel The OS is "alinux2"
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
Amazon Linux release 2 (Karoo)
NVIDIA driver on the GPU node:
Tue Dec 5 05:59:28 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 25C P8 23W / 300W | 0MiB / 23028MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
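[Note: nvidia-smi sees driver 520.61.05 (CUDA 11.8), so the remaining question is whether the same library is visible inside the worker's Python environment. A quick cross-check, run from the cryosparc_worker directory after eval $(bin/cryosparcw env) (this uses numba's standard detect() helper, nothing cryoSPARC-specific):

python -c "from numba import cuda; cuda.detect()"

On a healthy node this lists the A10G; here it would be expected to reproduce the same cuInit failure, which would point at the library the worker environment loads rather than at the driver itself.]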
Thanks @Praveen. What are the outputs of these commands on that GPU server?
uname -a
/sbin/ldconfig -p | grep cu
[Added 2023-12-08:]
In case you have not yet resolved the RuntimeError, could you please compress and email us the file /tmp/libsdebug.txt that is created by this command sequence:
cd /wekahome/apps/cryosparc/v4.4.0_231114/cryosparc_worker/
eval $(bin/cryosparcw env)
LD_DEBUG=libs python -c "from cuda import cuda; cuda.cuInit(0)" 2> /tmp/libsdebug.txt
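[Note: LD_DEBUG=libs makes the dynamic linker log every library search and binding to stderr, so the lines mentioning libcuda in /tmp/libsdebug.txt show which file was actually opened for the CUDA driver. A small convenience script for pulling those lines out (a hypothetical helper, not a cryoSPARC tool):

# Print only the libcuda-related lines from the LD_DEBUG trace, to see
# which file the dynamic linker actually opened for the CUDA driver.
with open("/tmp/libsdebug.txt") as f:
    for line in f:
        if "libcuda" in line:
            print(line.rstrip())
]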