I am encountering an error while performing 3D Variability analysis

I recently updated to v4.6.2 and started running into a few issues. Our IT staff told me that io_uring support is checked via a function in the liburing library, which is installed, and that the running Linux kernel has io_uring support enabled, so it is puzzling that it is not working. I am also not sure what to make of the other issue.

The first issue is the following:
[CPU: 89.8 MB Avail: 244.52 GB]
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade

The second issue:
[CPU: 5.93 GB Avail: 239.77 GB]
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 129, in cryosparc_master.cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 546, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run
File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 323, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run.E_step
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 400, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.load_models_rspace
File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
return fn(*args, **kws)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
buffer = current_context().memhostalloc(bytesize)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
pointer = allocator()
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
return driver.cuMemHostAlloc(size, flags)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
return self._check_cuda_python_error(fname, libfn(*args))
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

For an explanation of why io_uring may still not be supported, please see Io_uring enabling - #10 by hsnyder.
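
If you would like to verify kernel-level io_uring support independently of CryoSPARC, here is a minimal sketch. It is not CryoSPARC's actual detection logic (which, per the report above, goes through liburing); it issues the io_uring_setup syscall directly via ctypes and assumes an x86_64 Linux system, where that syscall is number 425.

import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)
SYS_io_uring_setup = 425  # assumption: x86_64 syscall number

class IoUringParams(ctypes.Structure):
    # struct io_uring_params is 120 bytes; a zero-filled buffer is sufficient here
    _fields_ = [("raw", ctypes.c_uint8 * 120)]

params = IoUringParams()
fd = libc.syscall(SYS_io_uring_setup, 1, ctypes.byref(params))
if fd >= 0:
    os.close(fd)
    print("io_uring_setup succeeded: kernel support is available")
else:
    err = ctypes.get_errno()
    print(f"io_uring_setup failed: {os.strerror(err)} (errno {err})")

An ENOSYS error here would indicate the running kernel does not provide io_uring at all; other errors may point at seccomp or other restrictions.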

Please can you post the outputs of these commands

  1. on the CryoSPARC master
    csprojectid=P99 # replace with actual project ID
    csjobid=J199 # replace with id of the failed job
    cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status',  'params_spec', 'errors_run')"
    
  2. on the CryoSPARC worker where the job ran and failed
    uptime
    uname -a 
    nvidia-smi
    /home/cryosparc_user/cryosparc_worker/bin/cryosparcw gpulist
    

Here is the output of the get_job command on the CryoSPARC master:

{'_id': '674ce620d558853f1556fb36', 'errors_run': [{'message': '[CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE', 'warning': False}], 'instance_information': {'CUDA_version': '11.8', 'available_memory': '240.73GB', 'cpu_model': 'Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz', 'driver_version': '12.4', 'gpu_info': [{'id': 0, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:3b:00'}, {'id': 1, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:5e:00'}, {'id': 2, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:86:00'}, {'id': 3, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:d8:00'}], 'ofd_hard_limit': 262144, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'thelma', 'platform_release': '5.4.286-1.el8.elrepo.x86_64', 'platform_version': '#1 SMP Sun Nov 17 11:28:26 EST 2024', 'total_memory': '251.53GB', 'used_memory': '7.22GB'}, 'job_type': 'var_3D', 'params_spec': {'compute_use_ssd': {'value': False}, 'var_K': {'value': 4}, 'var_filter_res': {'value': 5}}, 'project_uid': 'P17', 'status': 'failed', 'uid': 'J59', 'version': 'v4.6.2'}

Here is the output of the commands on the worker:

16:47:17 up 6 days, 6:28, 5 users, load average: 1.06, 1.16, 1.09

Linux thelma 5.4.286-1.el8.elrepo.x86_64 #1 SMP Sun Nov 17 11:28:26 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

Thu Dec 19 16:47:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135                Driver Version: 550.135        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:3B:00.0  On |                  N/A |
| 31%   33C    P8             24W / 250W  |    326MiB / 11264MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:5E:00.0 Off |                  N/A |
| 31%   27C    P8              1W / 250W  |      6MiB / 11264MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:86:00.0 Off |                  N/A |
| 33%   30C    P8              1W / 250W  |      6MiB / 11264MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:D8:00.0 Off |                  N/A |
| 32%   31C    P8              8W / 250W  |      6MiB / 11264MiB   |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6261      G   /usr/libexec/Xorg                             120MiB |
|    0   N/A  N/A      6397      G   /usr/bin/gnome-shell                           39MiB |
|    0   N/A  N/A      7820      G   /usr/lib64/firefox/firefox                    161MiB |
|    1   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
|    2   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
|    3   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
+-----------------------------------------------------------------------------------------+

-bash: /home/cryosparc_user/cryosparc_worker/bin/cryosparcw: Permission denied

Thanks @nmillan for posting the outputs.
Please can you post the output of this command on thelma:

grep -v LICENSE_ID /home/cryosparc_user/cryosparc_worker/config.sh

If that file does not already contain a line

export CRYOSPARC_NO_PAGELOCK=true

please add that line to the file (or adjust an existing definition) and test whether the change resolves the CUDA_ERROR_INVALID_VALUE issue.
To view and, if needed, change /home/cryosparc_user/cryosparc_worker/config.sh, one may have to be logged in to the cryosparc_user Linux account.
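
For background, CRYOSPARC_NO_PAGELOCK is meant to steer the worker away from pinned (page-locked) host buffers, which are allocated through the cuMemHostAlloc call that fails in your traceback. The sketch below only illustrates the general idea; the host_alloc helper and the environment-variable dispatch are illustrative, not CryoSPARC's actual code.

import os
import numpy as np
from numba import cuda

def host_alloc(shape, dtype=np.float32):
    """Illustrative host-buffer allocator, not CryoSPARC's implementation."""
    if os.environ.get("CRYOSPARC_NO_PAGELOCK", "").lower() == "true":
        # pageable host memory: plain numpy, no CUDA driver call involved
        return np.empty(shape, dtype=dtype)
    # page-locked host memory: backed by cuMemHostAlloc, faster host/device copies
    return cuda.pinned_array(shape, dtype=dtype)

buf = host_alloc((1024, 1024))

With the variable set, the driver call that currently fails should simply never be made for these buffers, at the cost of somewhat slower host-to-device transfers.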

I have added that line and tested it but it has not yet resolved the CUDA_ERROR_INVALID_VALUE issue.

Please can you post the end of a job log for a job that failed with CUDA_ERROR_INVALID_VALUE after

export CRYOSPARC_NO_PAGELOCK=true

had been defined. You may use the command (after appropriately modifying the P and J IDs):

cryosparcm joblog P99 J199 | tail -n 40

and post its output.
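
Independently of the job log, a minimal sketch like the one below can help determine whether pinned host allocation fails outside CryoSPARC as well. It exercises the same numba.cuda.pinned_array call that appears in the traceback; the 1 GiB size and GPU index 0 are assumptions for illustration, and it would need to be run with the worker's own Python interpreter (the one whose site-packages path appears in the traceback).

import numpy as np
from numba import cuda

cuda.select_device(0)  # assumed GPU index, matching the failed job's allocation
# ~1 GiB of pinned (page-locked) host memory via the same call as the traceback
buf = cuda.pinned_array((256, 1024, 1024), dtype=np.float32)
print("pinned allocation succeeded:", buf.nbytes, "bytes")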

Here is the output:
========= sending heartbeat at 2024-12-30 19:33:24.451955
========= sending heartbeat at 2024-12-30 19:33:34.470288
========= sending heartbeat at 2024-12-30 19:33:44.489571
========= sending heartbeat at 2024-12-30 19:33:54.507268
========= sending heartbeat at 2024-12-30 19:34:04.525384
========= sending heartbeat at 2024-12-30 19:34:14.545750
/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/dispatcher.py:536: NumbaPerformanceWarning: Grid size 1 will likely result in GPU under-utilization due to low occupancy.
warn(NumbaPerformanceWarning(msg))


Transparent hugepages setting: always madvise [never]

Running job J59 of type var_3D
Running job on hostname %s localhost
Allocated Resources : {'fixed': {'SSD': False}, 'hostname': 'localhost', 'lane': 'default', 'lane_type': 'node', 'license': True, 'licenses_acquired': 1, 'slots': {'CPU': [0, 1, 2, 3], 'GPU': [0], 'RAM': [0, 1, 2]}, 'target': {'cache_path': '/CryoSparc/cryosparc_scratch', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 1, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 2, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}, {'id': 3, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]}, 'ssh_str': 'cryosparc_user@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/home/cryosparc_user/cryosparc_worker/bin/cryosparcw'}}
HOST ALLOCATION FUNCTION: using numba.cuda.pinned_array
**** handle exception rc
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 129, in cryosparc_master.cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 546, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run
File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 323, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run.E_step
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 400, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.load_models_rspace
File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
return fn(*args, **kws)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
buffer = current_context().memhostalloc(bytesize)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
pointer = allocator()
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
return driver.cuMemHostAlloc(size, flags)
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
return self._check_cuda_python_error(fname, libfn(*args))
File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE
set status to failed
========= main process now complete at 2024-12-30 19:34:24.567087.

@nmillan Please can you post the output of the command

grep -v LICENSE /home/cryosparc_user/cryosparc_worker/config.sh

Here is the output from the command:
export CRYOSPARC_USE_GPU=true

@nmillan It seems that the definition

export CRYOSPARC_NO_PAGELOCK=true

has not been added to

/home/cryosparc_user/cryosparc_worker/config.sh

Please can you add that definition to the file and check whether you can run a 3DVA job after the change.

Hello, I manually added the definition to the config.sh file.

I then executed the file with ./config.sh and ran the grep -v LICENSE /home/cryosparc_user/cryosparc_worker/config.sh command, which gave the following output:

export CRYOSPARC_USE_GPU=true
export CRYOSPARC_CUDA_PATH="/usr/local/cuda"
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_NO_PAGELOCK=true

Then I tried running the 3DVA job but I am still getting the same error.

Did you run that job after Dec 30, 2024? If so, please can you post the output of the command

cryosparcm joblog P99 J199 | tail -n 50

(after having replaced P99, J199 with the project and job IDs of the latest failed 3DVA job).
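
It may also be worth confirming that the definition in config.sh actually reaches worker processes: the earlier job log still shows HOST ALLOCATION FUNCTION: using numba.cuda.pinned_array, i.e. pinned allocation was still selected for that run. A minimal sketch, assuming it is executed inside the worker environment (for example with the worker's own Python interpreter seen in the traceback paths, after config.sh has been sourced):

import os
# prints None if the variable from config.sh never reaches the worker environment
print("CRYOSPARC_NO_PAGELOCK =", os.environ.get("CRYOSPARC_NO_PAGELOCK"))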

Hello, I am sorry for my delayed response. Our IT specialist recently updated the kernel to fix the io_uring issue, and while setting CryoSPARC up again to make sure it was working well, we also resolved the CUDA problem I was facing here. Thank you so much for your help along the way!
