I am encountering errors while running a 3D Variability job

I recently updated to v4.6.2 and have started running into a few issues. Our IT staff told me that io_uring support is checked using a function from the liburing library, which is installed, and that the running Linux kernel has io_uring support enabled, so it is puzzling that the check fails. I am also not sure what to make of the second issue.

The first issue is the following:
[CPU: 89.8 MB Avail: 244.52 GB]
WARNING: io_uring support disabled (not supported by kernel), I/O performance may degrade

The second issue:
[CPU: 5.93 GB Avail: 239.77 GB]
Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 129, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 546, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run
  File "cryosparc_master/cryosparc_compute/jobs/var3D/run.py", line 323, in cryosparc_master.cryosparc_compute.jobs.var3D.run.run.E_step
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 400, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.load_models_rspace
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
    buffer = current_context().memhostalloc(bytesize)
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
    return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
    pointer = allocator()
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
    return driver.cuMemHostAlloc(size, flags)
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/cryosparc_user/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

For an explanation of why io_uring may still not be supported even though liburing is installed, please see Io_uring enabling - #10 by hsnyder.
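
If it helps to test io_uring outside CryoSPARC, below is a minimal sketch that initializes a ring through liburing from Python via ctypes. This is illustrative only and is not the exact probe CryoSPARC performs; note also that even if this basic check succeeds, CryoSPARC may rely on io_uring features added in later kernels, as discussed in the linked post.

import ctypes
import ctypes.util

libname = ctypes.util.find_library("uring")  # locate liburing on this system
if libname is None:
    raise SystemExit("liburing not found")
uring = ctypes.CDLL(libname)

ring = ctypes.create_string_buffer(4096)  # oversized stand-in for struct io_uring
ret = uring.io_uring_queue_init(8, ring, 0)  # returns 0 on success, -errno on failure
if ret == 0:
    print("io_uring ring initialized successfully")
    uring.io_uring_queue_exit(ring)
else:
    print("io_uring_queue_init failed with errno", -ret)  # 38 (ENOSYS) means no kernel support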

Please can you post the outputs of these commands:

  1. on the CryoSPARC master
    csprojectid=P99 # replace with actual project ID
    csjobid=J199 # replace with id of the failed job
    cryosparcm cli "get_job('$csprojectid', '$csjobid', 'job_type', 'version', 'instance_information', 'status',  'params_spec', 'errors_run')"
    
  2. on the CryoSPARC worker where the job ran and failed
    uptime
    uname -a 
    nvidia-smi
    /home/cryosparc_user/cryosparc_worker/bin/cryosparcw gpulist
    
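Independently of those outputs, a minimal sketch like the one below, run with the worker's Python environment, exercises the same numba call that appears in the traceback (cuda.pinned_array, which requests page-locked host memory through cuMemHostAlloc) and may show whether the failure reproduces outside CryoSPARC. The array shape here is an arbitrary placeholder:

import numpy as np
from numba import cuda
from numba.cuda.cudadrv.driver import CudaAPIError

try:
    # Request a page-locked (pinned) host buffer; adjust the shape as needed.
    buf = cuda.pinned_array((1024, 1024, 64), dtype=np.float32)
    buf[:] = 0
    print("pinned allocation of", buf.nbytes, "bytes succeeded")
except CudaAPIError as e:
    print("pinned allocation failed:", e)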

Here is the output of the get_job command on the master:

{'_id': '674ce620d558853f1556fb36', 'errors_run': [{'message': '[CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE', 'warning': False}], 'instance_information': {'CUDA_version': '11.8', 'available_memory': '240.73GB', 'cpu_model': 'Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz', 'driver_version': '12.4', 'gpu_info': [{'id': 0, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:3b:00'}, {'id': 1, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:5e:00'}, {'id': 2, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:86:00'}, {'id': 3, 'mem': 11539054592, 'name': 'NVIDIA GeForce RTX 2080 Ti', 'pcie': '0000:d8:00'}], 'ofd_hard_limit': 262144, 'ofd_soft_limit': 1024, 'physical_cores': 24, 'platform_architecture': 'x86_64', 'platform_node': 'thelma', 'platform_release': '5.4.286-1.el8.elrepo.x86_64', 'platform_version': '#1 SMP Sun Nov 17 11:28:26 EST 2024', 'total_memory': '251.53GB', 'used_memory': '7.22GB'}, 'job_type': 'var_3D', 'params_spec': {'compute_use_ssd': {'value': False}, 'var_K': {'value': 4}, 'var_filter_res': {'value': 5}}, 'project_uid': 'P17', 'status': 'failed', 'uid': 'J59', 'version': 'v4.6.2'}

And here are the outputs from the worker:

16:47:17 up 6 days, 6:28, 5 users, load average: 1.06, 1.16, 1.09

Linux thelma 5.4.286-1.el8.elrepo.x86_64 #1 SMP Sun Nov 17 11:28:26 EST 2024 x86_64 x86_64 x86_64 GNU/Linux

Thu Dec 19 16:47:29 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135                Driver Version: 550.135        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------|
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:3B:00.0  On |                  N/A |
| 31%   33C    P8             24W /  250W |     326MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:5E:00.0 Off |                  N/A |
| 31%   27C    P8              1W /  250W |       6MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:86:00.0 Off |                  N/A |
| 33%   30C    P8              1W /  250W |       6MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 2080 Ti     Off |   00000000:D8:00.0 Off |                  N/A |
| 32%   31C    P8              8W /  250W |       6MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      6261      G   /usr/libexec/Xorg                             120MiB |
|    0   N/A  N/A      6397      G   /usr/bin/gnome-shell                           39MiB |
|    0   N/A  N/A      7820      G   /usr/lib64/firefox/firefox                    161MiB |
|    1   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
|    2   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
|    3   N/A  N/A      6261      G   /usr/libexec/Xorg                               4MiB |
+-----------------------------------------------------------------------------------------+

-bash: /home/cryosparc_user/cryosparc_worker/bin/cryosparcw: Permission denied

Thanks @nmillan for posting the outputs.
Please can you post the outputs of this command on thelma:

grep -v LICENSE_ID /home/cryosparc_user/cryosparc_worker/config.sh

If that file does not already contain the line

export CRYOSPARC_NO_PAGELOCK=true

please add it (or adjust the existing line) and test whether the change resolves the CUDA_ERROR_INVALID_VALUE issue.
To view and, if needed, change /home/cryosparc_user/cryosparc_worker/config.sh, one may have to be logged in to the cryosparc_user Linux account.
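
For background, CRYOSPARC_NO_PAGELOCK directs CryoSPARC to avoid page-locked host allocations, i.e. the cuMemHostAlloc call that fails in the traceback above. Below is a rough sketch of the idea only; the helper name host_buffer is made up for illustration and does not reflect CryoSPARC's actual internals:

import os
import numpy as np
from numba import cuda

def host_buffer(shape, dtype=np.float32):
    # Hypothetical illustration: with CRYOSPARC_NO_PAGELOCK set, fall back to
    # ordinary pageable memory instead of pinned memory from cuMemHostAlloc.
    if os.environ.get("CRYOSPARC_NO_PAGELOCK") == "true":
        return np.empty(shape, dtype=dtype)
    return cuda.pinned_array(shape, dtype=dtype)

Pinned memory speeds up host-to-device transfers, so disabling it may cost some I/O performance; the flag is a workaround rather than a root-cause fix.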