Support for RTX5000 series GPUs?

Remedy: recompile RELION without specifying CUDA_ARCH, as recently discussed on the CCP-EM mailing list.

Regards

Update day is here! I’m excited to start processing on our new GPUs. Thanks to the team for getting this out.

Yeah, finally! Thanks a lot. Going to test this right now =)

Hey guys,

how is your experience with v4.7.1 with CUDA 12? I did a fresh installation and re-attached my project. When I only updated my previous version, I got a memory-allocation error while running a homogeneous refinement, and the fresh installation did not fix it either. Here is the error I get:

[CPU: 7.10 GB Avail: 108.52 GB]

Traceback (most recent call last):
  File "/opt/cryosparc/cryosparc_worker/cryosparc_compute/jobs/runcommon.py", line 2306, in run_with_except_hook
    run_old(*args, **kw)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2730, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2808, in cryosparc_master.cryosparc_compute.engine.newengine.process.work
  File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 1399, in cryosparc_master.cryosparc_compute.engine.newengine.EngineThread.compute_resid_pow
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
    buffer = current_context().memhostalloc(bytesize)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
    return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
    pointer = allocator()
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
    return driver.cuMemHostAlloc(size, flags)
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/opt/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

Do you see this as well? I'm running the latest NVIDIA driver (570.158.01) with an RTX 5070 and an RTX 5090. The job starts, and after a few thousand particles the error occurs.

Best,

Stefan

I will let you know when I get there. I have a machine with 1x 5090 and am currently running motion correction. That is running smoothly: 6k movies corrected so far in 24 hours.

I found that 2D classification with a smaller box size (380 binned to 128) runs on the 5070, but the 5090 gives the error. I get the same error on the 5070 with a bigger box size (380 binned to 256).

Oh, 6k movies in 24 h does not actually seem that fast. Do you have a reference run of the same dataset?

Best,
Stefan

I do, and you can help me check my math. I processed 550 of these same movies on a machine with 2x Titan X Pascal; each card has 12 GB of VRAM and 480.4 GB/s of bandwidth. It took 3 hours and 9 minutes to finish, so that's 2.91 micrographs/minute. On the 5090, 7520 movies have been processed so far in 25 hours and 48 minutes. The 5090 has 32 GB of VRAM and 1790 GB/s of bandwidth. That is 4.85 micrographs/minute.

I'm not sure how best to compare these and won't claim to understand the computation in detail. 1x 5090 has roughly 1.86x the bandwidth of 2x Titan X Pascal, and it motion corrects at about 1.67x the speed. So overall, processing speed seems to correlate pretty well with bandwidth. I don't really know what having 2x GPUs vs. 1x GPU does in terms of processing speed.

Edit: After thinking about it more, I think the 5090 is doing very well, and CryoSPARC is using its bandwidth to a great extent. Per card, the 5090 processes movies about 3.3x as fast as a Titan X, so I am definitely feeling the upgrade.
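
As a quick sanity check on the arithmetic above (all figures are taken from this post; the script itself is only illustrative):

    # Motion-correction throughput comparison using the numbers quoted above.
    titan_rate = 550 / (3 * 60 + 9)             # 2x Titan X Pascal: ~2.91 movies/min
    rtx5090_rate = 7520 / (25 * 60 + 48)        # 1x RTX 5090:       ~4.86 movies/min

    bandwidth_ratio = 1790 / (2 * 480.4)        # 5090 vs. two Titan X cards: ~1.86x
    speed_ratio = rtx5090_rate / titan_rate     # measured speedup:           ~1.67x
    per_card = rtx5090_rate / (titan_rate / 2)  # 5090 vs. a single Titan X:  ~3.3x

    print(f"{titan_rate:.2f} vs {rtx5090_rate:.2f} movies/min, "
          f"bandwidth x{bandwidth_ratio:.2f}, speed x{speed_ratio:.2f}, "
          f"per card x{per_card:.1f}")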

Edit: I got the same error as @StefanG88 when working with a big particle stack in ab initio, but once I split it into 4 pieces I could proceed. I will leave this post up as a reference. Is there a way to calculate whether a particle stack can be handled by a system?
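
A rough back-of-the-envelope estimate (not an official formula; it assumes 4-byte single-precision pixels and ignores CryoSPARC's internal buffering) of the host memory a particle stack needs:

    # Approximate in-memory size of a 2D particle stack.
    def stack_size_gb(n_particles, box_px, bytes_per_px=4):
        return n_particles * box_px * box_px * bytes_per_px / 1e9

    # The 3.3 M-particle stack described below, raw vs. binned box size:
    print(stack_size_gb(3_300_000, 600))   # ~4752 GB at the raw 600-px box
    print(stack_size_gb(3_300_000, 100))   # ~132 GB at the 100-px binned box

Splitting the stack (or binning harder) shrinks each allocation accordingly, which matches the workaround above.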

I have 1x 5090 with 128 GB of RAM, an AMD 9950X CPU, and a 4 TB SSD. I successfully ran patch motion correction, patch CTF, denoising, template picking, junk detector, and particle extraction. Now I am running an ab initio job and am running into problems. I have a big particle stack, but I feel like this system should handle it: 3.3 million particles with box size 600 binned to 100. The ab initio job has all default parameters except for 6 classes with 0.5 class similarity. Is this too big a stack, or is there something else happening here?

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 129, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/abinit/run.py", line 330, in cryosparc_master.cryosparc_compute.jobs.abinit.run.run_homo_abinit
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1204, in cryosparc_master.cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1205, in cryosparc_master.cryosparc_compute.engine.engine.process
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 1144, in cryosparc_master.cryosparc_compute.engine.engine.process.work
  File "cryosparc_master/cryosparc_compute/engine/engine.py", line 358, in cryosparc_master.cryosparc_compute.engine.engine.EngineThread.compute_resid_pow
  File "cryosparc_master/cryosparc_compute/gpu/gpucore.py", line 382, in cryosparc_master.cryosparc_compute.gpu.gpucore.EngineBaseThread.ensure_allocated
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devices.py", line 232, in _require_cuda_context
    return fn(*args, **kws)
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/api.py", line 189, in pinned_array
    buffer = current_context().memhostalloc(bytesize)
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1378, in memhostalloc
    return self.memory_manager.memhostalloc(bytesize, mapped, portable, wc)
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 889, in memhostalloc
    pointer = allocator()
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 884, in allocator
    return driver.cuMemHostAlloc(size, flags)
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/turul_csparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_INVALID_VALUE] Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE

Hi Marcell,
As I understand it, a Fourier-transform-heavy job should scale with CUDA cores (the degree of parallelism), memory bandwidth (how fast data can be transferred), and VRAM (how much data can be cached). The RTX 5090 outperforms the Titan X by roughly 580% in CUDA cores, 240% in bandwidth, and 170% in VRAM, and its FP32 compute capability is almost 1000% higher. From these specs alone I would expect at least a 10x speed increase on this job.
I did not start my project from scratch: I detached the project, did a clean new installation, and attached the project again, but I was already running version 4.7 before, so I would not expect problems from switching versions.
I see this error on the same particle stack of about 2.2 million particles; I just wanted to increase resolution by binning less.
Right now, I'm running a very small test dataset on our new workstation with 2x 5090 and 2x 4090. So far everything runs, but I haven't touched 3D yet. I can say that Topaz does not run on the 5090, but this was kind of expected. Wondering when a new version comes out.

Hi @StefanG88 and everyone,

I have been doing some testing as well. Like Stefan, I regularly get cuMemHostAlloc errors on my workstation with 1x 5090 when I run 3D refinement jobs. There were no issues with motion correction, CTF, or particle picking. I have tried binning particles, using smaller particle stacks, and using low-memory mode in NU-refine. Even with 5,000 particles at box size 600 binned to 300, NU-refine jobs fail almost instantly. If it helps the devs or anyone else, I can share fuller logs and errors. I updated the driver to 570.153.02 and am using CUDA 12.8. Is there anything else I need to update?

I exported a particle stack of 1.1 million particles (box 600 binned to 300) from my 5090 machine and imported it onto a machine with 2x Titan X Pascal. That workstation processed the particles without issues, and I was able to do ab initio, NU-refine, 3D classification, and subsequent NU-refines.

Have you already tried adding

export CRYOSPARC_NO_PAGELOCK="true"

to the worker configuration file at cryosparc_worker/config.sh?
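
For background (a rough illustration only, not a description of CryoSPARC internals): the call that fails in both tracebacks is numba's pinned_array, which requests page-locked host memory through cuMemHostAlloc. A minimal numba sketch of the difference between a pinned and an ordinary pageable host buffer, assuming a working CUDA installation:

    import numpy as np
    from numba import cuda

    n = 10_000_000  # illustrative buffer size

    # Page-locked ("pinned") host buffer, allocated via cuMemHostAlloc --
    # the driver call that raises CUDA_ERROR_INVALID_VALUE above.
    pinned_buf = cuda.pinned_array(n, dtype=np.float32)

    # Ordinary pageable host buffer: a plain allocation, no CUDA driver call.
    pageable_buf = np.empty(n, dtype=np.float32)

    # Either can be copied to the device; pinned memory mainly enables faster,
    # asynchronous host-to-device transfers.
    d_buf = cuda.to_device(pageable_buf)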

What versions of CryoSPARC and the nvidia driver are running on the machine with the Titan X card?

Wow! This seems to have solved it. I have since successfully run a homo refine and an NU-refine; they take about 14 minutes with 300k particles. I am currently running multiple NU-refines concurrently on the same GPU and it is running smoothly. What does this setting do?

For reference, the Titan X workstation is running 4.7.0 and the driver is 545.23.08

It looks like export CRYOSPARC_NO_PAGELOCK="true" also solved the memory issue for me. Thanks for the tip =)

Does all of this apply to the NVIDIA RTX PRO 6000 Blackwell Max-Q as well?

Welcome to the forum, @kbus1497.

It depends on the specific errors observed. Could you please post the errors you have encountered?

2 posts were split to a new topic: Cuda_error_invalid_handle