Local Filtering GPU/CPU option

Hi CryoSPARC team,

There is an option in Local Filtering to choose between GPU and CPU (with the guidance being to use the GPU). Unfortunately, when I use the GPU, the job crashes with a CUDA out-of-memory error (and there is no low-memory mode option as in NU/Local refinement).

If CPU is set manually, it uses the GPU anyway and continues to crash.

Any solution? Or should I export the half-maps and local res estimation output and run it on a different system with more GPU memory? (768 pixel box, BTW).
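For context, a rough back-of-envelope of what a single volume at this box size costs in memory (just an illustration; I don't know exactly how many full-size arrays the local filtering job keeps on the GPU at once):

# Rough per-volume memory footprint at box size 768.
# Illustration only -- the exact number of full-size arrays the
# local filtering job allocates on the GPU is not known to me.
box = 768
voxels = box ** 3                       # ~4.5e8 voxels

float32_gib = voxels * 4 / 1024 ** 3    # one real-space float32 map, ~1.7 GiB
complex64_gib = voxels * 8 / 1024 ** 3  # one full complex64 FFT, ~3.4 GiB

print(f"float32 volume:   {float32_gib:.2f} GiB")
print(f"complex64 volume: {complex64_gib:.2f} GiB")
# Two half-maps, a mask, a local-resolution map and FFT work buffers
# at this size add up to tens of GiB very quickly.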

Cheers,
R

edit (output):

Traceback (most recent call last):
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 851, in _attempt_allocation
    return allocator()
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 243, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.run_locfilter
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 292, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.standalone_locfilter
  File "cryosparc_master/cryosparc_compute/jobs/local_filter/run.py", line 333, in cryosparc_master.cryosparc_compute.jobs.local_filter.run.standalone_locfilter
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 276, in zeros
    arr = empty(shape, dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 270, in empty
    return device_array(shape, dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 226, in device_array
    arr = GPUArray(shape=shape, strides=strides, dtype=dtype, stream=stream)
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/gpu/gpuarray.py", line 21, in __init__
    super().__init__(shape, strides, dtype, stream, gpu_data)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/devicearray.py", line 103, in __init__
    gpu_data = devices.get_context().memalloc(self.alloc_size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1372, in memalloc
    return self.memory_manager.memalloc(bytesize)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1056, in memalloc
    ptr = self._attempt_allocation(allocator)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 863, in _attempt_allocation
    return allocator()
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 1054, in allocator
    return driver.cuMemAlloc(size)
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 348, in safe_cuda_api_call
    return self._check_cuda_python_error(fname, libfn(*args))
  File "/home/cryosparcer/bin/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 408, in _check_cuda_python_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [CUresult.CUDA_ERROR_OUT_OF_MEMORY] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
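For anyone hitting the same wall: a quick way to check how much device memory is actually free before queuing the job. This is a minimal sketch using numba (which the cryosparc_worker environment already ships); run it with that environment's Python:

# Minimal sketch: report free vs. total memory on the GPU numba selects.
# Assumes numba and a working CUDA driver, e.g. the cryosparc_worker
# conda environment.
from numba import cuda

free_bytes, total_bytes = cuda.current_context().get_memory_info()
print(f"GPU free: {free_bytes / 1024**3:.1f} GiB "
      f"of {total_bytes / 1024**3:.1f} GiB total")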

Figured I’d give this a bump; any chance of a “low memory mode” for local filtering?

Hi @rbs_sci,

Just wanted to let you know that we're working on reproducing this issue internally.

All the best,
Kye

Cheers, @kstachowski; let me know if I can help by sending any logs.

Hi @rbs_sci,

We have been able to reproduce the issue and it is indeed a bug. We will notify you when we have a fix available.

Re:

Any solution? Or should I export the half-maps and local res estimation output and run it on a different system with more GPU memory?

I have been able to successfully run local filtering jobs on an RTX 4090 (24 GB) and a Quadro GV100 (32 GB) using a box size of 760px. Depending on what you’re currently using and what you have available to you, you might be able to run this on a different system.

Cheers,
Kye

Cheers, @kstachowski :slight_smile:

Pulling the half-maps out to a different system isn’t my preferred solution but it’ll have to do for now.

Thanks again.

A problem with this approach:

Importing the half-maps as half_map_A and half_map_B with the Import Volumes job causes trouble downstream.

The Local Res job cannot be queued when both half-map A and half-map B are connected (the Queue button just says “please connect all required… blah blah…”), but it can be queued with only half-map A connected. Unfortunately, that results in the following error:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "cryosparc_master/cryosparc_compute/jobs/local_resolution/run.py", line 623, in cryosparc_master.cryosparc_compute.jobs.local_resolution.run.run_locres
AssertionError: two maps are identical

Presumably because it cannot find half-map B (and so compares half-map A with itself).

If I wildcard the import to cover both half-maps, both are imported, but Local Res still complains that the half-maps are identical, because it only picks up one of them.

If I import as normal maps, Local Res won’t run as the connected maps are not “half maps”.

I cannot manually connect half-map B by opening the “Slots” option.
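To rule out the two imported half-maps actually being identical on disk, I checked the files directly. A minimal sketch using mrcfile and numpy; the file names are placeholders for the imported half-maps:

# Check whether two half-map files hold identical data.
# File names are placeholders -- substitute the actual imported .mrc files.
import mrcfile
import numpy as np

with mrcfile.open("half_map_A.mrc", permissive=True) as a, \
     mrcfile.open("half_map_B.mrc", permissive=True) as b:
    print("shapes:", a.data.shape, b.data.shape)
    print("identical:", np.array_equal(a.data, b.data))  # expect False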

If I export the results group (see also: When "Export" is not the same as "Export") and then import that results group, I get the following error (because the exported files are only symlinks, not the actual volumes!):

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 115, in cryosparc_master.cryosparc_compute.run.main
  File "/home/cryosparcer/bin/cryosparc_worker/cryosparc_compute/jobs/imports/run.py", line 1373, in run_import_result_group
    assert missing_paths == 0, (
AssertionError: Unable to find 1 file(s) referred to in dataset J59_volume_exported.cs, field map_half_A/path. Affected files are listed above.

OK, I’ll grab the volumes remotely to fix this.
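In case it helps anyone else, this is roughly how I dealt with the symlinks before transferring: walk the exported results group and replace each symlink with a copy of its target. A standard-library sketch; the export directory path is a placeholder:

# Replace symlinks in an exported results group with copies of their
# targets so the directory can be moved to another system intact.
# The directory path is a placeholder -- point it at the actual export.
import os
import shutil

export_dir = "exports/groups/J59_volume"

for root, _dirs, files in os.walk(export_dir):
    for name in files:
        path = os.path.join(root, name)
        if os.path.islink(path):
            target = os.path.realpath(path)
            os.unlink(path)              # drop the symlink...
            shutil.copy2(target, path)   # ...and copy in the real file
            print(f"replaced symlink {path} -> {target}")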

Also, why does the job print this at the start of the run:

Resampling mask to box size 768

when the mask is already 768³?
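(For the record, I confirmed the box size straight from the mask header; a minimal mrcfile sketch, with the path as a placeholder:)

# Read only the header of the mask to confirm box size and pixel size.
# The path is a placeholder for the actual mask file.
import mrcfile

with mrcfile.open("mask.mrc", header_only=True, permissive=True) as mrc:
    nx, ny, nz = int(mrc.header.nx), int(mrc.header.ny), int(mrc.header.nz)
    print(f"mask box: {nx} x {ny} x {nz}, voxel size: {mrc.voxel_size}")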

After fun with exports and groups and symlinks, Local Resolution Filtering ran successfully on 48 GB GPUs (although I think it would have been OK on 24 GB ones as well…)

Well, I learned something new (not all “Exports” are created equal, and importing individual half-maps and then feeding them to subsequent jobs is a pain) and it worked, so… yay! :sweat_smile:

Hi @rbs_sci,

If you import half-maps, you have to use the low-level outputs (LLO) and low-level inputs to connect them properly. Once you connect one of the half-maps (A) using the high-level outputs, you have to use the LLO to connect half-map B to the correct slot, as well as the locres volume. I noticed this in my testing and added it to our internal notes.

I’m glad you got it working though!

All the best,
Kye