V3.0.1 cuMemHostAlloc problems and Abinitio stops without error

What GPUs are you using? Do you have anything monitoring the worker host and collecting system stats?

I’ve been having this issue since v2.15 with V100 32 GB cards: GPU memory usage never goes above 10 GB, yet jobs randomly fail with a cuMemAlloc error. I’ve tried various NVIDIA drivers and CUDA versions, with no noticeable difference.
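If nothing is collecting stats on the worker yet, a minimal sketch like the one below can log host RAM and per-GPU memory over time, so a failed job can be lined up against the numbers just before it died. Host memory is worth watching too, because `cuMemHostAlloc` allocates page-locked host memory rather than GPU memory. This assumes a Linux worker (it reads `/proc/meminfo`) with `nvidia-smi` on the PATH; the function names and log file name are hypothetical, not part of any existing tool.

```python
import subprocess
import time
from datetime import datetime


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            stats[key.strip()] = int(parts[0])
    return stats


def parse_gpu_csv(text):
    """Parse nvidia-smi rows of 'index, memory.used, memory.total' (values in MiB)."""
    gpus = []
    for line in text.strip().splitlines():
        idx, used, total = (field.strip() for field in line.split(","))
        gpus.append((int(idx), int(used), int(total)))
    return gpus


def sample():
    """Return one timestamped line of host and GPU memory usage."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    line = (f"{datetime.now().isoformat(timespec='seconds')} "
            f"host_avail={mem['MemAvailable'] // 1024}MiB "
            f"host_total={mem['MemTotal'] // 1024}MiB")
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True).stdout
        for idx, used, total in parse_gpu_csv(out):
            line += f" gpu{idx}={used}/{total}MiB"
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi missing or failed; still record the host numbers.
        line += " gpu=unavailable"
    return line


def monitor(path, interval=10, count=None):
    """Append one sample to `path` every `interval` seconds (forever if count is None)."""
    n = 0
    with open(path, "a") as log:
        while count is None or n < count:
            log.write(sample() + "\n")
            log.flush()  # keep the log current even if the host goes down
            n += 1
            time.sleep(interval)
```

Running something like `monitor("mem_monitor.log", interval=10)` in a background process on the worker while jobs run would show whether host available memory drops sharply around the time of a cuMemHostAlloc or cuMemAlloc failure.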