cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE (version 5.0.1)

Hi

I recently upgraded one of our cryosparc instances from 4.7 to 5.0.1, along with our Nvidia driver from 525.78.01 to 580.126.09. Since the upgrade, a couple of my users have had jobs fail with the error “Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE”. I put the export CRYOSPARC_NO_PAGELOCK=true line in the worker config, and one of the users’ jobs completed without error, but not the other’s. I’ve also disabled transparent hugepages, with no luck. Is there something else that I might be missing here?

thanks

frank

Thanks @ftilley for the report. We identified a bug that affects “single workstation” and cluster worker CryoSPARC setups, where cryosparc_worker/config.sh settings may be (incorrectly) ignored. We plan to release a fix.
Until the fix is released, the bug’s effect can be mitigated:

  • for slurm cluster lanes: by including the
    #SBATCH --export=NONE option in the lane’s script template
  • for worker targets that are on the same computer as the CryoSPARC master: by including the
    export CRYOSPARC_NO_PAGELOCK=true line in the cryosparc_master/config.sh file

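To make the two mitigations concrete, here is a sketch of where each line goes, based on the thread above. The cluster script header below is illustrative; the surrounding #SBATCH options (job name, GPU request, and so on) will differ in your lane’s actual script template.

```shell
# Mitigation 1: slurm cluster lanes
# Add --export=NONE to the lane's cluster script template so that
# environment variables from the CryoSPARC master process are not
# propagated into the worker's job environment.
#
#   #SBATCH --job-name=cryosparc_{{ project_uid }}_{{ job_uid }}
#   #SBATCH --export=NONE

# Mitigation 2: worker on the same computer as the CryoSPARC master
# Add this line to cryosparc_master/config.sh so page-locked (pinned)
# host memory allocation via cuMemHostAlloc is skipped:
export CRYOSPARC_NO_PAGELOCK=true
```

Note that in this scenario the line goes in cryosparc_master/config.sh, not cryosparc_worker/config.sh, because the bug causes worker config settings to be ignored.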
CryoSPARC v5.0.2 has been released and includes a fix that clears master environment variables. The
#SBATCH --export=NONE slurm option we proposed earlier is

  • an alternative workaround to this issue
  • also compatible with CryoSPARC v5.0.2.

Thank you so much for the workarounds and this update. The workaround got folks working again, and I’ll see about upgrading to 5.0.2 as soon as I can.

frank