I recently upgraded one of our CryoSPARC instances from 4.7 to 5.0.1, along with our Nvidia driver from 525.78.01 to 580.126.09. Since the upgrade, a couple of my users have had jobs fail with the error "Call to cuMemHostAlloc results in CUDA_ERROR_INVALID_VALUE". I put the export CRYOSPARC_NO_PAGELOCK=true line in the worker config, and one of the users' jobs completed without error but the other did not. I've also disabled transparent hugepages, with no luck. Is there something else that I might be missing here?
Thanks @ftilley for the report. We identified a bug that affects "single workstation" and cluster worker CryoSPARC setups, where cryosparc_worker/config.sh settings may be (incorrectly) ignored. We plan to release a fix.
Until the fix is released, the bug's effect can be mitigated:
- for Slurm cluster lanes: by including the #SBATCH --export=NONE option in the lane's script template
- for worker targets that are on the same computer as the CryoSPARC master: by including the export CRYOSPARC_NO_PAGELOCK=true line in the cryosparc_master/config.sh file
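To make the two workarounds concrete, here is a minimal sketch. The template header lines and the config.sh path are illustrative, not taken from your installation:

```shell
# Workaround 1 (Slurm cluster lane): add --export=NONE to the lane's
# cluster script template so the submitted job does not inherit the
# CryoSPARC master's environment. Illustrative template excerpt:
#
#   #!/usr/bin/env bash
#   #SBATCH --export=NONE
#   ... rest of the lane's existing script template ...

# Workaround 2 (worker on the same machine as the master): add the
# override to the master's config so spawned worker processes see it.
# Adjust the path to match your installation.
echo 'export CRYOSPARC_NO_PAGELOCK=true' >> /opt/cryosparc/cryosparc_master/config.sh
```

After editing either file, resubmitting the failed jobs should pick up the new settings; a running job will not.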
CryoSPARC v5.0.2 has been released and includes the clearing of master environment variables. The #SBATCH --export=NONE Slurm option we proposed earlier is therefore no longer needed.