For some reason, I’ve been getting the following behavior when running cryoSPARC on two different workstations:
Up until this point, all jobs ran smoothly. Then, suddenly, jobs started crashing. I see that the temp folder has filled up. Even after clearing some space, the failure persists and I can no longer run any job.
For instance, this is the error I’m getting now:
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/local_refine/newrun.py", line 401, in cryosparc_compute.jobs.local_refine.newrun.run_local_refine
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2877, in cryosparc_compute.engine.newengine.get_initial_noise_estimate
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 2897, in cryosparc_compute.engine.newengine.get_initial_noise_estimate
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 538, in cryosparc_compute.engine.newengine.EngineThread.preprocess_image_data
File "cryosparc_master/cryosparc_compute/engine/newengine.py", line 532, in cryosparc_compute.engine.newengine.EngineThread.preprocess_image_data
File "cryosparc_master/cryosparc_compute/engine/newgfourier.py", line 22, in cryosparc_compute.engine.newgfourier.get_plan_R2C_2D
File "/data/loewith/tafurpet/software/cryosparc/cryosparc_worker/cryosparc_compute/skcuda_internal/fft.py", line 115, in __init__
self.handle = gpufft.gpufft_get_plan(
RuntimeError: cuFFT failure: cufftSetStream(plan_cache.plans[idx].handle, device_stream)
-> CUFFT_INVALID_PLAN
Again, everything runs normally up until this point.
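Since the temp folder filling up seems to precede the crashes, a quick check of free space before launching jobs might help narrow things down. Below is a minimal sketch, not anything cryoSPARC provides: `free_space_report` and the 10 GiB threshold are hypothetical, and `tempfile.gettempdir()` is only an assumption about which directory matters (the directory that actually filled up on the workstation should be substituted).

```python
import shutil
import tempfile


def free_space_report(path, min_free_gib=10.0):
    """Return free space for the filesystem holding `path`.

    `min_free_gib` is an arbitrary illustrative threshold, not a
    cryoSPARC requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 2**30
    return {
        "path": path,
        "free_gib": free_gib,
        "ok": free_gib >= min_free_gib,
    }


if __name__ == "__main__":
    # Assumption: the system temp directory is the one that fills up.
    report = free_space_report(tempfile.gettempdir())
    status = "OK" if report["ok"] else "LOW"
    print(f"{report['path']}: {report['free_gib']:.1f} GiB free [{status}]")
```

Running this before and after a job would show whether the temp folder is being consumed during processing or was already nearly full.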
Yes, all jobs that require a GPU fail with the same error.
I’m not sure about that, but I don’t think so.
What has solved the issue is restarting the workstation and/or reinstalling cryoSPARC, but it is strange, as other users have hit the same issue at random, without any change to the workstation.
Could there still have been any automated system or driver software updates?
Perhaps. It appears to happen to different users (with different cryoSPARC instances) on the same workstation when they start processing with cryoSPARC again after a while.
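One way to test the driver-update hypothesis is to compare the NVIDIA kernel module version that is currently loaded with the version the userspace tools report. If a driver package was updated without a reboot, the two can disagree, which would fit a reboot fixing things. This is only a sketch under assumptions: it assumes a Linux workstation with `/proc/driver/nvidia/version` and `nvidia-smi` available, the helper names are hypothetical, and it has not been confirmed that a mismatch is the cause of the `CUFFT_INVALID_PLAN` error here.

```python
import re
import subprocess
from pathlib import Path


def parse_kernel_module_version(proc_text):
    """Extract the driver version from the text of /proc/driver/nvidia/version."""
    for line in proc_text.splitlines():
        if line.startswith("NVRM version:"):
            # First dotted number on the line, e.g. 535.104.05
            match = re.search(r"\b(\d+\.\d+(?:\.\d+)?)\b", line)
            return match.group(1) if match else None
    return None


def userspace_driver_version():
    """Ask nvidia-smi for the driver version; None if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return None
    if result.returncode != 0:
        # nvidia-smi itself typically errors out when versions mismatch.
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    proc_file = Path("/proc/driver/nvidia/version")
    kernel = parse_kernel_module_version(proc_file.read_text()) if proc_file.exists() else None
    user = userspace_driver_version()
    print(f"kernel module: {kernel}, userspace: {user}")
    if kernel is None or user is None or kernel != user:
        print("Versions differ or could not be read; a reboot may be pending.")
```

If the two versions match right after a failure, a silent driver update becomes less likely and the filled temp folder would be the next thing to look at.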