Cache Particles on SSD fails

Hello Everyone,

I am queuing a series of heterogeneous refinement jobs, and I have four GPUs. The first three jobs ran well, but the fourth failed with:

Traceback (most recent call last):
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 104, in func
    with make_json_request(self, "/api", data=data, _stacklevel=4) as request:
  File "/home/cryosparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 225, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://localhost:39002/api, code 500) Timeout Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/jobs/utilities/run_cache_particles.py", line 31, in run
    particles.read_blobs(proj_dir_abs, do_cache=True)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 120, in read_blobs
    u_blob_paths = cache_run(u_rel_paths)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 112, in download_and_return_cache_paths
    rc.cli.cache_sync_in_use(worker_hostname, rc._project_uid, rc._job_uid)  # ignore self
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 107, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://localhost:39002, code 500) Encounted error from JSONRPC function "cache_sync_in_use" with params ('localhost', 'P8', 'J2570')

This happened after the job had been running for ten minutes.

I noticed that the error mentions “cache_sync_in_use”, so I ran a “Cache Particles on SSD” job with the same particles, and it failed in the same way:

Traceback (most recent call last):
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 104, in func
    with make_json_request(self, "/api", data=data, _stacklevel=4) as request:
  File "/home/cryosparc/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/lib/python3.8/contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 225, in make_request
    raise CommandError(error_reason, url=url, code=code, data=resdata)
cryosparc_tools.cryosparc.errors.CommandError: *** (http://localhost:39002/api, code 500) Timeout Error

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "cryosparc_master/cryosparc_compute/run.py", line 95, in cryosparc_master.cryosparc_compute.run.main
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/jobs/utilities/run_cache_particles.py", line 31, in run
    particles.read_blobs(proj_dir_abs, do_cache=True)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 120, in read_blobs
    u_blob_paths = cache_run(u_rel_paths)
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 112, in download_and_return_cache_paths
    rc.cli.cache_sync_in_use(worker_hostname, rc._project_uid, rc._job_uid)  # ignore self
  File "/home/cryosparc/cryosparc/cryosparc_worker/cryosparc_tools/cryosparc/command.py", line 107, in func
    raise CommandError(
cryosparc_tools.cryosparc.errors.CommandError: *** (http://localhost:39002, code 500) Encounted error from JSONRPC function "cache_sync_in_use" with params ('localhost', 'P8', 'J2570')

I would like to know what is going on. Any help is appreciated.

You may want to try the new cache system, which may help avoid this error.
To enable the new caching system, you can add the line

export CRYOSPARC_IMPROVED_SSD_CACHE=true

to the file

/home/cryosparc/cryosparc/cryosparc_worker/config.sh

and to the config.sh file of any other cryosparc_worker/ installation that may be in use on this CryoSPARC instance.
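As a rough sketch of the step above, the line can be appended from a shell so that it is only added once per file. The function name enable_improved_cache is made up for illustration and is not part of CryoSPARC; only the variable name and the config.sh path come from this thread. It also assumes the existing config.sh ends with a newline.

```shell
# Hypothetical helper (not a CryoSPARC command): add the export line to a
# worker config.sh only if that exact line is not already present.
enable_improved_cache() {
    config_file="$1"
    line='export CRYOSPARC_IMPROVED_SSD_CACHE=true'

    if [ ! -f "$config_file" ]; then
        echo "config file not found: $config_file" >&2
        return 1
    fi

    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if grep -qxF "$line" "$config_file"; then
        echo "already enabled in $config_file"
    else
        printf '%s\n' "$line" >> "$config_file"
        echo "enabled in $config_file"
    fi
}

# Example call, using the path from this thread:
# enable_improved_cache /home/cryosparc/cryosparc/cryosparc_worker/config.sh
```

The same call would need to be repeated for the config.sh of every other cryosparc_worker/ installation in use. It may be worth copying config.sh somewhere safe before editing it.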
Does this help?

Thanks for your reply,

Right after I reported this problem, our workstation crashed.

After we rebooted, the problem was gone.

So I am not sure whether this problem was caused by the caching system or was a precursor to the crash.

I will try your advice if I ever encounter this problem again.

Thank you!