AssertionError: SSD cache needs more space

Dear All,

Since updating to version 4.1.1, we have been getting an error: particles cannot be copied to the SSD cache, even though the SSD is empty and has 2 TB free. The error message is below:

Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 93, in cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 63, in cryosparc_compute.jobs.class2D.run.run_class_2D
File "/cryosparc/cryosparc_installation/cryosparc_worker/cryosparc_compute/particles.py", line 114, in read_blobs
u_blob_paths = cache.download_and_return_cache_paths(u_rel_paths)
File "/cryosparc/cryosparc_installation/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 153, in download_and_return_cache_paths
assert need_mb <= total_mb, (
AssertionError: SSD cache needs 239429MB but drive can only be filled up to 61645MB; please disable SSD cache for this job.

I can confirm this is not an issue of straining the master node or the network. The ssdquota on the workers is also set to 1.8 TB (a little less than the full capacity). We have also tried reinstalling from scratch, which unfortunately didn’t resolve the issue. Any help will be much appreciated!

Best wishes,
Daniel


@dmihaylov Please can you post the output of
cryosparcm cli "get_scheduler_targets()"

Hi, I have encountered a similar error recently. The job log and the output of cryosparcm cli "get_scheduler_targets()" are below. Any suggestions?

[CPU: 936.4 MB Avail: 497.75 GB]
Traceback (most recent call last):
File "cryosparc_master/cryosparc_compute/run.py", line 96, in cryosparc_compute.run.main
File "cryosparc_master/cryosparc_compute/jobs/class2D/run.py", line 63, in cryosparc_compute.jobs.class2D.run.run_class_2D
File "/home/yuexin/cryosparc/cryosparc_worker/cryosparc_compute/particles.py", line 114, in read_blobs
u_blob_paths = cache.download_and_return_cache_paths(u_rel_paths)
File "/home/yuexin/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 115, in download_and_return_cache_paths
delete_cache_files(instance_id, worker_hostname, ssd_cache_path, cache_reserve_mb, cache_quota_mb, used_mb, need_mb)
File "/home/yuexin/cryosparc/cryosparc_worker/cryosparc_compute/jobs/cache.py", line 317, in delete_cache_files
assert need_mb <= total_mb, (
AssertionError: SSD cache needs 2702504MB but drive can only be filled up to 1819504MB; please disable SSD cache for this job.

cryosparcm cli "get_scheduler_targets()"
[{'cache_path': '/gpu_temp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25446907904, 'name': 'NVIDIA GeForce RTX 3090'}, {'id': 1, 'mem': 25446776832, 'name': 'NVIDIA GeForce RTX 3090'}, {'id': 2, 'mem': 25446907904, 'name': 'NVIDIA GeForce RTX 3090'}, {'id': 3, 'mem': 25446907904, 'name': 'NVIDIA GeForce RTX 3090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], 'GPU': [0, 1, 2, 3], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63]}, 'ssh_str': 'yuexin@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/home/yuexin/cryosparc/cryosparc_worker/bin/cryosparcw'}]
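For the cache question, only a few fields of that output matter. Below is a minimal sketch for pulling them out. It assumes the CLI prints a Python-style list of dicts with plain ASCII quotes (the forum may have converted them to curly quotes). summarize_cache_settings is a hypothetical helper name, and the sample string is trimmed to four keys taken from the output above.

```python
import ast

# Keys relevant to SSD caching, as they appear in the output above.
CACHE_KEYS = ("hostname", "cache_path", "cache_quota_mb", "cache_reserve_mb")


def summarize_cache_settings(targets_text):
    """Parse the printed list of target dicts and keep only the cache fields."""
    targets = ast.literal_eval(targets_text)
    return [{key: target.get(key) for key in CACHE_KEYS} for target in targets]


# Trimmed sample: only the cache-related keys from the output above.
sample = (
    "[{'cache_path': '/gpu_temp', 'cache_quota_mb': None, "
    "'cache_reserve_mb': 10000, 'hostname': 'localhost'}]"
)

for row in summarize_cache_settings(sample):
    print(row)
```

In this output, cache_quota_mb is None (no quota configured) and cache_reserve_mb is 10000.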

The message indicates that ≈ 2.7 TB of particle stacks would need to be cached for the job, which exceeds the maximum of ≈ 1.8 TB that may become available on your cache device.
What is the actual capacity of your SSD cache device?
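One rough way to check this on the worker is sketched below. It rests on an assumption that is not confirmed from CryoSPARC's code: that the usable limit is the device capacity minus cache_reserve_mb, capped by cache_quota_mb when a quota is set, with MB counted as 2**20 bytes. usable_cache_mb and device_total_mb are hypothetical helper names.

```python
import os
import shutil

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


def device_total_mb(cache_path):
    """Total capacity of the filesystem that holds cache_path, in MB."""
    return shutil.disk_usage(cache_path).total // MB


def usable_cache_mb(total_mb, reserve_mb=0, quota_mb=None):
    """Assumed rule: capacity minus reserve, capped by the quota if one is set."""
    limit = total_mb - reserve_mb
    if quota_mb is not None:
        limit = min(limit, quota_mb)
    return max(limit, 0)


need_mb = 2702504         # "SSD cache needs" value from the error above
cache_path = "/gpu_temp"  # cache_path from the scheduler target above

# Only runs on a machine where the cache path actually exists.
if os.path.isdir(cache_path):
    limit_mb = usable_cache_mb(device_total_mb(cache_path), reserve_mb=10000)
    print(f"need {need_mb} MB, usable about {limit_mb} MB, fits: {need_mb <= limit_mb}")
```

If the printed limit is far from the "can only be filled up to" value in the error, the assumed rule does not match what the worker does, and the reported value should be trusted instead.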
If the required cache capacity of a job exceeds the total capacity of the cache device, options are to

  • re-try the job with caching disabled
  • process non-overlapping subsets of particles in separate jobs, thereby potentially reducing the required cache capacity for each job
  • increase the available cache capacity, for example via a hardware upgrade or reconfiguration
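The subset option can be sized with simple arithmetic. The sketch below assumes the required cache scales roughly with the number of particle stack files in a subset and that the subsets do not share stack files; if they do, splitting may save less than expected. min_subsets and split_into_chunks are hypothetical helper names, and the file names are placeholders.

```python
import math


def min_subsets(need_mb, limit_mb):
    """Smallest number of equal-sized subsets that each fit under limit_mb."""
    if limit_mb <= 0:
        raise ValueError("limit_mb must be positive")
    return max(1, math.ceil(need_mb / limit_mb))


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous, non-overlapping chunks."""
    size = math.ceil(len(items) / n_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


# Values from the second error message in this thread.
n = min_subsets(need_mb=2702504, limit_mb=1819504)

# Placeholder names standing in for the job's particle stack files.
stacks = [f"stack_{i:03d}.mrc" for i in range(5)]
for chunk in split_into_chunks(stacks, n):
    print(chunk)
```

With the numbers from the second error this gives 2 subsets; the first error in the thread (239429 MB needed, 61645 MB limit) would need 4.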

I see, thanks for your reply. I disabled the “copy image to SSD” option and the error no longer appears. I am now going to upgrade to a larger SSD.