Unsuccessful clearing of SSD cache


I have been experiencing issues clearing the SSD cache, and keep getting the following error:

RuntimeError: SSD cache needs additional x TiB but drive can only be filled to y TiB.

I have tried all suggested fixes in the troubleshooting guide (Troubleshooting | CryoSPARC Guide), but none of them seem to do the trick in this case.

I have had this problem in previous versions as well, and have recently updated to v4.5.3.

Would anyone be able to point me in the right direction?

@Scrounger Please can you post the output of the command

cryosparcm cli "get_scheduler_targets()"

and the hostname of the worker on which the job is failing?

@wtempel I get the following output:

[{'cache_path': '/tmp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25266028544, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25289621504, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/worker/bin/cryosparcw'}]

The hostname ought to be "localhost".

What were x and y?
What is the output of the command
df -h /tmp

Not currently at my workstation, but I remember the following:

x = 2.2TiB, y = 1.8TiB

Running du -h on instance_localhost:61001 in /tmp reported 0, since I had previously deleted that directory.

Is there any other directory I should look out for?

Thanks for posting x and y. How about the output of

df -h /tmp

It is likely that the filesystem size and its use for purposes other than CryoSPARC particle caching prevent caching of the particle set.

Managed to get back to my workstation!

$ du -h /tmp 
> 7.8M /tmp

Overall there seems to be 1.4TiB available on the drive.

There seems to be some confusion between the du and df commands. It would be interesting to see the output of the latter.

Sorry, my mistake.

$ df -h /tmp

Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/almalinux_cmm1016-root  1.9T  516G  1.4T  28% /

Maybe the filesystem is just not large enough to cache all the particles needed for the job?

Next time you observe this error, you may want to take note of the actual numbers, as well as the output of the commands

df -h /tmp
du -sh /tmp/instance_localhost\:61001/
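For anyone who wants to check this before queuing a job, here is a minimal sketch that compares the free space reported by df with the amount the error says is needed. The helper names avail_kib and cache_fits are made up for illustration and are not part of CryoSPARC; the sketch also only looks at free space, so it does not reproduce CryoSPARC's own calculation (which accounts for things like cache_reserve_mb and files already cached).

```shell
#!/bin/sh
# Sketch only: avail_kib and cache_fits are hypothetical helper names,
# not CryoSPARC commands.

# Print the space available (in KiB) on the filesystem that holds path $1.
# -P forces the one-line-per-filesystem POSIX format, -k reports KiB.
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the available space ($1) is at least the needed space ($2).
# Both arguments are integers in KiB.
cache_fits() {
    [ "$1" -ge "$2" ]
}

# Example: 2.2 TiB expressed in KiB (the "x" reported in this thread).
needed_kib=$((22 * 1024 * 1024 * 1024 / 10))
have_kib=$(avail_kib /tmp)

if cache_fits "$have_kib" "$needed_kib"; then
    echo "enough free space on /tmp"
else
    echo "not enough free space on /tmp: have ${have_kib} KiB, need ${needed_kib} KiB"
fi
```

With the df output posted above (1.4T available against 2.2 TiB needed), the second branch would be taken.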

Two tips that may help reduce the cache capacity needed:

  • saving results in 16-bit float format (details)
  • in case particle stacks still hold many deselected particles: Restack Particles (where you can also save results in 16-bit float format)
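
To get a rough feel for how much the two tips above could save, here is a back-of-the-envelope sketch. It assumes the cached size scales linearly with the number of particles kept and with the bits per pixel, and it ignores file headers. estimate_gib is a hypothetical helper, and the 32-bit source format is an assumption about the data, not something stated in this thread.

```shell
#!/bin/sh
# Sketch only: estimate_gib is a hypothetical helper, not a CryoSPARC command.
# Usage: estimate_gib SIZE_GIB KEPT_PERCENT SRC_BITS DST_BITS
# Prints the estimated stack size in GiB (integer, rounded down) after
# keeping KEPT_PERCENT of the particles and converting from SRC_BITS
# to DST_BITS per pixel.
estimate_gib() {
    echo $(( $1 * $2 * $4 / (100 * $3) ))
}

# 2.2 TiB is roughly 2253 GiB. Assuming (not confirmed here) that the
# stacks are stored as 32-bit floats, converting to 16-bit floats while
# keeping every particle would roughly halve the size.
estimate_gib 2253 100 32 16
```

Lowering the kept percentage shows the additional effect of restacking when many particles have been deselected.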

Restacking worked like a charm, thanks!