Unsuccessful clearing of SSD cache

Scrounger · July 5, 2024, 3:02pm

Hi,

I have been experiencing issues clearing the SSD cache, and keep getting the following error:

RuntimeError: SSD cache needs additional x TiB but drive can only be filled to y TiB.

I have tried all suggested fixes in the troubleshooting guide (Troubleshooting | CryoSPARC Guide), but none of them seem to do the trick in this case.

I have had this problem in previous versions as well, and have recently updated to v4.5.3.

Would anyone be able to point me in the right direction?

wtempel · July 5, 2024, 3:53pm

@Scrounger Please can you post the output of the command

cryosparcm cli "get_scheduler_targets()"

and the hostname of the worker on which the job is failing?

Scrounger · July 10, 2024, 4:04pm

@wtempel I get the following output:

[{'cache_path': '/tmp', 'cache_quota_mb': None, 'cache_reserve_mb': 10000, 'desc': None, 'gpus': [{'id': 0, 'mem': 25266028544, 'name': 'NVIDIA GeForce RTX 4090'}, {'id': 1, 'mem': 25289621504, 'name': 'NVIDIA GeForce RTX 4090'}], 'hostname': 'localhost', 'lane': 'default', 'monitor_port': None, 'name': 'localhost', 'resource_fixed': {'SSD': True}, 'resource_slots': {'CPU': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], 'GPU': [0, 1], 'RAM': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]}, 'ssh_str': 'cryosparc@localhost', 'title': 'Worker node localhost', 'type': 'node', 'worker_bin_path': '/opt/cryosparc/worker/bin/cryosparcw'}]

The hostname ought to be “localhost”.

wtempel · July 10, 2024, 5:36pm

What were x and y?
What is the output of the command
df -h /tmp
?

Scrounger · July 10, 2024, 5:55pm

Not currently at my workstation, but I remember the following:

x = 2.2TiB, y = 1.8TiB

Running du -h on instance_localhost:61001 in /tmp gave 0, since I had previously deleted that directory.

Is there any other directory I should look out for?

wtempel · July 10, 2024, 5:58pm

Thanks for posting x and y. How about the output of

df -h /tmp

It is likely that the filesystem size and its use for purposes other than CryoSPARC particle caching prevent caching of the particle set.

Scrounger · July 10, 2024, 7:40pm

Managed to get back to my workstation!

$ du -h /tmp 
> 7.8M /tmp

Overall there seems to be 1.4TiB available on the drive.

wtempel · July 10, 2024, 7:49pm

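There seems to be some confusion between the du and df commands: du reports the space used by files under a directory, whereas df reports the size and free space of the whole filesystem. It would be interesting to see the output of the latter.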

Scrounger · July 10, 2024, 8:01pm

Sorry, my mistake.

$ df -h /tmp

Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/almalinux_cmm1016-root  1.9T  516G  1.4T  28% /

wtempel · July 10, 2024, 8:21pm

Maybe the filesystem is just not large enough to cache all the particles needed for the job? At 1.9T in total, it is smaller than the roughly 2.2 TiB you recalled from the error message.
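
To make the comparison concrete, here is a rough Python sketch of the arithmetic, using the numbers quoted in this thread. It is not CryoSPARC's actual cache logic: the function name fits_in_cache is made up for illustration, and treating the reserve as 10^6-byte megabytes subtracted from the free space is an assumption.

```python
TIB = 1024 ** 4  # df -h reports sizes in powers of 1024
MB = 10 ** 6     # assumption: cache_reserve_mb counts units of 10^6 bytes


def fits_in_cache(needed_bytes, free_bytes, reserve_mb=10000):
    """Hypothetical check: does the particle set fit in free space minus the reserve?"""
    usable = free_bytes - reserve_mb * MB
    return needed_bytes <= usable


needed = 2.2 * TIB  # "x" as recalled from the error message
free = 1.4 * TIB    # "Avail" column from df -h /tmp
total = 1.9 * TIB   # "Size" column from df -h /tmp

print("fits in free space:", fits_in_cache(needed, free))
print("fits on an empty filesystem:", needed <= total)
```

Both checks fail with these inputs, so freeing up space on the drive would not be enough by itself; the amount of data to be cached has to shrink.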

Next time you observe this error, you may want to take note of the actual numbers, as well as the output of the commands

df -h /tmp
du -sh /tmp/instance_localhost\:61001/
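
If you would rather capture both numbers from a script, something like the following Python sketch should give comparable information. The helper names are invented for illustration, and the cache directory is simply the one mentioned in this thread. It sums apparent file sizes, so the result can differ somewhat from du, which counts allocated disk blocks.

```python
import os
import shutil

GIB = 1024 ** 3


def directory_size(path):
    """Sum the apparent sizes of all files below path (0 if path does not exist)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                pass
    return total


def report(mount="/tmp", cache_dir="/tmp/instance_localhost:61001"):
    """Print filesystem totals (like df) and cache directory usage (like du -s)."""
    usage = shutil.disk_usage(mount)
    print(f"{mount}: size {usage.total / GIB:.1f} GiB, "
          f"used {usage.used / GIB:.1f} GiB, free {usage.free / GIB:.1f} GiB")
    print(f"{cache_dir}: {directory_size(cache_dir) / GIB:.1f} GiB")


if __name__ == "__main__":
    report()
```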

Two tips that may help reduce the cache capacity needed:

Saving results in 16-bit float format (details)
In case particle stacks still hold many deselected particles: Restack Particles (where you can also save results in 16-bit float format)
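
As a back-of-the-envelope illustration of why both tips help, here is a small Python sketch that estimates the raw size of a particle stack. The particle counts and box size are hypothetical, file headers and metadata are ignored, and the only hard facts used are that a 32-bit float takes 4 bytes per pixel and a 16-bit float takes 2.

```python
TIB = 1024 ** 4


def stack_size_bytes(n_particles, box_px, bytes_per_pixel):
    """Approximate raw image data size: one square box per particle, headers ignored."""
    return n_particles * box_px * box_px * bytes_per_pixel


# Hypothetical dataset, for illustration only.
n_total = 1_000_000  # particles stored in the original stacks
n_kept = 400_000     # particles still selected after classification
box = 512            # box size in pixels

original = stack_size_bytes(n_total, box, 4)   # 32-bit float, nothing removed
restacked = stack_size_bytes(n_kept, box, 4)   # deselected particles dropped
restacked_f16 = stack_size_bytes(n_kept, box, 2)  # restacked and saved as 16-bit float

for label, size in [("original", original),
                    ("restacked", restacked),
                    ("restacked, 16-bit", restacked_f16)]:
    print(f"{label}: {size / TIB:.2f} TiB")
```

Switching to 16-bit floats halves the data size at a given particle count, while the saving from restacking scales with the fraction of particles that were deselected.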

Scrounger · July 11, 2024, 8:34am

Restacking worked like a charm, thanks!