Slurm cluster cache_files hostname

Hi

We are running cryoSPARC on a Slurm cluster. We added a single lane that points to a cluster partition which contains 3 nodes at the moment. Each node is using its own local SSD cache.

While investigating some file caching issues, I came across an interesting (and potentially problematic) fact when I was looking at the database table containing the entries for cached files (cache_files). Each entry contains a “hostname” key; however, this hostname is the same for all entries (the name of the lane), irrespective of the compute node the files were cached on. I am now wondering if cryoSPARC will actually be able to delete unused cache files from node01 if they are currently being used by a job on node02?

Is there a way to pass the actual host name when starting the job, so that the cache files get unique hostnames assigned instead of just the name of the lane?

Alternatively, I would need to define a separate lane for each node, correct?

Best and thanks,
Chris

Hi,

We also have a SLURM installation with multiple nodes and an SSD on each node (same mount point on all nodes). We have no issues with caching, apart from some minor caveats listed below.

When caching, cryoSPARC creates a folder (instance_hostname:port) on the SSD and keeps caching files until it reaches the specified quota (defined in cluster_info.json).
Once the quota is reached and new files need to be cached, cryoSPARC simply deletes old, currently unused cache files until enough space is freed for the new caching operation.
Due to the unique IDs attached to every cryoSPARC file, there is no risk of collision between users, even if multiple users are computing on the same node.
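Roughly, the eviction works like the sketch below, as far as I understand it. This is only a simplified Python illustration of the quota-based idea, not cryoSPARC's actual code; the cache path, quota and "in use" set are placeholders.

import os

def free_space_for(cache_root, needed_mb, quota_mb, in_use):
    """Delete the least-recently-accessed cached files that are not in use
    until needed_mb fits under quota_mb. Simplified illustration only."""
    files = []
    for dirpath, _, names in os.walk(cache_root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            files.append((st.st_atime, st.st_size / (1024 * 1024), path))
    used_mb = sum(size for _, size, _ in files)
    # Evict oldest first, skipping files a running job still needs
    for _, size, path in sorted(files):
        if used_mb + needed_mb <= quota_mb:
            break
        if path in in_use:
            continue
        os.remove(path)
        used_mb -= size
    return used_mb + needed_mb <= quota_mb

# Example (hypothetical values):
# free_space_for("/processing/instance_host:39001", needed_mb=50000,
#                quota_mb=1000000, in_use=set())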

The small caveats:
1 - cryoSPARC will not delete its cache after a job is done (which speeds up subsequent jobs enormously!), but unless it is given a quota, it might fill the SSD. This can be an issue if other software needs to run on the same cluster (e.g., RELION). We set a quota of 1 TB in cluster_info.json, which is fine for most situations and leaves enough of the SSD for other uses. Your cluster admin might have things to say about this setting…

2 - If the files to be cached are bigger than the available quota on the assigned node, cryoSPARC will “hang” and wait for a chance to delete cached data that might still be in use by other jobs/users;
of course, if the total size of the data to be transferred is bigger than the total quota in cluster_info.json, you won’t be able to run the job at all… so pay attention (see the sketch after this list for a quick pre-flight check).

3 - If multiple jobs using the same particle stack are sent to different nodes, cryoSPARC will start the transfer to node1 and lock the cache until that transfer is finished; only then will cryoSPARC cache to node2. It is worth paying attention to this sometimes to avoid (infrequent and circumstantial) delays.
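For the pre-flight check mentioned in caveat 2, something like the following works. This is a minimal sketch that assumes the quota lives in a cache_quota_mb field of cluster_info.json (check the field name in your own file; versions may differ), and the data paths are placeholders.

import json, os

def job_fits_in_cache(cluster_info_path, data_paths):
    """Warn if the data to be cached exceeds the configured cache quota."""
    with open(cluster_info_path) as f:
        info = json.load(f)
    quota_mb = info.get("cache_quota_mb")  # assumed field name; verify in your file
    if quota_mb is None:
        print("No cache quota configured; the SSD may fill up.")
        return True
    total_mb = sum(os.path.getsize(p) for p in data_paths) / (1024 * 1024)
    if total_mb > quota_mb:
        print(f"Data ({total_mb:.0f} MB) exceeds cache quota ({quota_mb} MB).")
        return False
    return True

# Example (hypothetical paths):
# job_fits_in_cache("cluster_info.json", ["/data/P7/J42/particles.mrcs"])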

Hi Andrea

Thanks for the reply. It’s good to hear that you are running such a setup without issues. Are you using a single lane that points to a partition with several nodes? How long is your cache file lifetime?

What unique ID are you referring to? When I check what is saved as “key” for each item in the cache_files collection, the files do not get a unique ID. Additionally, the “instance_hostname:port” part of the cache path refers to the hostname of the instance that users start the jobs from (the frontend). In our case, all users use the same frontend, so this is the same for all cached files on all nodes.

An example database entry for a cached file looks like:

{
    "_id" : ObjectId("644a84b5693c1ac26d08e2df"),
    "hostname" : "slurm_cluster",
    "key" : "instance_s1160-slurmb.*****:39001/imports/media/snfs2/P7/Shares/path/to/particles/stack.mrcs",
    "status" : "miss",
    "in_use_by" : [],
    "last_requested" : ISODate("2023-04-28T13:43:50.202Z"),
    "size_mb" : 11.3759765625
}

Here, hostname is the name of the lane and key refers to the path on the cache SSD. The path is the same as on the node’s file system except for the instance_s1160-slurmb.*****:39001/imports prefix. The _id is just an internal database ID.
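For reference, something like the following pymongo snippet lists how many cache_files entries exist per recorded hostname. The connection details are assumptions; adjust the host, port, database name and any authentication to match your own instance.

from pymongo import MongoClient

# Host, port and database name are assumptions for illustration;
# our instance exposes MongoDB on port 39001.
client = MongoClient("localhost", 39001)
db = client["meteor"]  # database name may differ on your install

# Count cache_files entries per recorded hostname
pipeline = [{"$group": {"_id": "$hostname", "n": {"$sum": 1}}}]
for row in db.cache_files.aggregate(pipeline):
    print(row["_id"], row["n"])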

I still don’t see how cryoSPARC would be able to distinguish cached files on node01 and node02 in our setup. What am I missing? :thinking:

Best,
Chris

We have a single lane pointing to 8 nodes, with each node having the SSD mounted at /processing.

I do not know the details behind the scenes, but cryosparc makes its own folder there named something like

/processing/instance_hodgkin:39001/projects/P234/J194
or
/processing/instance_hodgkin:39001/imports/[...]/P234/J115/extract

(our cryosparc runs on hodgkin:39000)

and transfers whatever file it needs, e.g.

Transferring J194/extract/015888215158151575931_01278_patch_aligned_doseweighted_particles.mrc (3 MB) (380/433)

The file name carries its own unique ID (015888215158151575931). Also, each project and job gets its own subfolder, so project and job ownership is clear.

Cache lifetime seems to be “until space is needed and deletion is triggered by cryoSPARC” or “until someone runs rm -rf on the folder”.

I do not know precisely how cryoSPARC keeps track of which files have been transferred to which node, but one can simply check whether a file for a given entry is present anywhere in the expected folder (“key”) and has the appropriate “size_mb”.
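As a rough sketch of that check (the field names follow the entry Chris posted; the cache root and entry here are hypothetical, and this is just an illustration, not how cryoSPARC itself verifies its cache):

import os

def cache_entry_present(cache_root, entry):
    """Check whether a cache_files entry exists on this node's SSD
    with the expected size (based on the fields shown above)."""
    path = os.path.join(cache_root, entry["key"])
    if not os.path.isfile(path):
        return False
    size_mb = os.path.getsize(path) / (1024 * 1024)
    return abs(size_mb - entry["size_mb"]) < 0.01

# Example (hypothetical entry, shaped like the one posted earlier):
# entry = {"key": "instance_host:39001/imports/path/to/stack.mrcs", "size_mb": 11.38}
# cache_entry_present("/processing", entry)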