Allow multiple jobs to copy to SSD cache

daniel.s.d.larsson · January 9, 2024, 2:45pm

I do processing on a supercomputer where each node has a volatile cache (data is deleted after each job has finished), so each job has to synchronise the particle stack to the local SSD every time. I have noticed that when I run multiple jobs in parallel (e.g. homogeneous refinement on all 3D classes), the first job to start locks the particle stack, and all the other jobs have to wait. The parallel file system and network on the computer are very good, so there is no performance reason not to let all jobs synchronise at the same time. Having the jobs idle while waiting for the stack to be unlocked hurts me in two ways: it costs computational time (time is limited on the system) and it decreases the average performance (jobs that do not use resources efficiently are killed).

Could you please (perhaps as a configurable option) allow multiple jobs to synchronise at the same time? And/or could you add an option for dependencies, so that jobs are not submitted until the previous job is done synchronising?

rbs_sci · January 10, 2024, 12:42am

What impact does disabling caching have? If the network/file system are that good, you might not need to cache at all (or, rather, the impact from not caching will be less harmful in your scenario than the one-by-one lock for each run…)

leetleyang · January 10, 2024, 1:13am

Not a solution to your problem per se, but maybe a workaround.

If said nodes are equipped with multiple GPUs, could you request the necessary resources under a single session such that as many GPUs as possible share a common volatile cache? Registering said node(s) as normal workers in cryoSPARC will instantly reduce the number of times data has to be cached. You’ll need to terminate the parental session manually though.

I’ve done this via SLURM on a HPC cluster, using a combination of ssh hostname aliases to manage the redirection and a simple script that spawns cryosparcw connect --update to refresh the worker configuration in the database, and it works relatively painlessly.

Cheers,
Yang

wtempel · January 12, 2024, 5:31pm

If you have not already, you may want to try a new cache implementation available in CryoSPARC v4.4+. It can be enabled by defining, inside cryosparc_worker/config.sh,

export CRYOSPARC_IMPROVED_SSD_CACHE=true

This cache implementation permits certain cache transfers to occur in parallel.
An additional performance improvement may be achieved by defining, also in cryosparc_worker/config.sh,
a larger value for the CRYOSPARC_CACHE_NUM_THREADS variable, like

export CRYOSPARC_CACHE_NUM_THREADS=4

(documentation).
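
For anyone who prefers to script this change, here is a minimal sketch. It assumes GNU sed and uses a temporary file as a stand-in for cryosparc_worker/config.sh. The helper name set_worker_option is made up for illustration and is not part of CryoSPARC. It makes sure each export line appears exactly once, so running it again is harmless.

```shell
#!/bin/bash
# set_worker_option FILE NAME VALUE  (hypothetical helper, not a CryoSPARC tool)
# Ensures FILE contains a line "export NAME=VALUE", replacing an existing
# "export NAME=..." line if there is one, or appending a new line otherwise.
set_worker_option() {
    local file="$1" name="$2" value="$3"
    touch "$file"
    if grep -q "^export ${name}=" "$file"; then
        # The variable is already defined: rewrite its value in place (GNU sed).
        sed -i "s|^export ${name}=.*|export ${name}=${value}|" "$file"
    else
        # The variable is not defined yet: append it.
        printf 'export %s=%s\n' "$name" "$value" >> "$file"
    fi
}

# Assumption: a temporary file stands in for cryosparc_worker/config.sh here.
# Point CONFIG at the real file only after testing on a copy.
CONFIG="$(mktemp)"

set_worker_option "$CONFIG" CRYOSPARC_IMPROVED_SSD_CACHE true
set_worker_option "$CONFIG" CRYOSPARC_CACHE_NUM_THREADS 4

cat "$CONFIG"
```

Calling the helper a second time with a different value changes the existing line rather than adding a duplicate.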

daniel.s.d.larsson · March 14, 2024, 9:58am

Yang: That sounds interesting. Could you share some snippets or scripts for how to achieve this? Could you perhaps even allocate multiple jobs to the same GPU (if memory allows)?

daniel.s.d.larsson · March 14, 2024, 10:00am

wtempel: Thank you for pointing out the CRYOSPARC_IMPROVED_SSD_CACHE flag. I did not know about it. What does it do? The support page is very vague. Why is it not enabled by default? Could it cause issues?

leetleyang · March 15, 2024, 4:57pm

Hi Daniel,

If you’re interested, the following assumes interaction through SLURM and, arbitrarily, a floating profile that loads on the cryosparc master node, login node and cluster node, but only for the scripting.

Requesting an interactive node from the scheduler.

<login_node> $ srun --nodes=1 --gres=gpu:4 --job-name=CS_interactive --pty bash
<cluster_node_a> $ hostname
cluster_node1.super_computer

Editing .ssh/config on the cryosparc_user account.

Host slurm_interactive_worker
HostName cluster_node1.super_computer
StrictHostKeyChecking no

Registering cluster_node_a as a worker.

<cluster_node_a> $ <path_to_cryosparc>/cryosparc_worker/bin/cryosparcw connect \
--worker slurm_interactive_worker \
--master <cryosparc_master_node> \
--port <port> \
--gpus 0,1,2,3 \
--ssdpath ${SLURM_SCRATCH_DIR} \
--newlane cluster_worker

That’ll get you started in the first instance. The next time, you may choose to request different resources:

Requesting a different interactive setup.

<login_node> $ srun --nodes=1 --gres=gpu:3 --job-name CS_Interactive --pty bash
<cluster_node_b> $ hostname
cluster_node2.super_computer

It’ll just be a matter of editing the hostname alias in the cryosparc_user account’s .ssh/config to point to cluster_node2.super_computer and updating the worker configuration.

Updating worker configuration.

<cluster_node_b> $ <path_to_cryosparc>/cryosparc_worker/bin/cryosparcw connect \
--worker slurm_interactive_worker \
--master <cryosparc_master_node> \
--port <port> \
--gpus 0,1,2 \
--ssdpath ${SLURM_SCRATCH_DIR} \
--update

…a process you could, in theory, script for the sake of convenience.

Example script:

#!/bin/bash
## VARIABLES
SSH_CONFIG="<path_to_cryosparc_user>/.ssh/config"
CURR_NODE=`grep -A1 'Host slurm_interactive_worker' ${SSH_CONFIG} | awk '/HostName/ {print $NF}'`
THIS_NODE=`hostname`
NUM_GPUS=`nvidia-smi -L | awk '{printf "%d\n",$2}' | paste -s -d","`
## USER PROMPT
echo "LANE CLUSTER_WORKER"
echo "Current host  : ${CURR_NODE}"
echo "Host detected : ${THIS_NODE}"
echo "GPUs detected : ${NUM_GPUS}"
echo "SSD detected  : ${SLURM_SCRATCH_DIR}"
echo
echo "Update SSH and worker config?"

select yn in "Yes" "No"; do
	case $yn in 
		Yes ) 
			#UPDATE HOSTNAME ALIAS
			sed -i -r "/Host slurm_interactive_worker/{n;s,(HostName).*,\\1 $THIS_NODE,}" ${SSH_CONFIG}
			echo "Updated [slurm_interactive_worker] stanza:"
			grep -A1 'Host slurm_interactive_worker' ${SSH_CONFIG}
			#UPDATE GPU COUNT
			echo "Updating NODE configuration..."
			<path_to_cryosparc>/cryosparc_worker/bin/cryosparcw connect --worker slurm_interactive_worker --master <cryosparc_master_node> --port <port> --ssdpath ${SLURM_SCRATCH_DIR} --gpus ${NUM_GPUS} --update
			break
			;;
		No ) 
			exit 
			;;
		
	esac
done

…to run in subsequent interactive shells. Will require adapting for your specific setup.
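
Before pointing the script at a real .ssh/config, the two text-processing steps it relies on can be tried in isolation. The sketch below assumes GNU sed and uses a throwaway config file plus a hard-coded sample of `nvidia-smi -L` output, so it needs neither a GPU nor a cluster. The function names update_alias_hostname and gpu_index_list, the extra some_other_host stanza and the GPU model names are invented for the demonstration.

```shell
#!/bin/bash
# update_alias_hostname FILE ALIAS NEW_HOST  (hypothetical helper)
# Rewrites the HostName line that directly follows "Host ALIAS" in FILE.
# Like the script above, it assumes HostName is the first line of the stanza.
update_alias_hostname() {
    local file="$1" alias="$2" new_host="$3"
    sed -i -r "/^Host ${alias}\$/{n;s,(HostName).*,\\1 ${new_host},}" "$file"
}

# gpu_index_list  (hypothetical helper)
# Reads "nvidia-smi -L"-style lines on stdin and prints the GPU indices
# as a comma-separated list, suitable for a --gpus argument.
gpu_index_list() {
    awk '{printf "%d\n", $2}' | paste -s -d, -
}

# Demonstration on a temporary file instead of the real .ssh/config.
DEMO_CONFIG="$(mktemp)"
cat > "$DEMO_CONFIG" <<'EOF'
Host some_other_host
HostName unrelated.example
Host slurm_interactive_worker
HostName cluster_node1.super_computer
StrictHostKeyChecking no
EOF

update_alias_hostname "$DEMO_CONFIG" slurm_interactive_worker cluster_node2.super_computer
grep -A1 '^Host slurm_interactive_worker$' "$DEMO_CONFIG"

# Made-up sample of what "nvidia-smi -L" prints on a three-GPU node.
SAMPLE_GPUS='GPU 0: Example GPU Model (UUID: GPU-aaaa)
GPU 1: Example GPU Model (UUID: GPU-bbbb)
GPU 2: Example GPU Model (UUID: GPU-cccc)'

DEMO_GPU_LIST="$(printf '%s\n' "$SAMPLE_GPUS" | gpu_index_list)"
echo "GPUs detected : ${DEMO_GPU_LIST}"
```

Anchoring the sed address with ^ and $ keeps it from also matching an alias whose name merely starts with the same text.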

Cheers,
Yang

wtempel · March 15, 2024, 5:40pm

The flag causes CryoSPARC to use new code for particle caching. This code was written taking into account feedback from our users regarding the older cache system. Now we would like to hear from users how the new cache system performs under various computing setups and processing scenarios.
We are only aware of one issue so far: an occasional error when the cache is hosted on a distributed file system, such as BeeGFS, that has file locking disabled. We are planning to address this in a future release. If your compute nodes have dedicated NVMe SSD cache, you will not encounter this issue.