We have CryoSPARC installed on a GPU cluster with five 4-way A100 Dell 8640s as the workers. The nodes are connected to an enterprise Isilon SAN over 10 Gbps Ethernet. Jobs are running 1-2x slower than we expect, and we have found no immediate bottlenecks on the cluster, the routers, or the storage environment.
Is there a “best way” to connect NFS mounts to workers other than NFSv3? What about suggested MTU settings on routers between workers and storage?
NFSv3 is usually pretty good and much faster than SMB/samba.
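Nothing in this thread prescribes specific mount options, but for illustration, a typical NFSv3 client mount with larger transfer sizes might look like the hypothetical /etc/fstab entry below (server name, export path, and mount point are placeholders; check which options your Isilon actually supports):

# Hypothetical /etc/fstab entry: NFSv3 with 1 MiB read/write sizes
isilon.example.com:/ifs/cryoem  /mnt/cryoem  nfs  vers=3,rsize=1048576,wsize=1048576,hard,noatime  0  0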
Do you have access to the 10 Gbps switch, or is it all under Dell's management?
In your SAN setup, is it all SSD, or is there a separate scratch setup? If there is SSD/scratch, how fast are particles being written to it? That would give you an idea of the speeds; for example, 200-400 MB/s would be on the slower side of what I would expect.
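If you want a rough number for that, one simple check is a direct-I/O sequential write on the scratch volume (the /scratch path below is a placeholder for your actual scratch mount):

# Write ~4 GiB with direct I/O so the page cache doesn't inflate the result
dd if=/dev/zero of=/scratch/ddtest.bin bs=1M count=4096 oflag=direct status=progress
# Remove the test file afterwards
rm -f /scratch/ddtest.bin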
You can also adjust your /cryosparc_master/config.sh and add the line

export CRYOSPARC_CACHE_NUM_THREADS=6

with a value of 6, 8, 12, 16, or 24 if you have the cores; this can help in some cases.
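A minimal sketch of that change, assuming the config path from the post above and a thread count of 16, followed by a restart so the setting takes effect:

# Append to cryosparc_master/config.sh
export CRYOSPARC_CACHE_NUM_THREADS=16

# Restart CryoSPARC so the new value is picked up
cryosparcm restart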
Thank you for the response. Unfortunately, I only have metric data and graphs from the 10G switch. As for the worker nodes, they have local NVMe scratch. When I installed CryoSPARC, I used the "--nossd" option to make it work with the NVMe drives. I have added the export line to the config with a value of 16 and will see how that works out.
Do you know if there are MTUs on routers/switches that need to be set to a specific value (jumbo frames)?
Thank you again!
@wmatthews Are you referring to the cryosparc_worker/bin/cryosparcw connect --nossd option? This option would usually disable particle caching unless caching is configured in some other way, such as with the CRYOSPARC_SSD_PATH variable.
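For reference, if caching were configured that way instead of via the connect flags, it would be a single line in the worker config (the cache path below is hypothetical):

# In cryosparc_worker/config.sh
export CRYOSPARC_SSD_PATH=/mnt/nvme/cryosparc_cache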
Yes, when I installed the workers, I used the cryosparcw connect --worker <worker_node> --master <master_node> --nossd option because the software wouldn't install otherwise and wouldn't recognize the NVMe drives as 'SSD' on the worker nodes.
Our user has run CryoSPARC in this configuration with no issues, particularly when the data was coming from a local Isilon. Now we're feeding the data from an enterprise NFS share (same network speed) and the jobs run 1-2x slower.
Is there a configuration in the CryoSPARC software I could check? We have our networks team reviewing graphs for network issues but there don’t seem to be any.
Thank you!
If you have available storage that is significantly faster than the project directory storage, you may re-run the cryosparcw connect command on each worker with the following changes:

- add the --update option
- remove the --nossd option
- add the --ssdpath /path/to/cache option, where /path/to/cache is a directory on the faster storage

If /path/to/cache is on shared storage, check which CRYOSPARC_CACHE_LOCK_STRATEGY applies to your case and should be configured in cryosparc_worker/config.sh.
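Putting those changes together, a re-connect on one worker might look like the following (hostnames and the cache path are placeholders; --nossd is simply omitted):

cryosparc_worker/bin/cryosparcw connect \
    --worker <worker_node> \
    --master <master_node> \
    --update \
    --ssdpath /path/to/cache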