Installation of v4.4 with existing v3.3 on HPC

bfisher · November 13, 2023, 1:19pm

Hi all,

Very excited about being able to use all the new features of v4.4, thank you to everyone involved in developing and testing it!

We currently have a working installation of v3.3 (I know, long overdue for an upgrade…) that we would like to keep until we have tested v4.4 on our HPC, and then transition to v4.4 gradually. Essentially, we are hoping to do a fresh install of v4.4 with a new master node on a new disk. The worker nodes on our SLURM HPC would be shared between v3.3 and v4.4, but never used to run jobs from both versions at the same time. I have a few questions about this and would be grateful for any pointers:

Can we (in academia) get a separate license for a separate instance of v4.4 in addition to the one we have for running v3.3?
Will it be possible to update the SLURM nodes to the latest NVIDIA drivers and still run v3.3 in addition to v4.4, provided jobs from the two versions are never allocated to a node simultaneously?
Do you foresee incompatibility issues with CUDA libraries, e.g. for TensorFlow, if we do this?

All the best and many thanks,

Billy

wtempel · November 13, 2023, 9:01pm

You may request an additional license id.

I am not sure. You may want to test whether jobs of your v3.3 instance can run on a worker that has nvidia-driver version ≥ 520. Is my suspicion correct that your v3.3 cryosparc_worker/ software is linked to a toolkit version < 11.8? You can check with:

cryosparcw call nvcc -V

During a test just now, I failed to run a job on CryoSPARC v3.3.1 when cryosparc_worker/ was linked to CUDA 11.8.
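A minimal sketch of how that toolkit check could be scripted, assuming GNU sort (for the -V option) is available and that cryosparcw from the v3.3 worker installation is on the PATH. The helper names toolkit_release and version_lt are hypothetical and not part of CryoSPARC; the sketch only parses the "release X.Y" line that nvcc -V normally prints.

```shell
#!/bin/sh
# Hypothetical helpers; none of these are CryoSPARC commands.

# Read `nvcc -V` output on stdin and print the toolkit release, e.g. 10.1.
toolkit_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed when version $1 is strictly lower than version $2.
# ASSUMPTION: GNU `sort -V` (version sort) is available.
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only query the worker installation if cryosparcw is actually on the PATH.
if command -v cryosparcw >/dev/null 2>&1; then
    release=$(cryosparcw call nvcc -V | toolkit_release)
    if [ -z "$release" ]; then
        echo "could not determine the toolkit release"
    elif version_lt "$release" 11.8; then
        echo "toolkit $release is older than 11.8"
    else
        echo "toolkit $release is 11.8 or newer"
    fi
fi
```

The comparison against 11.8 mirrors the threshold mentioned in this thread; it says nothing by itself about whether a given toolkit works with a given driver.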

This is closely related to your previous question and hinges on whether the CUDA toolkit currently linked to the v3.3 cryosparc_worker/ installation is compatible with the nvidia driver version ≥ 520. If it is, your plan of running CryoSPARC v4.4 alongside v3.3 (subject to non-overlapping port ranges if installed on the same master host) might work, as v4.4 should run independently from the CUDA toolkit that is used by the v3.3 cryosparc_worker/ installation.
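The two conditions mentioned here (a driver version ≥ 520 on the worker, and non-overlapping port ranges if both masters share a host) could be checked along these lines. This is a sketch under assumptions: the helper names driver_ok and ranges_overlap are hypothetical, and the number of ports each instance reserves is passed in as a parameter because it should be verified against the installation guide for each version.

```shell
#!/bin/sh
# Hypothetical helpers; none of these are CryoSPARC commands.

# Succeed when the major part of a driver version string (e.g. 535.104.05)
# is at least 520, the threshold discussed in this thread.
driver_ok() {
    [ "${1%%.*}" -ge 520 ]
}

# Succeed when two port blocks [base, base + width) share at least one port.
# $1 and $2 are the base ports, $3 is the number of ports per instance.
# ASSUMPTION: check the width that each CryoSPARC version actually reserves.
ranges_overlap() {
    [ "$1" -lt $(($2 + $3)) ] && [ "$2" -lt $(($1 + $3)) ]
}

# Report the driver version of each GPU on this node, if nvidia-smi exists.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version --format=csv,noheader |
    while read -r version; do
        if driver_ok "$version"; then
            echo "driver $version: >= 520"
        else
            echo "driver $version: older than 520"
        fi
    done
fi
```

For example, with an assumed width of 10 ports, ranges_overlap 39000 40000 10 fails (no shared ports), whereas ranges_overlap 39000 39005 10 succeeds and would indicate a clash. The base ports here are example values only.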

bfisher · November 14, 2023, 1:13pm

Hi,

Thanks for letting me know about the license.

Yes, the current cryosparc_worker for v3.3 is linked to CUDA 10, I think. Thanks for running the test on CUDA 11.8; it’s good to know that I will need separate CUDA libraries to do this.

I think the simpler plan for now might be to set up a new node as the cryosparc_master for v4.4, then dedicate a couple of our Pascals to v4.4 cryosparc_worker nodes for the time being (with updated NVIDIA drivers) and, if all works well, move the remaining Pascals over to v4.4 worker nodes in time.

Does this sound like a reasonable solution to avoid some of these issues?

wtempel · November 14, 2023, 2:38pm

I should mention that I ran this test on Ubuntu-22.04, which is a solid platform for running current versions of CryoSPARC, but the operating system may itself have played a role in the failure with the older CryoSPARC version. Other users on the forum ran into issues installing older CryoSPARC versions on Ubuntu-22.04; see "Worker connect does not work during installation?" (post #12 by ZhijieLi).

Sounds reasonable to me.

bfisher · November 17, 2023, 11:38am

Thanks Wolfram, I’ll give this a go shortly and let you know the outcome!