CLUSTER
Hello everyone! I am new to CryoSPARC as a Linux systems administrator (primarily on an HPC cluster). I have installed and configured the cryosparc_master and cryosparc_worker packages in my development environment, and everything seems to be working correctly. However, I want to make some configuration changes and check that I am doing this correctly (specifically for HPC).
- Where should master be running?
- Should my “master” process be running on a head node for the cluster?
- Or a standalone server that only does the “master” process?
- Multi-user HPC environment?
- I installed and configured as cryosparcuser. However, when CryoSPARC launches jobs, we would like each job to be submitted under the user who submitted it (given that the web UI username == server username).
- This is so we can track usage and have our scheduler (SLURM) apply fairshare, QOS, and user tracking.
- I have attempted this by updating the cluster_info.json file to contain: su <username> -c 'sbatch {{ script_path_abs }}'
- This does not seem to play nicely (probably due to permissions)
- What is the recommended practice here?
- How do we open CryoSPARC at the user’s working directory?
- Should users be launching cryosparcm themselves (each with their own LICENSE_ID)?
- Or should the local cryosparcuser have the master process running at all times?
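To make the su-based attempt from the multi-user question above concrete, here is a minimal sketch of the kind of lane definition involved, written as a small Python script that prints a cluster_info.json. The worker path is taken from the environment output in this post; the lane name, cache path, and the helper name build_cluster_info are hypothetical placeholders, and the remaining SLURM templates are assumed to follow the usual squeue/scancel/sinfo pattern. The <username> placeholder is kept as-is, since how it should be filled in is exactly the open question.

```python
import json

# Placeholder kept from the post; how it gets filled in is the open question.
USERNAME_PLACEHOLDER = "<username>"


def build_cluster_info(username: str = USERNAME_PLACEHOLDER) -> dict:
    """Build a cluster_info.json-style dict whose submit command wraps sbatch in su."""
    # The submit command exactly as described in the post.
    submit_cmd = "su " + username + " -c 'sbatch {{ script_path_abs }}'"
    return {
        "name": "slurm-dev",  # hypothetical lane name
        # Worker path taken from the environment output in this post.
        "worker_bin_path": "/applications/cryosparc/cryosparc_worker/bin/cryosparcw",
        "cache_path": "/scratch/cryosparc_cache",  # hypothetical cache location
        "send_cmd_tpl": "{{ command }}",
        "qsub_cmd_tpl": submit_cmd,
        # Assumed standard SLURM status/cancel/info commands.
        "qstat_cmd_tpl": "squeue -j {{ cluster_job_id }}",
        "qdel_cmd_tpl": "scancel {{ cluster_job_id }}",
        "qinfo_cmd_tpl": "sinfo",
    }


if __name__ == "__main__":
    # Print the JSON so it can be redirected into cluster_info.json.
    print(json.dumps(build_cluster_info(), indent=4))
```

One possible reason this does not work: su run from a non-root account such as cryosparcuser normally prompts for the target user's password, which a non-interactive submit command cannot supply.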
Forgive me for any stupid questions. I only started this project a few days ago and may have missed this in the documentation.
See below for template information:
$ uname -a
Linux wi-hpc-hn-dev01 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 14:48:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ free -g
              total        used        free      shared  buff/cache   available
Mem:             15           2           3           0           9          11
Swap:            15           0          15
$ eval $(/applications/cryosparc/cryosparc_worker/bin/cryosparcw env)
$ env | grep PATH
LD_LIBRARY_PATH=/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64:/cm/local/apps/gcc/11.2.0/lib:/cm/local/apps/gcc/11.2.0/lib64:/cm/local/apps/gcc/11.2.0/lib32
__LMOD_REF_COUNT_PATH=/cm/shared/apps/slurm/current/sbin:1;/cm/shared/apps/slurm/current/bin:1;/cm/local/apps/gcc/11.2.0/bin:1;/applications/cryosparc/cryosparc_master/bin:1;/home/cryosparcuser/.local/bin:1;/home/cryosparcuser/bin:1;/usr/local/bin:1;/usr/bin:1;/usr/local/sbin:1;/usr/sbin:1;/opt/dell/srvadmin/bin:1
__LMOD_SET_FPATH=1
FPATH=/usr/share/lmod/lmod/init/ksh_funcs
__LMOD_REF_COUNT_LD_LIBRARY_PATH=/cm/shared/apps/slurm/current/lib64/slurm:1;/cm/shared/apps/slurm/current/lib64:1;/cm/local/apps/gcc/11.2.0/lib:1;/cm/local/apps/gcc/11.2.0/lib64:1;/cm/local/apps/gcc/11.2.0/lib32:1
__LMOD_REF_COUNT_MODULEPATH=/cm/local/modulefiles:1;/etc/modulefiles:1;/usr/share/modulefiles:1;/usr/share/Modules/modulefiles:1;/cm/shared/modulefiles:2
CRYOSPARC_PATH=/applications/cryosparc/cryosparc_worker/bin
CPATH=/cm/shared/apps/slurm/current/include
__LMOD_REF_COUNT_LIBRARY_PATH=/cm/shared/apps/slurm/current/lib64/slurm:1;/cm/shared/apps/slurm/current/lib64:1
LIBRARY_PATH=/cm/shared/apps/slurm/current/lib64/slurm:/cm/shared/apps/slurm/current/lib64
__LMOD_REF_COUNT_MANPATH=/cm/shared/apps/slurm/current/man:1;/usr/share/lmod/lmod/share/man:1;/usr/local/share/man:1;/usr/share/man:1;/cm/local/apps/environment-modules/current/share/man:1
__LMOD_REF_COUNT_CPATH=/cm/shared/apps/slurm/current/include:1
PYTHONPATH=/applications/cryosparc/cryosparc_worker
MANPATH=/cm/shared/apps/slurm/current/man:/usr/share/lmod/lmod/share/man:/usr/local/share/man:/usr/share/man:/cm/local/apps/environment-modules/current/share/man
MODULEPATH=/cm/local/modulefiles:/etc/modulefiles:/usr/share/modulefiles:/usr/share/Modules/modulefiles:/cm/shared/modulefiles
MODULEPATH_ROOT=/usr/share/modulefiles
NUMBA_CUDA_INCLUDE_PATH=/applications/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/include
PATH=/applications/cryosparc/cryosparc_worker/bin:/applications/cryosparc/cryosparc_worker/deps/anaconda/envs/cryosparc_worker_env/bin:/applications/cryosparc/cryosparc_worker/deps/anaconda/condabin:/cm/shared/apps/slurm/current/sbin:/cm/shared/apps/slurm/current/bin:/cm/local/apps/gcc/11.2.0/bin:/applications/cryosparc/cryosparc_master/bin:/home/cryosparcuser/.local/bin:/home/cryosparcuser/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/dell/srvadmin/bin
$ uname -a
Linux <hostname> 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 14:48:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ free -g
              total        used        free      shared  buff/cache   available
Mem:            503           2         416           0          83         497
Swap:            15           0          15
$ nvidia-smi
Mon Sep 9 15:47:03 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15 Driver Version: 550.54.15 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A40 On | 00000000:17:00.0 Off | 0 |
| 0% 23C P8 23W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A40 On | 00000000:65:00.0 Off | 0 |
| 0% 23C P8 23W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A40 On | 00000000:CA:00.0 Off | 0 |
| 0% 23C P8 24W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A40 On | 00000000:E3:00.0 Off | 0 |
| 0% 23C P8 23W / 300W | 0MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+