Hi @adesgeorges,
Short answer: For now, to guarantee isolation, each unix user on a cluster must run their own cryoSPARC instance. But everything is changing in the next major version; see below.
Long answer:
Good question.
Currently, the cryoSPARC processes need to be run by a user that has read/write access to the database, as well as read access to all input and write access to all output data directories. On cluster systems this can of course be a pain if unix user home directories or data directories are isolated. In some cases it can be solved by having the database and input/output directories owned by a group with read/write access, and then adding the unix users in your lab to that group, along with the unix user account that runs the cryoSPARC system. In this setup, users within the group can still see and access each other's data; each individual group on the cluster has its own cryoSPARC installation, but groups cannot see each other's data.
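For the group-based approach, here is a minimal sketch of how the shared directories might be prepared, assuming a hypothetical unix group called `cryosparc_lab` and placeholder paths (creating the group and adding users to it would be done separately with `groupadd`/`usermod -aG`, which typically needs a sysadmin):

```python
"""Sketch: give a lab group shared read/write access to cryoSPARC directories.

Assumes the group 'cryosparc_lab' already exists and that all lab members,
plus the account running cryoSPARC, belong to it. The paths are placeholders.
Must be run by the owner of the files (or root) for chown/chmod to succeed.
"""
import grp
import os
import stat

GROUP = "cryosparc_lab"            # hypothetical unix group
SHARED_DIRS = [
    "/data/cryosparc/database",    # placeholder: database location
    "/data/cryosparc/projects",    # placeholder: input/output directories
]

gid = grp.getgrnam(GROUP).gr_gid

for root_dir in SHARED_DIRS:
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Directories: group rwx, plus setgid so new files inherit the group.
        os.chown(dirpath, -1, gid)
        dmode = os.stat(dirpath).st_mode
        os.chmod(dirpath, dmode | stat.S_IRWXG | stat.S_ISGID)
        # Files: group read/write.
        for name in filenames:
            path = os.path.join(dirpath, name)
            os.chown(path, -1, gid)
            fmode = os.stat(path).st_mode
            os.chmod(path, fmode | stat.S_IRGRP | stat.S_IWGRP)
```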
If you absolutely require per-unix-user isolation, things become more difficult. We cannot guarantee absolute security of the webapp or the database server, so we generally recommend against running cryoSPARC as root. Therefore, the only solution right now for individual unix users on the cluster to have absolute privacy over their data (both in the database and in input/output directories) is for each unix user to run their own cryoSPARC instance from their home directory. Then (assuming your home directory permissions are set appropriately) no one else can read your data.
We know these are important issues facing larger labs and deployments using cryoSPARC, so we are working to address them in the next major cryoSPARC version (v2.0.0) which is a complete re-architecture and will support full pipeline processing (raw movies -> structure) as well as cluster installations (with some caveats).
We could actually use your input (and input from anyone else facing these issues) to make sure we are heading in the right direction. Our requirements are:
- No root access required for installation on a cluster
- Jobs running on cluster nodes should be submitted to a scheduler (SLURM/PBS/etc)
- Jobs running on workstations or individual servers should be scheduled by cryoSPARC and run directly
- A single cryoSPARC instance should be able to run jobs on multiple machines/cluster(s) that have access to the same shared filesystem for project directories
- Unix user isolation should be possible on cluster systems where users do not have root access
Given these requirements, the plan for the next version is:
- A single “cryoSPARC master” instance (does not require GPU) is constantly running, serving the webapp, database, and control layers of cryoSPARC.
- Multiple “cryoSPARC worker” nodes can be attached to the master (each requires a GPU and an SSD). These can be individual workstation/rackmount machines on the same network, or a cluster of nodes managed by a cluster scheduler. (Note: the machine running the master can also be a worker for itself, as in a single-workstation installation.)
- The master and worker nodes must all have access to at least one common file system where cryoSPARC project and import directories live (these could be in user home directories, on an NFS share, SSHFS mounts, etc.).
- The master node has an internal queueing system that schedules cryoSPARC jobs, launching them either on connected worker nodes or on a connected worker cluster (users can write a job submission script template for their cluster).
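To make the “job submission script template” idea concrete, here is a rough sketch of how such a template might be filled in for a SLURM cluster. The template format and the placeholder names (`job_name`, `job_dir`, `num_gpu`, `run_cmd`, etc.) are illustrative assumptions, not an actual cryoSPARC format:

```python
from string import Template

# Illustrative SLURM submission template; the variables below are
# made-up placeholders, not real cryoSPARC template variables.
SLURM_TEMPLATE = Template("""\
#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --partition=$partition
#SBATCH --gres=gpu:$num_gpu
#SBATCH --cpus-per-task=$num_cpu
#SBATCH --mem=${ram_gb}G
#SBATCH --output=$job_dir/slurm-%j.out
#SBATCH --error=$job_dir/slurm-%j.err

srun $run_cmd
""")

def render_submission_script(job_name, job_dir, run_cmd,
                             partition="gpu", num_gpu=1, num_cpu=4, ram_gb=24):
    """Fill the template for one job; the result would be handed to sbatch."""
    return SLURM_TEMPLATE.substitute(
        job_name=job_name, job_dir=job_dir, run_cmd=run_cmd,
        partition=partition, num_gpu=num_gpu, num_cpu=num_cpu, ram_gb=ram_gb,
    )

if __name__ == "__main__":
    print(render_submission_script(
        job_name="cryosparc_job_42",            # hypothetical job name
        job_dir="/projects/P1/J42",             # hypothetical job directory
        run_cmd="python run_job.py --job J42",  # hypothetical worker command
    ))
```

The idea is that the scheduler would render one such script per job and submit it with `sbatch` (or the equivalent on PBS and other schedulers), so site-specific details live entirely in the user-written template.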
In this setup, a standard cluster install would run the cryoSPARC master as a regular user (say ‘cryosparcuser’) with the database in that user’s home directory, and every other user’s import and project data stored somewhere on the filesystem that both the users and cryosparcuser can read and write. This can be done with file or group permissions, but will probably mean that users can read each other’s files. Multiple cryoSPARC workers are then installed on each workstation and on the cluster nodes, a job submission script template is written for the cluster, and users can log in to the cryoSPARC master web interface and run jobs on any available nodes or on the cluster.
For unix user isolation in the new setup, the solution would be the same as now: each user installs their own cryoSPARC master instance (including the database) in their own home directory, and keeps their import/project directories there as well. The big difference is that these per-user master instances can all run on non-GPU nodes inside or outside the cluster (even all on the same node, or on a login node), as long as the cluster nodes can talk to the master node(s) over the network. Each user then logs in to the web interface of their own master instance and, in a completely isolated fashion, views and submits jobs to it, which are in turn submitted to the cluster under their own user name.
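If you go the per-user route, the isolation ultimately rests on ordinary home-directory permissions. Here is a minimal sketch of checking and tightening them; the directory names are placeholders for wherever a user keeps their own instance, database, and project directories:

```python
import os
import stat

# Placeholder locations for a per-user installation; adjust to your layout.
HOME = os.path.expanduser("~")
PRIVATE_DIRS = [
    os.path.join(HOME, "cryosparc"),           # hypothetical install dir
    os.path.join(HOME, "cryosparc_database"),  # hypothetical database dir
    os.path.join(HOME, "cryosparc_projects"),  # hypothetical project dirs
]

for d in PRIVATE_DIRS:
    if not os.path.isdir(d):
        continue
    mode = stat.S_IMODE(os.stat(d).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        # Strip group/other bits so only the owner can read or traverse.
        os.chmod(d, stat.S_IRWXU)
        print(f"tightened {d}: {oct(mode)} -> 0o700")
    else:
        print(f"{d} already private ({oct(mode)})")
```

Closing off the top-level directory is enough to block other users from traversing into anything underneath it, even if files below are group-readable.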
The above solution is much, much simpler than attempting to run the cryoSPARC master instance as root or with setuid privileges, which has the potential to introduce serious security holes in a cluster system, and would also require root/sysadmin access to install on clusters that are not owned by a lab.
It would be great to get your thoughts on the above, especially whether the new setup we are building will serve your needs. One nice thing about this setup is that a single cryoSPARC master instance in a big lab will be able to manage projects and jobs for all users, and submit them seamlessly to cluster nodes, workstations, or rackmount servers as required, all from the same web interface.