Multi user installation?

Tru · February 7, 2017, 3:22pm

Hi,

I have some questions about https://guide.cryosparc.com/install.html
for a multi user installation (ie I would rather avoid each user to install it’s own cryosparc version).
Is it possible?

Where do you want cryoSPARC to be installed? : can this be a readonly NFS mounted partitions?
Where do you want cryoSPARC to write output data?: it is writing as the user running the the web service, how does it match the different group members?
Where are your datasets stored? does this space have to be read-write? or can it be only read-only, caching through the SSD ?

Thanks

Tru

apunjani · February 7, 2017, 10:00pm

Hi Tru,

The design of cryoSPARC is certainly intended to support the multi-user use case directly. Our idea is that cryoSPARC would be installed directly on the GPU node where it is running, and all lab users would create accounts in the web interface supplied by that single instance of cryoSPARC. Then everyone can share the GPUs and the cryoSPARC scheduler will automatically queue and manage jobs from multiple users, and each user will be able to see their own results and experiments.

When you install cryosparc, the installation directory is where cryoSPARC will store its internal database. This should ideally be on a redundant, writable location but it doesn’t have to be local to the GPU node where jobs and cryoSPARC itself will be running. So you can use an NFS mount for this, though generally we recommend a local drive so that the database can be read and written quickly to minimize UI lag.
During configuration after installation, cryoSPARC will ask you where you want it to write its output data. This should ideally be a redundant, large storage location that’s writable, some people use a cluster filesystem for that. Currently, cryoSPARC will write those output files as the user running the web service, and cryoSPARC user accounts are not linked to linux/unix user accounts but this is something we’re looking in to.
Datasets are read from anywhere you like, configured through cryoSPARC’s ‘bulk directories’ (basically shortcuts to common locations that will be browsable in the cryoSPARC UI) that you can configure using the command cryosparc configure bulk add <path> <name>. The dataset locations only need to be readable by the user account running cryoSPARC, and datasets are transparently cached through the SSD (during configuration you specify a location for the cache).

Hope this helps,
Ali

ashpike · February 8, 2017, 12:52pm

On a related note - is it possible to assign ‘admin’ permissions to users? As admin I can see jobs from all users but basic users can only see their own jobs. We have a multi-user environment and would like users to be able to see results from all others on the same project.
Perhaps the way to go from our side may be to create common users (eg. projectname). Would the webapp be able to cope with multiple instances of the ‘same user’ being logged in and creating jobs etc (eg. same project but processing different datasets)?
Thanks
Ash

apunjani · February 8, 2017, 5:54pm

Hi Ashley,

Currently, if a regular non-admin user clicks the “person” icon under New Experiment:

then they will be able to see everyone’s jobs.

We’re intending to add more granular permissions control and admin settings going forward, so this is something we can look at in more detail.

Hope that helps,
Ali

adesgeorges · November 22, 2017, 10:52pm

Do I understand that the output directory and database need to be read/write accessible to all users? How can I have data for individual users only accessible to them? We are sharing the cluster with multiple groups and don’t want to give access to our data to others for confidentiality and security purposes.

Best,

Amedee

apunjani · November 25, 2017, 1:25am

Hi @adesgeorges,

Short answer: For now, to guarantee isolation each unix user on a cluster must run their own cryoSPARC instance. But everything is changing in the next major version, see below.

Long answer:
Good question.

Currently, the cryoSPARC processes need to be run by a user that has read/write access to the database as well as read access to all input and write access to all output data directories. On cluster systems this can of course be a pain, if unix user home directories or data dirs are isolated. In some cases it could be solved by having the database and input/output dirs be owned by a group that has read/write access, and then only add the unix users in your lab to that unix group, along with the unix user account that will be running the cryoSPARC system. In this case, users within the group would still be able to see/access eachother’s data, and each individual group on the cluster would have their own cryoSPARC installation, but would not be able to see eachothers data.

If you absolutely desire per-unix-user isolation, then things become more difficult. In this case, we cannot guarantee absolute security of the webapp or the database server, so we generally recommend against running cryoSPARC as root. Therefore, the only solution right now in order to have individual unix users on the cluster have absolute privacy over their data (both in the database and input/output directories) is for each unix user to be running their own cryosparc instance from their home directory. Then (assuming your home dir permissions are set) no one else can read each others data.

We know these are important issues facing larger labs and deployments using cryoSPARC, so we are working to address them in the next major cryoSPARC version (v2.0.0) which is a complete re-architecture and will support full pipeline processing (raw movies -> structure) as well as cluster installations (with some caveats).

We could actually use your input (and anyone else who is facing these issues) to make sure we are heading in the right direction. The requirements we have are:

No root access required for installation on a cluster
Jobs running on cluster nodes should be submitted to a scheduler (SLURM/PBS/etc)
Jobs running on workstations or individual servers should be scheduled by cryoSPARC and run directly
A single cryoSPARC instance should be able to run jobs on multiple machines/cluster(s) that have access to the same shared filesystem for project directories
Unix user isolation should be possible on cluster systems where users do not have root access

Given these requirements, the plan for the next version is:

A single “cryoSPARC master” instance (does not require GPU) is constantly running, serving the webapp, database, and control layers of cryoSPARC.
multiple “cryoSPARC worker” nodes can be attached to the master (each requires GPU+SSD). These can be individual workstation/rackmount machines on the same network, or a cluster of nodes that are managed by a cluster scheduler. (note: the machine running master can also be a worker for itself, in the case of a single workstation installation)
The master and worker nodes all must have access to at least a common file system where cryoSPARC project and import directories are found (these could be in user home directories, an NFS share, SSHFS mounts, etc.)
The master node has an internal queueing system that schedules cryoSPARC jobs that will be launched either on connected worker nodes or a connected worker cluster (user can write a job submission script template for their cluster).

In this setup, a standard cluster install would be to run cryoSPARC master as a regular user (say ‘cryosparcuser’) with the database in the cryosparcuser home directory, then have every other user’s import and project data stored somewhere on the filesystem that the users and cryosparcuser can read+write. This can be done by file permissions or group permissions, but will probably mean that the users can read eachothers files. Then multiple cryoSPARC workers are installed on each workstation and on the cluster nodes, a job submission script template is written for the cluster, and then users can log in to the cryoSPARC master web interface and run jobs on any nodes available or the cluster.

For unix user isolation in the new setup, the solution would be the same as now, that each user must install their own cryoSPARC master instance in their own home directory (including database) and then that user can have import/project dirs in the user’s own home directory. The big difference though will be that the cryoSPARC master instances for each user can all be run on non-GPU nodes inside or outside the cluster (or even all on the same node, or on a login node), as long as cluster nodes can talk to the master node(s) over the network. Then each user can log in to the web interface for their own master instance, and all users can, in a completely isolated fashion, view and submit jobs to their own master instance, that will in turn be submitted to the cluster under their own user name.
The above solution is much much simpler than attempting to have the cryoSPARC master instance run as root or with some setuid privileges, because this has the potential to introduce serious security holes in a cluster system, and would also require root sysadmin access to install on clusters that are not owned by a lab.

It would be great to get your thoughts on the above, especially whether or not the new setup that we are building will serve your needs. One nice thing about this setup is that the same cryoSPARC master instance in a big lab will be able to manage projects and jobs for all the users, and submit them seamlessly to cluster nodes or workstations or rackmount servers as required from the same web interface.

apunjani · November 25, 2017, 1:37am

@olibclarke @DanielAsarnow @jelka @rdrighetto or anyone else using cryoSPARC on a cluster, your input would be welcome as well, thanks!

jhm13c · December 7, 2017, 10:58pm

I am trying to set a cryoSPARC instance that will run in a cluster. So far, the biggest problem is the connection to the license server. Some clusters have limited accesses therefore communication with the server is a bit troublesome.

apunjani · December 7, 2017, 11:00pm

Thanks @jhm13c - cryoSPARC v2 will only require a connection from the master node rather than the worker nodes.

jhm13c · December 7, 2017, 11:18pm

That is great news, is there a time line for the v2?

adesgeorges · January 4, 2018, 10:23pm

Sounds ok to me but it would be so much better to have a single install for the whole cluster and each user just having his/her own database/data/license file in his home dir. That way, single users don’t have to install by themselves and it is probably more compatible with diverse systems and cloud solutions. And the program is independent from the data.
For instance, we are using sbgrid and this does not seem compatible with it.

Let me know what you think.

smsaladi · February 8, 2018, 10:17pm

I just came across this… If v2 implements this plan, it would certainly be a great improvement in usability and would ease of user adoption.