Hi all,
I am wondering whether, in the future, we might be able to run cryoSPARC on a machine that does not have MongoDB installed locally. The idea is that we would run a separate MongoDB database server and point cryoSPARC at it for its database.
The rationale behind this question is that we are trying to think of ways labs or individuals could run cryoSPARC without needing persistent storage on any given host in a SLURM cluster. All the home directories and labs use NFS for their storage, and I don’t think putting MongoDB on NFS is a good idea.
I think the cryosparc master (the Python and JavaScript part) is tightly coupled to the MongoDB server. Besides the performance considerations, running MongoDB on a different node from the one that runs the master programs might be too significant a change to make at this point.
The current cryosparc system configuration allows the cryosparc_master installation (programs only, including the MongoDB programs) to be installed and run on a computer that is separate from everything else. That means the MongoDB database files (cryosparc_database), the project dirs, and the GPU workers can be anywhere on the network. However, for better performance, it is best to put the MongoDB database (the cryosparc_database dir) on a fast SSD on the same computer as the master, because it is accessed frequently and randomly.
One solution I can think of for your problem is for the users/labs to set up their own cryosparc masters on their own computers. By default, their MongoDB databases will then be stored on their own computers too. Their projects and data can be saved on the network and accessed through NFS. The masters are allowed to send SSH commands to the cluster nodes to start jobs and get results. The worker nodes will read the data and write results to the project dirs through NFS.
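As a rough illustration of the storage advice above, here is a minimal Python sketch that reports which filesystem type a candidate cryosparc_database directory sits on, by matching the path against /proc/mounts-style text (Linux only). The function names, the list of network filesystem types, and the example path are my own assumptions for illustration. None of this is part of cryoSPARC.

```python
import os

# Assumption: filesystem types treated as "network storage" for this check.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "lustre", "gpfs", "beegfs", "ceph"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mount_point fs_type options dump pass
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def filesystem_type(path, mounts):
    """Return the fs type of the longest mount point containing path, or None."""
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for mount_point, fs_type in mounts:
        mount_point = os.path.normpath(mount_point)
        prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later entry for the same mount point win,
            # since later mounts shadow earlier ones.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_network_filesystem(path, mounts):
    """True if path lives on one of the assumed network filesystem types."""
    return filesystem_type(path, mounts) in NETWORK_FS_TYPES


if __name__ == "__main__":
    # Hypothetical database location; replace with your own path.
    db_path = os.path.realpath("/home/lab/cryosparc_database")
    with open("/proc/mounts") as handle:
        mounts = parse_mounts(handle.read())
    print(db_path, "->", filesystem_type(db_path, mounts))
    print("network filesystem:", is_network_filesystem(db_path, mounts))
```

The sketch does not decode escaped characters (such as spaces) in mount points, so treat its answer as a hint and confirm with your cluster admins.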
Basically, the cluster can be regarded as a collection of network storage and GPU workers that the users’ cryosparc masters can use.
There might be some security concerns. But the SSH commands cryosparc sends are quite simple. It should not be hard to verify that they are legitimate cryosparc commands.
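To make the verification idea concrete, here is a minimal Python sketch of an allowlist check for incoming SSH commands. The set of allowed executables (common SLURM client commands) is purely my assumption, since the exact commands cryosparc sends are not listed in this thread, and the function name is hypothetical. On the cluster side, a check like this could be fed the requested command that OpenSSH exposes in the SSH_ORIGINAL_COMMAND environment variable when a forced command is configured.

```python
import shlex

# Assumption: the only executables a master is expected to call on the cluster.
# Adjust this set to whatever your site actually wants to permit.
ALLOWED_EXECUTABLES = {"sbatch", "squeue", "scancel", "sinfo"}

# Characters that would let one command line chain, pipe, or substitute
# other commands; any of them causes an outright rejection.
FORBIDDEN_CHARS = set(";|&`$<>\n")


def is_allowed_command(command):
    """Return True only for a single, plain call to an allowlisted executable."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are rejected.
        return False
    if not argv:
        return False
    # Strict match on the bare name: "/usr/bin/sbatch" would be rejected too.
    return argv[0] in ALLOWED_EXECUTABLES


if __name__ == "__main__":
    for candidate in ["squeue -j 12345", "sbatch job.sh; rm -rf /"]:
        verdict = "allow" if is_allowed_command(candidate) else "reject"
        print(verdict, ":", candidate)
```

This only checks the shape of the command. It does not validate the arguments or the contents of any submitted job script, so it is a starting point, not a complete security control.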
We are not currently planning to implement running mongod and cryosparc_master processes on separate hosts.
Not necessarily.
For MongoDB v3.6 with the WiredTiger storage engine, which CryoSPARC v4.3.1 uses, NFS mount options are documented.
For the current version of MongoDB, the documentation contains, with respect to NFS, both a recommendation to avoid it and, again, mount options.
Did you consider hosting cryosparc_master software, along with the database, on an off-cluster server? It is possible to host multiple CryoSPARC instances on a single computer if you ensure non-overlapping