Huge Database Size and Migration to Multiple Instances?

wtempel · September 29, 2023, 6:36pm

Thanks for your post @dirk.

A dedicated master node is a good idea. Your experience with the current user base and hardware should be a good guide of how much RAM and CPU resources you need, as long as you allow some room for growth. What would be the use case for the

?

has already been answered by yourself:

Splitting up your instance is a plausible approach, particularly if you pick good criteria along which to split the instance. Some ideas:

People who need to share projects need to have CryoSPARC logins on the same instance.
Instance could be split based on project lifecycle:
Older, inactive projects could be hosted on a dedicated “legacy” instance that does not have any attached workers. Such projects could be in archived state. Such a “legacy” instance’s database would have be backed up after all relevant inactive projects have been added, but due to the immutability of the projects additional database backups would not be needed. A large database on such a “legacy” instance would therefore reduce the administrative burden, compared to a large database on an “active” instance (below).
An “active” instance’s database would be backed up frequently. The “active” instance would be kept small and agile by:
* detachment of inactive projects, their removal from the database and, possibly, transfer of the detached project directory to a “legacy” instance.
- database compaction after some data have been removed from the database
- application of the data cleanup tool to active projects. Careful:
  1. Do not blindly accept the tool’s default settings
  2. Understand the difference between final and completed jobs
One could combine criteria for splitting a large instance.
Create and manage multiple CryoSPARC instances efficiently. For a discussions on what works and what doesn’t, see, for example
- multiple master installations on same worker:
  - Multiple cryoSPARC Master Installations on one Computer
  - Multiple master nodes and single cluster deployment
- share worker installations between multiple instances (at the same software version): Licence key mismatch
- docker, kubernetes, singularity, podman have been mentioned on this forum and may play a role in efficiently managing multiple CryoSPARC instances.