Huge Database Size and Migration to Multiple Instances?

Thanks for your post @dirk.

A dedicated master node is a good idea. Your experience with the current user base and hardware should be a good guide to how much RAM and CPU you need, as long as you allow some room for growth. What would be the use case for the

?

has already been answered by yourself:

Splitting up your instance is a plausible approach, particularly if you pick good criteria along which to split the instance. Some ideas:

  1. People who need to share projects need to have CryoSPARC logins on the same instance.
  2. An instance could be split based on project lifecycle:
    * Older, inactive projects could be hosted on a dedicated “legacy” instance that does not have any attached workers. Such projects could be in archived state. Such a “legacy” instance’s database would have to be backed up after all relevant inactive projects have been added, but due to the immutability of the projects, additional database backups would not be needed. A large database on such a “legacy” instance would therefore carry a smaller administrative burden than a large database on an “active” instance (below).
    * An “active” instance’s database would be backed up frequently. The “active” instance would be kept small and agile by:
      * detachment of inactive projects, their removal from the database and, possibly, transfer of the detached project directory to a “legacy” instance.
      * database compaction after some data have been removed from the database.
      * application of the data cleanup tool to active projects. Careful:
        1. Do not blindly accept the tool’s default settings.
        2. Understand the difference between final and completed jobs.
  3. One could combine criteria for splitting a large instance.
  4. Create and manage multiple CryoSPARC instances efficiently. For discussions on what works and what doesn’t, see, for example
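
Separately from the discussions referred to above, here is a rough sketch of how the lifecycle-based split could be planned before touching any instance. It is only an illustration under assumptions: `ProjectInfo`, `split_by_lifecycle` and the 365-day threshold are hypothetical names and values, not part of CryoSPARC, and the project UIDs and last-activity dates would have to come from your own records. The script does not call any CryoSPARC command or API; it only sorts projects into an “active” and a “legacy” list.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ProjectInfo:
    """Hypothetical record kept by the admin; not a CryoSPARC data structure."""
    uid: str              # project identifier, e.g. "P12"
    last_activity: date   # date of the most recent job activity (assumed known)


def split_by_lifecycle(projects, today, inactive_after_days=365):
    """Return (active_uids, legacy_uids).

    A project counts as inactive when its last activity is older than
    `inactive_after_days` before `today`. The threshold is an assumption
    and should be chosen to match local policy.
    """
    cutoff = today - timedelta(days=inactive_after_days)
    active, legacy = [], []
    for project in projects:
        if project.last_activity < cutoff:
            legacy.append(project.uid)   # candidate for detachment and transfer
        else:
            active.append(project.uid)   # stays on the frequently backed-up instance
    return active, legacy


if __name__ == "__main__":
    inventory = [
        ProjectInfo("P1", date(2020, 1, 1)),
        ProjectInfo("P2", date(2024, 5, 1)),
    ]
    active, legacy = split_by_lifecycle(inventory, today=date(2024, 6, 1))
    print("keep on active instance:", active)
    print("candidates for legacy instance:", legacy)
```

The resulting “legacy” list is only a list of candidates. Before detaching anything, one would still check the first criterion above (people who share a project need logins on the same instance).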