Massive performance issues through massive database?

david.haselbach · May 18, 2022, 12:05pm

Hi,

Recently, our on-campus cryoSPARC instance has developed sudden performance issues: it is slow and unresponsive, and many jobs fail for no obvious reason. Since March, our number of jobs has tripled to 1500 per month and our user base has grown significantly, because we now have a Krios on site that feeds many projects. Our MongoDB database also seems rather large at roughly 500 GB. The instance holds 415 projects; most of them are several TB in size, and some contain thousands of jobs. We can well imagine that this is the problem and would appreciate your advice.

Best,

David

wtempel · May 30, 2022, 3:27pm

@david.haselbach
A cryoSPARC instance may grow to a large size after lots of usage. For example,

the single database that handles an instance’s configuration and metadata may grow to many hundreds of gigabytes
the combined master workload (metadata and data management, interactive jobs, etc.) may exceed a single server’s capacity

It may be time to “split” the instance. How to best divide a large instance depends on the circumstances at your institution and on your instance’s projects.
Constraints:

each instance has its own master process(es)
the port ranges of instances on the same host must not overlap
therefore: an instance must be assigned a unique URL (a combination of <host>:<baseport>)
a project directory cannot be assigned to more than one instance
each instance has its own database
the server hosting one or more cryoSPARC master(s) must be able to handle the combined workload of interactive jobs that must run on the given master host
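
As a rough illustration of the port-range constraint above, the following sketch checks a planned set of instances for overlapping ranges on the same host. The `Instance` class, the example names, and the `PORTS_PER_INSTANCE` width are assumptions made for this illustration; the number of ports an instance actually occupies above its base port should be taken from the cryoSPARC documentation for your version.

```python
from dataclasses import dataclass
from itertools import combinations

# Assumption: each instance occupies a contiguous block of ports that starts
# at its base port. The width below is a placeholder, not an official value.
PORTS_PER_INSTANCE = 10


@dataclass(frozen=True)
class Instance:
    """Hypothetical description of one planned cryoSPARC instance."""
    name: str
    host: str
    base_port: int

    def port_range(self, width=PORTS_PER_INSTANCE):
        # Ports base_port .. base_port + width - 1
        return range(self.base_port, self.base_port + width)


def find_port_conflicts(instances, width=PORTS_PER_INSTANCE):
    """Return pairs of instance names whose port ranges overlap on one host."""
    conflicts = []
    for a, b in combinations(instances, 2):
        if a.host != b.host:
            continue  # ranges on different hosts cannot collide
        ra, rb = a.port_range(width), b.port_range(width)
        if ra.start < rb.stop and rb.start < ra.stop:
            conflicts.append((a.name, b.name))
    return conflicts


# Example plan (all names and ports are made up)
plan = [
    Instance("lab_a", "master1", 39000),
    Instance("lab_b", "master1", 39005),  # overlaps lab_a under the assumed width
    Instance("lab_c", "master1", 39100),
    Instance("lab_d", "master2", 39000),  # same base port, but a different host
]
conflicts = find_port_conflicts(plan)
```

An empty result means that every `<host>:<baseport>` combination in the plan is unique and no two ranges on the same host collide under the assumed width.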

Suggestions:

aim for a division that assigns users that will collaborate on cryoSPARC projects to the same instance
keep in mind that a project directory cannot be assigned to more than one cryoSPARC instance at any given time.

Plausible demarcations for the split could be:

cryoem researchers in a small lab
a single user who works on multiple cryoem projects

In addition or, under some circumstances, as an alternative to splitting up a large cryoSPARC instance, one might consider archiving data that no longer need to be accessed through or by the cryoSPARC instance. Because cryoSPARC project directories are “portable” (i.e. they contain metadata that facilitate their import to another cryoSPARC instance, or re-import to the same instance), one can shrink an instance’s data volume with this sequence:

create an archival copy of the project directory (ensure all links are dereferenced!)
delete the project from the cryoSPARC instance (deletes data from the filesystem and from the database)
reclaim filesystem space no longer needed by the shrunk database (see a related discussion)
[import the archival copy to this or another cryoSPARC instance if/when needed]
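
The first step of the sequence above (an archival copy with all links dereferenced) could be sketched as follows. This is only an illustration using the Python standard library, not a cryoSPARC tool; the function names and paths are hypothetical. It relies on `shutil.copytree(..., symlinks=False)`, which copies the files that symbolic links point to instead of the links themselves, and then walks the copy to confirm that no links remain.

```python
import os
import shutil


def find_symlinks(root):
    """Return the paths of all symbolic links found below root."""
    leftovers = []
    # os.walk does not follow directory symlinks by default, so linked
    # directories show up in dirnames and can be tested with islink().
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                leftovers.append(path)
    return leftovers


def archive_project(project_dir, archive_dir):
    """Copy project_dir to archive_dir, replacing symlinks by their targets.

    archive_dir must not exist yet. Returns the list of symlinks still
    present in the copy, which should be empty.
    """
    # symlinks=False: copy the contents of link targets, not the links.
    # A dangling link makes copytree raise shutil.Error once copying ends.
    shutil.copytree(project_dir, archive_dir, symlinks=False)
    return find_symlinks(archive_dir)
```

A call might look like `archive_project("/data/cryosparc/P12", "/archive/P12")` (both paths are made up). Because dereferenced links become full copies, the archive can be considerably larger than the project directory appears to be, and it is worth verifying the copy before deleting the project from the instance.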

Space usage can also be reduced by deleting intermediate results.

James · December 13, 2022, 11:59am

Hello there, I am new here and saw this message. We are an archiving specialist (Benelux) and really wonder what is needed and what happens with all the generated data. I can imagine that data must be moved off the production site to prevent congestion in the storage systems and the associated performance loss. Are there specific requirements (e.g. due to legislation)? We do archiving projects in collaboration with the CARE market, so I’m researching how we can match solutions that we already have to this market. We also have super fast file systems for processing (WEKA).