Feature request: managing files saved in MongoDB GridFS

It seems that the images displayed in the web interface are all saved in MongoDB, in its GridFS. Also, after a job is deleted, the image files (PNG and PDF) are not removed from GridFS. As a consequence, the database keeps growing over time. For instance, my database is now 22 GB in size after only 2000 jobs, and the fs.chunks collection alone uses 21 GB of that. From the information in fs.files I can see that most of these files are not really worth keeping.

Is it possible to provide a database cleaning function? Or even better, is it possible to save the intermediate files in the job directory so that they can be managed as regular files?
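For anyone who wants to see which jobs account for most of the GridFS space before deciding what to clean, here is a minimal sketch. It assumes each document in fs.files carries the standard GridFS `length` field (size in bytes) along with the `project_uid` and `job_uid` fields used elsewhere in this thread. The function name and the sample documents are hypothetical.

```python
from collections import defaultdict


def gridfs_bytes_per_job(file_docs):
    """Sum GridFS file sizes per (project_uid, job_uid), largest first.

    file_docs is any iterable of dict-like fs.files documents.
    Documents missing a field are grouped under None for that field.
    """
    totals = defaultdict(int)
    for doc in file_docs:
        key = (doc.get("project_uid"), doc.get("job_uid"))
        totals[key] += doc.get("length", 0)
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hand-written sample documents (hypothetical values, for illustration only).
sample_docs = [
    {"project_uid": "P1", "job_uid": "J1", "length": 100},
    {"project_uid": "P1", "job_uid": "J2", "length": 300},
    {"project_uid": "P1", "job_uid": "J1", "length": 50},
]

for (project_uid, job_uid), total in gridfs_bytes_per_job(sample_docs):
    print(project_uid, job_uid, total)

# With pymongo, the same function could be fed from a cursor, for example:
#   db = pymongo.MongoClient(HOST, PORT)[DB_NAME]   # HOST, PORT, DB_NAME are placeholders
#   gridfs_bytes_per_job(db["fs.files"].find({}, {"project_uid": 1, "job_uid": 1, "length": 1}))
```

The function only reads metadata, so it is safe to run against a live database.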

Zhijie

Hi @ZhijieLi,

Please take a look at this post regarding database cleaning: "Data Management for Large Instances".

Also, you are correct that thumbnails and meta images are saved in GridFS, but when a job is deleted, all related GridFS files are removed. See clear_job in cryosparc2_master/cryosparc2_command/command_core/__init__.py:

# remove any files in gridFS
all_fs = list(mongo.db['fs.files'].find({'project_uid': project_uid, 'job_uid': job_uid}, {'_id':1}))
for streamfile in all_fs:
    gridfs.delete(streamfile['_id'])

Please feel free to cross-reference any deleted jobs against any existing files in GridFS, and let us know if the function is not behaving as expected!
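One way to do that cross-reference is sketched below. It assumes you can build the set of (project_uid, job_uid) pairs for jobs that still exist; how you obtain that set is left open here, and the function name and sample data are hypothetical. Files with no `job_uid` are skipped, on the assumption that they are not tied to a job.

```python
def find_orphaned_files(file_docs, existing_jobs):
    """Return fs.files documents whose job no longer exists.

    file_docs:     iterable of dict-like fs.files documents
    existing_jobs: iterable of (project_uid, job_uid) pairs for jobs
                   that have NOT been deleted
    """
    existing = set(existing_jobs)
    orphans = []
    for doc in file_docs:
        job_uid = doc.get("job_uid")
        if job_uid is None:
            # Assumption: files without a job_uid are not job outputs.
            continue
        if (doc.get("project_uid"), job_uid) not in existing:
            orphans.append(doc)
    return orphans


# Hand-written sample data (hypothetical values, for illustration only).
sample_files = [
    {"_id": 1, "project_uid": "P1", "job_uid": "J1"},
    {"_id": 2, "project_uid": "P1", "job_uid": "J9"},
    {"_id": 3, "project_uid": "P1"},
]
sample_existing_jobs = [("P1", "J1")]

for doc in find_orphaned_files(sample_files, sample_existing_jobs):
    print("orphaned GridFS file:", doc["_id"])
```

If this returns an empty list for your instance, the cleanup in clear_job is working as described above.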

Hi Stephan,

Thanks for the explanation. You are right, the file chunks are indeed deleted when a job is deleted.

I was fooled by the observation that the database file didn’t change size after a job deletion. That’s simply because it would be very expensive to shrink the GridFS DB file after each deletion, so the freed space stays in the file on disk rather than being handed back to the operating system.
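A rough way to see this effect is to compare the logical data size with the on-disk storage size reported by MongoDB's `dbstats` command, which includes `dataSize` and `storageSize` fields. The sketch below is only an estimate: with a compressing storage engine the on-disk size can be smaller than the data size, in which case it reports zero. The function name and the sample numbers are hypothetical.

```python
def unused_storage_bytes(stats):
    """Estimate on-disk bytes not currently occupied by data.

    stats is a dict shaped like the result of the dbstats command,
    with at least "dataSize" and "storageSize" (both in bytes).
    Returns 0 when storage is smaller than data (e.g. due to compression).
    """
    return max(int(stats["storageSize"] - stats["dataSize"]), 0)


# Hypothetical numbers: 1 GB of live data in a 22 GB file after deletions.
GB = 1024 ** 3
sample_stats = {"dataSize": 1 * GB, "storageSize": 22 * GB}
print(unused_storage_bytes(sample_stats) // GB, "GB allocated but unused")

# With pymongo, the stats could be fetched with something like:
#   stats = pymongo.MongoClient(HOST, PORT)[DB_NAME].command("dbstats")
# where HOST, PORT and DB_NAME are placeholders for your instance.
```

A large gap after deleting many jobs is consistent with the files having been removed from GridFS even though the database file has not shrunk.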

I am looking forward to the new versions!

Zhijie
