How do the data cleanup tools deal with jobs linked from other workspaces?

Hi,

I love the new cleanup tools in 4.3!! Being able to mark jobs/branches as final and then delete/clear everything else makes cleanup dramatically less tedious. :sweat_smile:

I just have one query - how does the cleanup handle jobs linked from other workspaces?

E.g. let's say I have linked a job J110 from W1 to W2. In W2, I mark a few jobs as final, but J110 is not among them, nor is it an ancestor of any of them.

If I delete/clear all non-final jobs in W2, will J110 be deleted/cleared? Or just unlinked from the workspace? (Apologies if this is already covered in the docs!)

Cheers
Oli

Hi @olibclarke,

Thanks for your feedback! I'm glad you're enjoying the new data cleanup tools.

Linking a job to a workspace means the data cleanup tool treats that job as part of the workspace. If the tool is run on a workspace with clearing/deleting of all non-final jobs selected, every job in the workspace that is not marked as final or as an ancestor of a final job, including linked jobs, will be cleared/deleted. In your example, J110 would be cleared/deleted, not unlinked.

I'll make a note to clarify this in the docs.

Hi @nwong

Thanks for the info - I'm not 100% sure this is the right approach though, primarily because it differs from the behavior when deleting a workspace.

If I delete a workspace, it explicitly states that jobs that are shared with other workspaces will not be affected - they are just unlinked but otherwise remain intact in the other workspaces to which they belong.

I would suggest that for data cleanup, just unlinking jobs that are shared with multiple workspaces would be safer (less chance of accidental data loss) than actually deleting them (which might result in me later realizing that one of the linked jobs deleted was part of an important processing stream in another workspace). This would also be simpler, as it wouldn't require the user to manually check whether linked jobs are important in other workspaces.

Also, a job that is part of a "final" workflow in one workspace may be present in another workspace where it is not part of the "final" workflow in that workspace.

At the very least, I would suggest adding an option to treat linked jobs in this manner, and stating explicitly in the GUI what happens to linked jobs, in the same way as is currently done when deleting a workspace.

Does that make sense?

Cheers
Oli

Hi @olibclarke,

Thanks for your feedback!

> If I delete a workspace, it explicitly states that jobs that are shared with other workspaces will not be affected - they are just unlinked but otherwise remain intact in the other workspaces to which they belong.

This is definitely an inconsistency between the data cleanup tool and "Delete workspace" that we want to address. Ideally, when the cleanup tool operates on a workspace containing linked jobs that are not marked as final or as ancestors of final jobs, it would unlink those jobs instead of deleting them, just as "Delete workspace" does. Making this behavior clearer in the cleanup dialog would also be important.

> Also, a job that is part of a "final" workflow in one workspace may be present in another workspace where it is not part of the "final" workflow in that workspace.

Jobs marked as final or ancestor of final retain this status at the project level (across workspaces). This means that a job in W1 that is final/ancestor of final and is linked to W2 will still be final/ancestor of final in W2, and will be protected from clearing/deleting by the cleanup tool. In a project where all important results were marked as final, this would hopefully mean that a job linked to multiple workspaces that was used to generate important results would be an ancestor of a final job, and thus protected from clearing/deleting.

We're discussing this behaviour as a team and may make some changes in a future release.

> Jobs marked as final or ancestor of final retain this status at the project level (across workspaces). This means that a job in W1 that is final/ancestor of final that is linked to W2 will still be final/ancestor of final in W2, and will be protected from clearing/deleting from the cleanup tool. In a project where all important results were marked as final, this would hopefully mean that a job linked to multiple workspaces that was used to generate important results would be an ancestor of a final job, and thus protected from clearing/deleting.

Yes - this is true, but it assumes that one has gone through and marked the final jobs of all important workflows in the project, across all workspaces, before initiating data cleanup, which I don't think will necessarily be the case for all users. This is why I would suggest that cleanup should be non-destructive to anything that exists in another workspace (in the same way that workspace deletion operates).

Hi @olibclarke,

Thanks for your feedback! As of v4.4, the data cleanup tool has been updated:

When running the Cleanup Data tool on a single workspace within a project, jobs that are linked to other workspaces will no longer be deleted by the tool; instead those jobs will be unlinked from the workspace being cleaned and will remain present in the other workspaces.

Wonderful, this sounds much better, thanks @nwong!!