Clean up Events2 collection

I have…

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [ ] Bug report
  • [x] Performance issue
  • [ ] Documentation issue or request

Current behavior

Events2 collection is growing

Expected behavior

Hi Sebastian,

Our Events2 collection is growing. Is there a proper way to clean it up?

From this old post, it seems like it is not possible. However, I wonder if we can do it through MongoDB commands.

[IMPLENTED] Permanent deletion:

Minimal reproduction of the problem

Environment

App Name:

  • [x] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [ ] Cloud version

Version: [VERSION]

Browser:

  • [x] Chrome (desktop)
  • [ ] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge

Others:

No, never ever do that.

I had also wondered about this and came across these FAQs about event sourcing that helped clarify things for me a bit: https://www.cqrs.nu/Faq/event-sourcing

If size ever became an issue and you were fully happy with the current state, I guess you could flatten it out by creating a new app and migrating the content over? Or, as in the post you linked above, now that we can permanently delete content, you could clone each piece of content and then permanently delete the old version? You’d lose all history and the other benefits of event sourcing, but it would be a lot smaller for a while!

If you look at a single content item, the events could look like this:

  • ContentCreated
  • ContentUpdated
  • ContentUpdated
  • ContentUpdated

So we could remove the two ContentUpdated events in the middle. But this is not something you can do very easily.

If you want to remove it you can clone your app with the sync tool and then delete the old app, which can be done permanently.

But if you just delete events by hand, you will lose the ability to update Squidex, because when the content collections change we usually just replay all events to recreate the collection.

Funnily enough this has actually come up as a point of discussion in our team today!

You haven’t said it is impossible, which intrigues me. Where do the complications lie, in your opinion? Is it being sure that the only impact the (using your example) two middle ContentUpdated events had was on the object itself, and that they did not somehow trigger a rule and update something else?

Think I probably need to go off and do some more reading around this!

The ContentUpdated events contain the full content data. Therefore, if you are only interested in the last result, you could remove the earlier ContentUpdated events (as long as there is no other event in between).
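To make the rule concrete, here is a minimal Python sketch of that collapsing step, applied to an in-memory list only. The event shape (dicts with `type` and `data` keys) and the function name `collapse_updates` are assumptions for illustration, not the format Squidex stores in Events2, and the warning above about deleting events by hand still applies.

```python
from typing import Dict, List


def collapse_updates(events: List[Dict]) -> List[Dict]:
    """Drop every ContentUpdated event that is directly followed by another one.

    Assumes `events` is the ordered history of a single content item and that
    each ContentUpdated event carries the full content data.
    """
    result = []
    for index, event in enumerate(events):
        next_event = events[index + 1] if index + 1 < len(events) else None
        superseded = (
            event["type"] == "ContentUpdated"
            and next_event is not None
            and next_event["type"] == "ContentUpdated"
        )
        if not superseded:
            result.append(event)
    return result
```

An update is kept whenever a different event type follows it, which matches the "no other event in between" condition.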

I built an export and import process to handle such scenarios… it’s in C# and .NET Core… if you think it could be useful, I’m happy to share it. Obvious disclaimers: it may not be perfect, and I’m still on Squidex v4.

I also had to account for content referencing other content via their Squidex IDs, which change on import.

What exactly are you doing there?

A quick summary would be:

Export process is essentially:

  • extract the backup file to access the events and attachments folders
  • read through all the JSON files and group the events by content id
  • remove any ids where the last entry is a delete
  • for the remaining ids, find the last published event
    – check forward to see if there were any update events after the publish with actual data
    – check backwards to the most recent update that contains actual data
    – save the json file to output folder
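
The export steps above could be sketched in Python roughly as follows. This is one reading of the description, not the C# tool itself: the event dict shape (`id`, `type`, `data`) and the type names `ContentDeleted` and `ContentPublished` are assumptions, and reading the backup files and writing the output folder are left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed event type names; only ContentCreated and ContentUpdated appear in this thread.
DELETED = "ContentDeleted"
PUBLISHED = "ContentPublished"
DATA_TYPES = {"ContentCreated", "ContentUpdated"}


def has_data(event: Dict) -> bool:
    """True for create/update events that carry non-empty content data."""
    return event["type"] in DATA_TYPES and bool(event.get("data"))


def pick_snapshot(history: List[Dict]) -> Optional[Dict]:
    """Pick the event whose data should be exported for one content item."""
    publish_positions = [i for i, e in enumerate(history) if e["type"] == PUBLISHED]
    if not publish_positions:
        return None  # never published, nothing to export
    last_publish = publish_positions[-1]
    # Check forward: the newest update with data after the last publish wins.
    for event in reversed(history[last_publish + 1:]):
        if has_data(event):
            return event
    # Otherwise check backwards: the most recent event with data before it.
    for event in reversed(history[:last_publish]):
        if has_data(event):
            return event
    return None


def latest_snapshots(events: List[Dict]) -> Dict[str, Dict]:
    """Group ordered events by content id and return the data to export per id."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["id"]].append(event)
    snapshots = {}
    for content_id, history in grouped.items():
        if history[-1]["type"] == DELETED:
            continue  # last entry is a delete, skip this id
        snapshot = pick_snapshot(history)
        if snapshot is not None:
            snapshots[content_id] = snapshot["data"]
    return snapshots
```

The sketch assumes the events are already in chronological order, so the last entry per id is the most recent one.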

Import process is essentially:

  • read through all the json files
    – pull out schema, id, data
    – post the data to the /api/{appname}/{schema} URL to create the new content
    – save the new content id, original content id along with the data to a map
  • search through the map to see if any content data contains an original content id
    – update data with the new content id and make a put call to the /api/{appname}/{schema}/{new content id}
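
The reference-fixing step at the end of the import could look like the following Python sketch. It only covers the in-memory rewrite; the HTTP calls are omitted. The function name `remap_references`, the `id_map` argument (original content id to new content id) and the sample data shape are assumptions for illustration.

```python
from typing import Any, Dict


def remap_references(value: Any, id_map: Dict[str, str]) -> Any:
    """Return a copy of `value` with every original content id replaced.

    Walks nested dicts and lists; any string that exactly matches a key in
    `id_map` is swapped for the new id, everything else is left unchanged.
    """
    if isinstance(value, str):
        return id_map.get(value, value)
    if isinstance(value, list):
        return [remap_references(item, id_map) for item in value]
    if isinstance(value, dict):
        return {key: remap_references(item, id_map) for key, item in value.items()}
    return value
```

Comparing the remapped data with the original tells you which items actually contained a reference and therefore need the follow-up PUT call.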

So I think we should still think about a way to handle this. Our Events2 collection is already around 500M, and it took about a year to get there. I think this will be a problem eventually.

I wonder if there is a way we can remove events older than X days.

500 million? How do you create so many entries? What is your workload?

I meant 500 MB. The problem is that it will keep growing…

I’d be interested to hear how big the collection is for the cloud version! Could give us more confidence that we don’t need to worry about it aside from just ensuring we have enough space for it to grow.

If we do start running low on disk space does Squidex protect itself by automatically entering readonly mode or something?

A few GB. I do not have the number right now, as I am sitting on a train.

How should it do that? The information is not available from outside of MongoDB. The writes would just fail. I just watch disk space with monitoring, and once usage reaches 70% I double the disk size.
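
For self-hosted setups without such monitoring, the 70% rule could be approximated with a small Python check like the one below. It is only a sketch: the function names are made up for illustration, and it has to run on the host that holds the MongoDB data volume.

```python
import shutil


def usage_ratio(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def should_grow_disk(used_bytes: int, total_bytes: int, threshold: float = 0.70) -> bool:
    """True once usage has reached the threshold (70% by default)."""
    return used_bytes / total_bytes >= threshold
```

A scheduled job could call `usage_ratio` on the data directory and raise an alert once it reaches the threshold; actually resizing the disk stays a manual or provider-specific step.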

MongoDB Atlas can be configured to do something like this automatically.

I guess I can build something like this, but there are only a few event types where it would make sense, e.g. two consecutive ContentUpdated events.

Ah thanks for the tip about MongoDB Atlas Auto-Scaling, will ask around and see if we have plans to do that, or if we’ve already allocated more disk space than we will ever need.