Clean up Events2 collection

maxisam · September 19, 2022, 9:47pm

I have…

[x] Read the following guideline: https://docs.squidex.io/01-getting-started/installation/troubleshooting-and-support. I understand that my support request might get deleted if I do not follow the guideline.

I’m submitting a…

[ ] Regression (a behavior that stopped working in a new release)
[ ] Bug report
[x] Performance issue
[ ] Documentation issue or request

Current behavior

Events2 collection is growing

Expected behavior

Hi Sebastian,

Our Events2 collection is growing. Is there a proper way to clean it up?

From this old post, it seems that it is not possible. However, I wonder if we can do it through MongoDB commands.

[IMPLENTED] Permanent deletion:

Minimal reproduction of the problem

Environment

App Name:

[x] Self hosted with docker
[ ] Self hosted with IIS
[ ] Self hosted with other version
[ ] Cloud version

Version: [VERSION]

Browser:

[x] Chrome (desktop)
[ ] Chrome (Android)
[ ] Chrome (iOS)
[ ] Firefox
[ ] Safari (desktop)
[ ] Safari (iOS)
[ ] IE
[ ] Edge

Others:

Sebastian · September 20, 2022, 8:39am

No, never ever do that.

slalFe · September 20, 2022, 9:31am

I had also wondered about this and came across these FAQs about event sourcing that helped clarify things for me a bit: https://www.cqrs.nu/Faq/event-sourcing

If size ever became an issue and you were fully happy with the current state, I guess you could flatten it out by creating a new app and migrating the content over? Or, since content can now be permanently deleted (as in the post you linked above), you could clone each piece of content and then permanently delete the old version? You’d lose all history and the other benefits of event sourcing, but it would be a lot smaller for a while!

Sebastian · September 20, 2022, 9:43am

If you have a look at a single content the events could look like this:

ContentCreated
ContentUpdated
ContentUpdated
ContentUpdated

So we could remove the two ContentUpdated events in the middle. But this is not something you can do very easily.

If you want to remove it, you can clone your app with the sync tool and then delete the old app, which can be done permanently.

But if you just delete events by hand, you will lose the ability to update Squidex, because when the content collections are changed we usually just replay all events to recreate the collection.

slalFe · September 21, 2022, 12:54pm

Funnily enough this has actually come up as a point of discussion in our team today!

You haven’t said it is impossible, which intrigues me. Where do the complications lie, in your opinion? Is it being sure that the only impact of the two middle ContentUpdated events (using your example) was on the object itself, and that they did not somehow trigger a rule and update something else?

Think I probably need to go off and do some more reading around this!

Sebastian · September 21, 2022, 2:50pm

The ContentUpdated events contain the full content data. Therefore, if you are only interested in the last result, you could remove the earlier ones (if there is no other event in between).
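To make the rule concrete, here is a minimal Python sketch of that compaction idea, working on an in-memory list only. The event shape (a dict with `type` and `data` keys) and the function name `compact_updates` are assumptions for illustration; this is not the Squidex storage format, and it is not something to run against the Events2 collection.

```python
from typing import Dict, List


def compact_updates(events: List[Dict]) -> List[Dict]:
    """Drop each ContentUpdated event that is directly followed by another one.

    Because every ContentUpdated event is assumed to carry the full content
    data, only the last event of an uninterrupted run is needed to rebuild
    the final state. Any other event type in between breaks the run.
    """
    compacted = []
    for index, event in enumerate(events):
        next_event = events[index + 1] if index + 1 < len(events) else None
        superseded = (
            event["type"] == "ContentUpdated"
            and next_event is not None
            and next_event["type"] == "ContentUpdated"
        )
        if not superseded:
            compacted.append(event)
    return compacted


# Hypothetical event stream for a single content item.
stream = [
    {"type": "ContentCreated", "data": {"title": "v1"}},
    {"type": "ContentUpdated", "data": {"title": "v2"}},
    {"type": "ContentUpdated", "data": {"title": "v3"}},
    {"type": "ContentUpdated", "data": {"title": "v4"}},
]
result = compact_updates(stream)
```

With the example stream, the two middle ContentUpdated events are dropped and only ContentCreated and the last ContentUpdated remain.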

Russell_McGinnis · September 26, 2022, 3:53pm

I built an export and import process to handle such scenarios… it’s in C# and .NET Core… if you think it could be useful, happy to share. Obvious disclaimers: it may not be perfect and I’m still on Squidex v4.

I also had to account for content referencing other content via Squidex IDs, which change on import.

Sebastian · September 26, 2022, 4:07pm

What exactly are you doing there?

Russell_McGinnis · September 26, 2022, 7:58pm

A quick summary would be:

Export process is essentially:

extract the backup file to access the events and attachments folders
read through all the JSON files and group by ID
remove any IDs where the last entry is a delete
for the remaining IDs find the last published event
– check forward to see if there were any update events after the publish with actual data
– check backwards to the most recent update that contains actual data
– save the JSON file to the output folder
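The selection logic in these export steps could be sketched in Python roughly as follows. This is one reading of the steps, not Russell's actual C# code: the event shape (`id`, `type`, `data`), the placeholder type names (`created`, `updated`, `published`, `deleted`), and the choice to skip content that was never published are all assumptions. Reading the backup files and writing the output folder are left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_id(events: List[Dict]) -> Dict[str, List[Dict]]:
    """Group events by content ID, keeping their original order."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["id"]].append(event)
    return dict(grouped)


def pick_snapshot(history: List[Dict]) -> Optional[Dict]:
    """Pick the event whose data should be exported, or None to skip the ID."""
    if not history or history[-1]["type"] == "deleted":
        return None  # last entry is a delete
    published = [i for i, e in enumerate(history) if e["type"] == "published"]
    if not published:
        return None  # assumption: never-published content is skipped
    last_published = published[-1]
    # Check forward: latest update after the publish that carries data.
    for event in reversed(history[last_published + 1:]):
        if event["type"] == "updated" and event.get("data"):
            return event
    # Check backwards: most recent earlier event that carries data.
    for event in reversed(history[:last_published]):
        if event.get("data"):
            return event
    return None


def select_for_export(events: List[Dict]) -> Dict[str, Dict]:
    """Map each surviving content ID to the data that would be written out."""
    selected = {}
    for content_id, history in group_by_id(events).items():
        snapshot = pick_snapshot(history)
        if snapshot is not None:
            selected[content_id] = snapshot["data"]
    return selected


# Hypothetical events, already parsed from the backup's JSON files.
events = [
    {"id": "a", "type": "created", "data": {"title": "A1"}},
    {"id": "b", "type": "created", "data": {"title": "B1"}},
    {"id": "a", "type": "published", "data": None},
    {"id": "a", "type": "updated", "data": {"title": "A2"}},
    {"id": "b", "type": "published", "data": None},
    {"id": "b", "type": "deleted", "data": None},
    {"id": "c", "type": "created", "data": {"title": "C1"}},
    {"id": "c", "type": "updated", "data": {"title": "C2"}},
    {"id": "c", "type": "published", "data": None},
]
exported = select_for_export(events)
```

In the example, `a` is exported with its post-publish update, `b` is dropped because its last entry is a delete, and `c` falls back to the most recent data before the publish.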

Import process is essentially:

read through all the JSON files
– pull out schema, ID, data
– POST the data to the /api/{appname}/{schema} URL to create the new content
– save the new content ID and the original content ID along with the data to a map
search through the map to see if any content data contains an original content ID
– update the data with the new content ID and make a PUT call to /api/{appname}/{schema}/{new content id}
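The two-pass ID remapping in these import steps might look like the following Python sketch. The `create` and `update` callables are hypothetical stand-ins for the POST and PUT calls, and the item shape (`id`, `schema`, `data`) is an assumption. References are only replaced where a string value equals an original ID exactly.

```python
from typing import Any, Callable, Dict, List


def remap_ids(value: Any, id_map: Dict[str, str]) -> Any:
    """Recursively replace strings that are original IDs with their new IDs."""
    if isinstance(value, str):
        return id_map.get(value, value)
    if isinstance(value, list):
        return [remap_ids(item, id_map) for item in value]
    if isinstance(value, dict):
        return {key: remap_ids(item, id_map) for key, item in value.items()}
    return value


def import_contents(
    items: List[Dict],
    create: Callable[[str, Dict], str],
    update: Callable[[str, str, Dict], None],
) -> Dict[str, str]:
    """Create all content first, then fix references in a second pass."""
    id_map: Dict[str, str] = {}
    created = []
    # Pass 1: create every item and remember original ID -> new ID.
    for item in items:
        new_id = create(item["schema"], item["data"])
        id_map[item["id"]] = new_id
        created.append((item["schema"], new_id, item["data"]))
    # Pass 2: update only the items whose data referenced an original ID.
    for schema, new_id, data in created:
        patched = remap_ids(data, id_map)
        if patched != data:
            update(schema, new_id, patched)
    return id_map


# In-memory stand-ins for the HTTP calls, for illustration only.
store: Dict[str, Dict] = {}


def fake_create(schema: str, data: Dict) -> str:
    new_id = f"new-{len(store) + 1}"
    store[new_id] = data
    return new_id


def fake_update(schema: str, content_id: str, data: Dict) -> None:
    store[content_id] = data


items = [
    {"id": "old-1", "schema": "authors", "data": {"name": "Ann"}},
    {"id": "old-2", "schema": "posts", "data": {"author": ["old-1"]}},
]
id_map = import_contents(items, fake_create, fake_update)
```

Doing the creation in a first pass means every original ID already has a new ID before any reference is rewritten, so the order of the files does not matter.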

maxisam · November 1, 2022, 7:30pm

So I think we should still think about a way to handle this. Our Events2 collection is about 500M already, and that took about a year. I think this will be a problem eventually.

I wonder if there is a way we can remove events older than X days.

Sebastian · November 1, 2022, 8:31pm

500 million? How do you create so many entries? What is your workload?

maxisam · November 3, 2022, 4:04am

I meant 500 MB. The problem is that it will keep growing…

slalFe · November 4, 2022, 10:11am

I’d be interested to hear how big the collection is for the cloud version! Could give us more confidence that we don’t need to worry about it aside from just ensuring we have enough space for it to grow.

If we do start running low on disk space, does Squidex protect itself by automatically entering read-only mode or something?

Sebastian · November 4, 2022, 12:11pm

A few GB. I do not have the number right now, as I am sitting on a train.

How should it do that? The information is not available from outside of MongoDB. The writes would just fail. I just monitor disk space, and once usage reaches 70% I double the disk size.

Mongo Atlas can be configured to do something like this automatically.
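As a rough illustration of the 70% rule, a monitoring check could be as small as the Python sketch below. The function names and the path are hypothetical, and this only inspects a local volume with the standard library; it does not resize anything.

```python
import shutil


def should_grow(used_bytes: int, total_bytes: int, threshold: float = 0.70) -> bool:
    """Return True once disk usage has reached the threshold (70% by default)."""
    return used_bytes / total_bytes >= threshold


def next_disk_size(total_bytes: int) -> int:
    """Double the disk size, following the rule described above."""
    return total_bytes * 2


def check_volume(path: str = "/") -> bool:
    """Check the volume containing `path` (assumed to hold the MongoDB data)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return should_grow(usage.used, usage.total)
```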

I guess I can build something like this, but there are only a few event types where it would make sense, e.g. two consecutive ContentUpdated events.

slalFe · November 7, 2022, 9:05am

Ah thanks for the tip about MongoDB Atlas Auto-Scaling, will ask around and see if we have plans to do that, or if we’ve already allocated more disk space than we will ever need.