Webhook and History stopped working for one schema and deleting some records

We are running MongoDB version 4.2.19 and we’re self-hosting using IIS in Docker.

We are running a custom plugin to push data to the Azure Service Bus via rules. I’m not sure whether this is related at all, but it’s the only difference I can think of.

It should not be related at all.

Hello, I have a theory around this and I am not sure how to prove it, but perhaps it could help us understand how our data gets into this odd state.

We work in a shared environment with multiple deployments happening throughout the day. Squidex takes about ten minutes to start up, and sometimes we create or delete lots of data that gets processed by Squidex’s background processes. Is there a chance that a deployment landing halfway through startup, or through the processing of changes, could corrupt the data?

For context, the Body field that @RobDearling mentioned is a large text field. Could a stream or similar mechanism handling this long string be interrupted halfway through, leaving it in an invalid state?

It is very unlikely.

For every change there are basically two update operations:

  1. Add an event to the event stream.
  2. Update the snapshot record in the database.
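Sketched very roughly in Python (with invented names, not Squidex’s actual code), that two-step write pattern looks like this:

```python
import json

def store_change(events, snapshots, stream_id, event, new_state):
    """Two separate writes per change: append the event, then update the snapshot."""
    # 1. Append the event to the event stream (payload serialized to a string).
    events.append({"stream": stream_id, "payload": json.dumps(event)})
    # 2. Upsert the snapshot record with the latest state.
    snapshots[stream_id] = new_state

events, snapshots = [], {}
store_change(events, snapshots, "content-1",
             {"type": "ContentUpdated", "data": {"Index": 1}},
             {"data": {"Index": 1}})
```

The point of the sketch is just that these are two independent writes, which is why one can be fine while the other is broken.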

Why is your theory so unlikely?

  1. The snapshot has never been corrupted, only the event.
  2. The corruption of the event is not at the end but somewhere in the middle of the event.

I do not use the normal MongoDB serialization to write the event to the actual event stream. I serialize it to a string first and then add this string to the MongoDB event.

The reason for that is the different event store providers that are supported, and the fact that some events are not compatible with MongoDB (not all JSON property names are valid MongoDB property names).
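For illustration only (this is a generic Python sketch, not the Squidex code): classic MongoDB rejects field names containing dots, so a payload like the one below cannot be stored as a native document, but it survives fine when the whole payload is stored as one string field:

```python
import json

# A property name with a dot is a valid JSON key, but not a valid
# (classic) MongoDB field name, so the payload is stored as one string.
event_payload = {"type": "ContentCreated", "data": {"my.field": {"iv": 42}}}

payload_json = json.dumps(event_payload)   # serialize once...
stored_event = {"payload": payload_json}   # ...store the string inside the event doc

# Reading it back is a plain string parse:
restored = json.loads(stored_event["payload"])
```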

This string is corrupt in a very weird way, but I cannot reproduce it. It has never happened in the Squidex cloud, and when I tried to serialize a snapshot (the closest thing I could get from you), the string was always valid. It could be a bug in the serializer, but I have no idea how to reproduce it. Nobody has sent me anything yet that I could use to reproduce it.

EDIT: Another idea would be a concurrency issue, e.g. something modifies the event while it is being serialized. But I have no idea what that could be.

EDIT 2: I am really worried about this as well and would like to fix it ASAP, but I have not received the necessary support yet.


Hi @Sebastian, this issue keeps happening; it already happened again today. It is very difficult to find which entity and which ID is the culprit. I open records from the past few hours and keep clicking the save button one by one. The one that fails to save during this process is usually the culprit, and then I clean the events for that ID by going directly into MongoDB.
Can you also log the entity/schema name along with the ID in the error message, so we can easily find the records whose data became invalid, and clean and recover if it happens again? This is an interim solution until we can find and fix the actual issue.

The current message doesn’t have the schema name or ID:
Squidex.Infrastructure.Json.JsonException: Unexpected character encountered while parsing value: i. Path 'data.HandbookSections.iv[1276].Index', line 1, position 2124352.
---> Newtonsoft.Json.JsonReaderException: Unexpected character encountered while parsing value: i. Path 'data.HandbookSections.iv[1276].Index', line 1, position 2124352.
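The exception above is the parser hitting a corrupted character mid-document. A minimal Python analogue (with a made-up payload) shows how a JSON parser reports the exact offset of the stray character, just like the Newtonsoft position in the log:

```python
import json

# Simulate a payload corrupted somewhere in the middle: one digit of
# an "Index" value is replaced by a stray letter.
good = '{"data": {"HandbookSections": {"iv": [{"Index": 12}]}}}'
bad = good.replace('"Index": 12', '"Index": i2')

try:
    json.loads(bad)
    error = None
except json.JSONDecodeError as e:
    # e.pos points at the unexpected character, like "position 2124352" above.
    error = e
```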

I can only repeat what I have written in the last message. I cannot reproduce it and nobody has provided me a backup or so.

@pankajv82 can you use the dynamic contents client to retrieve the problem content using just its GUID?

AFAIK yes, the content is saved fine in the Mongo collection; only the event is broken.

@Sebastian Are you open to a screen-share session so we can show you the broken system? Unfortunately, we are not allowed to share a DB backup. You might have to sign a small data-privacy agreement with our company.

I don’t see what information I could get from a screen-share session. I need something to reproduce it, so I can debug through it and make code changes to find the root cause. If you want me to sign something, that is okay for me.

Hi @Sebastian, I wrote a quick and simple program which leads to the data corruption issue after running it a few times. I am emailing it to you. We can go over the steps to run it via screen share if you are not able to run the program from the instructions. You can point it at any of your local Squidex instances, which might make it easy to find the root cause. Please let me know how it goes.


That’s awesome. I will have a look tomorrow; it is already 10 PM over here.


Thank you very much for the sample. I have run it a few times already and cannot reproduce it yet.

There are a few interesting things:

  1. It is very slow on the Squidex side, so I can use this sample to dig into the performance issues.
  2. The document structure does not change: basically you just clone some of the array items, the document becomes bigger, and sooner or later it crashes. If it were a serialization problem, it should have happened on the first run.
  3. The document size grows over time and the JSON field becomes bigger; perhaps we have crossed some threshold after which MongoDB does something differently. What is your max document size?

About the Serialization Problem

Do you have access to the Squidex code? We can remove this potential problem from our list with a simple test.
We could add a line to the DefaultEventDataFormatter object above and deserialize the string again just to be sure. So you could add the following code after line 81:

JsonObject.Parse(payloadJson);
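The idea behind that one-liner is simply to parse the freshly serialized string before it is written, so a corrupt serialization fails fast instead of landing in the event store. A language-neutral sketch of the same pattern (Python here; the Squidex-side change would be the C# line above):

```python
import json

def serialize_checked(event):
    """Serialize an event and verify the string parses back before storing it."""
    payload_json = json.dumps(event)
    json.loads(payload_json)  # fail fast here rather than on a later replay
    return payload_json

payload = serialize_checked({"type": "ContentUpdated", "data": {"Index": 1}})
```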

Your Mongo Database

Where do you host MongoDB? On Windows? Can you try a free Atlas setup and test whether you can reproduce it there? To me it still sounds like a hardware issue.

I was able to reproduce it after running it a few times. I think it’s a concurrency issue with a large data set like mine. Can you try running it a few more times? You can increase the record count to 100 and then try; the initial code creates only 25, so please change the numbers in the loop to create at least 100 records. I get the error while updating a record. I tried to take a backup, but it’s more than 1.6 GB due to the large number of events. Since this app has only mock data, we can even share this backup.
Squidex is hosted in a Docker container for us, so I don’t have control over its source code.

If you can’t reproduce it, let me know where I can send you the backup.

I have run it over 15-20 times now. Do you know how fast an update is in your dev environment? Perhaps this is the difference, because for whatever reason it is pretty slow here.

I don’t say no, but your sample creates no concurrency. You loop over the items and then make several updates per content item. But all updates are executed in a single thread per content item (Orleans actors), so there is no concurrency.

Let me schedule a meeting for tomorrow so we can show you. I can easily reproduce it on my machine. We will also share the DB backup.

Where have you hosted MongoDB?

It’s self-hosted within our network.