2022-02-09 Downtime due to Database Crash

The database just crashed. I am about to investigate the problem.

Update

What I know so far: Two of the three MongoDB instances got corrupted. I do not know why yet, and I am pretty sure that I will never know. While recreating the two instances I made a big mistake and accidentally deployed an old version of MongoDB, because I picked up the wrong YAML file. It took a long time to find the mistake, and I then had to start a repair of the remaining MongoDB instance to check for other potential corruption. I hope that once this is done I can just connect the other two instances to the cluster, and that everything will be up and running again once they have synced.

I will publish a more detailed postmortem later and will keep you up to date.

Thanks for letting the community know with such clarity and transparency … hope you can solve this soon. All the best!

The repair process is complete now. The other members are joining the database cluster and performing an initial sync of the data. Theoretically I could bring Squidex online now, but then the full load would be on one server, and I do not think that is the best idea right now.

How much time do you think it will take before Squidex will be live again?

I don’t know. There is no progress information in the logs. It could take 2 more hours, but I am not sure.
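For anyone who wants to watch a sync like this on their own cluster, here is a minimal sketch and not what I actually ran. It assumes a status document shaped like the output of MongoDB's `replSetGetStatus` command, where each entry in `members` has a `name` and a `stateStr`. The helper name `sync_summary` and the host names in the sample are made up for illustration. Members that are still doing their initial sync normally report `STARTUP2`, so anything that is not `PRIMARY` or `SECONDARY` is treated as pending.

```python
from typing import Any, Dict, List

# Member states that can serve traffic; everything else counts as pending.
READY_STATES = {"PRIMARY", "SECONDARY"}


def sync_summary(status: Dict[str, Any]) -> Dict[str, List[str]]:
    """Split replica set members into ready and pending by their stateStr."""
    summary: Dict[str, List[str]] = {"ready": [], "pending": []}
    for member in status.get("members", []):
        state = member.get("stateStr")
        key = "ready" if state in READY_STATES else "pending"
        summary[key].append(f"{member.get('name')} ({state})")
    return summary


# Sample document with hypothetical host names, shaped like replSetGetStatus output.
sample = {
    "set": "rs0",
    "members": [
        {"name": "mongo-0:27017", "stateStr": "PRIMARY"},
        {"name": "mongo-1:27017", "stateStr": "STARTUP2"},
        {"name": "mongo-2:27017", "stateStr": "STARTUP2"},
    ],
}

print(sync_summary(sample))
```

With PyMongo, the real document could be fetched with something like `MongoClient(uri).admin.command("replSetGetStatus")` and passed to the same function. It only tells you which members are done, not how far along the others are.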

It has been completed. Actually faster than expected (not saying that it was fast).

I will write more about that tomorrow and send you the details via email. I will also give everybody a refund of 1 month, but I have to write a script for that first.