Safe Squidex deployments for Load Balanced environments

Sorry, this has turned into a bit of an essay with quite a few different points in it. I'd really appreciate you taking the time to read it, but I can understand if you want to hold off and enjoy your Friday, or if you'd rather I break it down into more granular support tickets.

We are currently a bit cavalier about our deployments, so we are looking to polish them up and make them safer rather than waiting for something to go wrong. We are currently on v6.9.0 but are looking to move to v7.x in the near future.

There are a couple of bits I have been reading:

https://docs.squidex.io/02-documentation/concepts/migrations

I just wanted to check if what I am proposing is viable or dangerously wrong.

Current situation (Squidex v6.x)

We are in a load balanced environment resulting in two instances of Squidex with clustering enabled.

When we release we perform the following steps (see the sketch after the list):

  1. Take Leg 1 out of the balance

  2. Deploy changes to Leg 1

  3. Test Leg 1

  4. Assuming everything is OK with Leg 1, put Leg 1 back into the balance at the same time as taking Leg 2 out of the balance

  5. Deploy changes to Leg 2

  6. Assuming everything is OK with Leg 2, put Leg 2 back into the balance
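
For illustration, here are the six steps above as a script. Every function body is a hypothetical placeholder for whatever our load balancer and CI tooling actually expose:

```python
# Sketch of the six release steps above. All function bodies are
# hypothetical placeholders for the real load balancer / CI calls.

def drain_leg(leg: str) -> None:
    print(f"taking {leg} out of the balance")       # placeholder: LB API call

def restore_leg(leg: str) -> None:
    print(f"putting {leg} back into the balance")   # placeholder: LB API call

def deploy_to(leg: str, version: str) -> None:
    print(f"deploying Squidex {version} to {leg}")  # placeholder: e.g. docker pull/run

def smoke_test(leg: str) -> bool:
    print(f"testing {leg}")                         # placeholder: hit health endpoints
    return True

def release(version: str) -> None:
    # Steps 1-6: one leg at a time, so one leg is always serving traffic.
    for leg in ("Leg 1", "Leg 2"):
        drain_leg(leg)
        deploy_to(leg, version)
        if not smoke_test(leg):
            raise RuntimeError(f"{leg} failed testing; leaving it out of the balance")
        restore_leg(leg)

release("6.9.0")
```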

Deploying changes can include the following:

  • A change to the version or configuration of Squidex

  • A change to the structure of a Squidex App (e.g. schema modifications) (Note: we only make these changes during the Leg 2 deployment, as doing otherwise can apparently cause issues with clustering)

  • A change to the contents of a Squidex App (Note: these usually depend on schema changes, so they are also only made during the Leg 2 deployment)

Currently both legs share a database and we are not setting readonly mode to true, which is why we are a bit nervous about our current deployments.

Is there somewhere we can check the Version in the Migration MongoDB collection before we deploy a new Squidex release, so we know whether migrations will need to happen? For example, we are currently on v6.9.0; would there be any migrations (i.e. is it safe to just release with our current setup) if we were to upgrade to v6.14.0?
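
To show what I mean, this is roughly the check I have in mind, as a pymongo sketch. The database name ("Squidex") and the Migration collection with its Version/IsLocked fields are what we can see in our own database, so worth verifying against yours:

```python
# Sketch: inspect Squidex's migration state before a release. Database
# name and collection/field names are as observed in our deployment.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # your connection string
state = client["Squidex"]["Migration"].find_one() or {}

print("Current migration version:", state.get("Version"))
print("Migration lock held:      ", state.get("IsLocked"))
```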

Readonly mode

I was wondering if you think this is only needed when we are changing the version of Squidex, or if it is also advised when we are potentially changing the structure of a schema? As both legs currently share a database, if I start testing Leg 1 with changes that amend a schema while someone is editing content for the old version of that schema on Leg 2, might that cause problems?

Is there a reason for having to set this before the application starts up, or would having an API endpoint to toggle readonly mode be viable? (If so, I will raise a feature request.)
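
For context, today that appears to mean setting the flag in the environment and restarting, along these lines. This is only a sketch: we are assuming the MODE__ISREADONLY variable (the mode:isReadonly setting from the Squidex configuration reference), and the container name and image tag are ours:

```python
# Sketch: restart a leg with readonly mode toggled at startup. Assumes
# the MODE__ISREADONLY environment variable (mode:isReadonly in the
# configuration reference); container name and image are hypothetical.
import subprocess

def restart_squidex(container: str, readonly: bool) -> None:
    subprocess.run(["docker", "rm", "-f", container], check=False)
    subprocess.run([
        "docker", "run", "-d", "--name", container,
        "-e", f"MODE__ISREADONLY={str(readonly).lower()}",
        "squidex/squidex:6.9.0",
    ], check=True)

restart_squidex("squidex-leg1", readonly=True)
```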

Checking migrations have completed

I was looking at the Migration collection in MongoDB and am assuming that we would ideally check whether any migrations are still taking place (i.e. IsLocked: true) before we disable readonly mode and continue with the release?
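
i.e. something along these lines before we proceed (same database assumptions as the sketch above; the timeout and polling interval are arbitrary):

```python
# Sketch: block the release until no migration holds the lock. Same
# assumptions as above about the database name and Migration collection.
import time
from pymongo import MongoClient

migration = MongoClient("mongodb://localhost:27017")["Squidex"]["Migration"]

deadline = time.monotonic() + 30 * 60       # assumption: give up after 30 minutes
while (migration.find_one() or {}).get("IsLocked", False):
    if time.monotonic() > deadline:
        raise TimeoutError("migration still locked; investigate before continuing")
    time.sleep(10)                           # poll every 10 seconds

print("No migration in progress; safe to disable readonly mode and continue.")
```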

Backup of the database

This is also something we are not doing as part of our release process (don't worry, we do have a regular backup process in place for disaster recovery) and I am struggling to see a way of automating it. As it is best achieved with mongodump, it is a separate issue for us to sort out, since it is not directly related to Squidex.
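
(Even as a separate issue, it should script fairly easily; a rough sketch, where the connection string and output path are placeholders and only MongoDB is covered, not the assets folder:)

```python
# Sketch: take a pre-release snapshot with mongodump. The URI and
# output path are placeholders; this covers MongoDB only.
import subprocess
from datetime import datetime, timezone

stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
subprocess.run([
    "mongodump",
    "--uri", "mongodb://localhost:27017",
    "--out", f"/backups/pre-release-{stamp}",
], check=True)
```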

7.x

We will now need a worker node to coordinate changes between our two load balanced instances.

My understanding is that you should only ever have one Worker node, and that you always need an active Worker node even if you just have one Squidex instance.

I believe this means during normal operation we should have:

  • Leg 1: Squidex Instance 1 (Active) & Squidex Worker Instance 1 (Active)

  • Leg 2: Squidex Instance 2 (Active) & Squidex Worker Instance 2 (Inactive)

So I think our release process would need a few extra steps:

  1. Enable Squidex Worker Instance 2 when Leg 1 is taken out of the balance

  2. Ensure Squidex Worker Instance 1 only targets Squidex Instance 1 and Squidex Worker Instance 2 only targets Squidex Instance 2

  3. When putting both Legs back into the balance, we need to disable Squidex Worker Instance 2 and ensure Squidex Worker Instance 1 targets both Squidex Instance 1 and Squidex Instance 2

I am not at all confident in what I have just laid out so would greatly appreciate any feedback or elaboration around it!

CONTENTS__OPTIMIZEFORSELFHOSTING

We are self-hosting with currently just the one app per Squidex instance, with tens of thousands of pieces of content across various schemas, so this sounds like something we should be using, but I am unsure of the advantages and disadvantages. Is the only way to check this to create a testing instance with it enabled and see how much bigger the database is and how it performs, or is there a rough way of estimating how much it could impact the speed of reading and writing?

Is there somewhere we can check the Version in the Migration MongoDB collection before we deploy a new Squidex release, so we know whether migrations will need to happen?

No, I could add an API for that, but it does not help you for now.

I was wondering if you think this is only needed when we are changing the version of Squidex, or if it is also advised when we are potentially changing the structure of a schema?

It is only needed when Squidex makes a change to the database.

Is there a reason for having to set this before the application starts up, or would having an API endpoint to toggle readonly mode be viable?

Technically it is possible with an API, but I am not sure if I want to invest time in that.

I was looking at the Migration collection in MongoDB and am assuming that we would ideally check whether any migrations are still taking place (i.e. IsLocked: true) before we disable readonly mode and continue with the release?

Yes, exactly.

I use my custom image for that.

We will now need a worker node to coordinate changes between our two load balanced instances.

Not for coordination, just so that you do not run multiple workers. You can also declare one of your two instances as the worker. But right now it must be exactly ONE worker.

My understanding is that you should only ever have one Worker node, and that you always need an active Worker node even if you just have one Squidex instance.

Yes, but the worker can also serve HTTP requests, therefore worker mode is on by default.

It does not matter if the worker is offline for some time; just turn it off before deployment, then deploy your normal servers and then redeploy your worker.
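
A sketch of that order of operations. This assumes worker mode is toggled with the CLUSTERING__WORKER environment variable (the clustering:worker setting); the container names and image tag are hypothetical:

```python
# Sketch of "turn the worker off, deploy the normal servers, redeploy
# the worker". Assumes the CLUSTERING__WORKER environment variable
# (clustering:worker); container names and image are hypothetical.
import subprocess

def run_squidex(name: str, image: str, worker: bool) -> None:
    subprocess.run(["docker", "rm", "-f", name], check=False)
    subprocess.run([
        "docker", "run", "-d", "--name", name,
        "-e", f"CLUSTERING__WORKER={str(worker).lower()}",
        image,
    ], check=True)

image = "squidex/squidex:7.1.0"
subprocess.run(["docker", "rm", "-f", "squidex-worker"], check=False)  # 1. stop the worker
run_squidex("squidex-leg1", image, worker=False)                       # 2. deploy the servers
run_squidex("squidex-leg2", image, worker=False)
run_squidex("squidex-worker", image, worker=True)                      # 3. redeploy the worker
```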

Is the only way to check this to create a testing instance with it enabled and see how much bigger the database is and how it performs, or is there a rough way of estimating how much it could impact the speed of reading and writing?

Writes are slower by 100% (double the writes), but reads can be significantly faster.


Thanks so much for the quick and informative response!

I think that makes it far easier for us; we just need a decent process in place to toggle worker mode in production.

We have had situations where we've had to be on a single leg overnight. What would the impact be if we were just on the leg without a worker for that time? Hopefully it's not something we'd have to do, as, like I said above, we'll hopefully have an easy way of enabling worker mode for that leg until we can get both back into the balance.

Now I read my proposal again, I think it is far easier for our setup to use the environment variable method of toggling worker mode, so it's unlikely we would use the API call anyway.

Thanks for confirming this. I think I will propose it to my colleagues, and hopefully they'll agree, as we are read from far more than we are written to.

What would the impact be if we were just on the leg without a worker for that time?

All the background processes would just stop and then continue where they left off:

  • Rules
  • Full Text Indexing
  • History Service
  • Deletion Processes
  • Restores
  • Backups

Hello, thanks again for answering my questions around this. But there’s another one:

While reviewing my deployment plan for Squidex v7.1 to a load balanced environment, my colleagues asked the question of “what if you have two worker nodes active at the same time?” and I had no answer! My gut feeling was that with multiple workers there is a chance that something could get processed twice, causing issues, but then we weren't sure how that could work at scale, or for disaster recovery if the worker node goes down. We're wondering: how do you handle this for the Squidex Cloud?

If you have two workers at the same time, the outcome is unpredictable. For example, you would have two event consumers, which depend on processing events in a particular order.

I still have not deployed 7.1 to the cloud yet, but I would just create a normal Kubernetes deployment. You can configure how to deploy, e.g. 1 - 2 - 1 or 1 - 0 - 1 (the instance count before, during, and after the rollout).
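
Translating that instance-count shorthand: 1 - 0 - 1 (the old pod stops before the new one starts, so there are never two workers alive) is what Kubernetes calls a Recreate strategy, while 1 - 2 - 1 is a RollingUpdate with maxSurge=1 and maxUnavailable=0. A sketch with the official Python client; the deployment names and namespace are hypothetical:

```python
# Sketch: set the two update strategies with the official kubernetes
# client. Deployment names and namespace are hypothetical.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# Worker: "1 - 0 - 1" -- stop the old pod before starting the new one,
# so there is never more than one worker running.
apps.patch_namespaced_deployment("squidex-worker", "default", {
    "spec": {"strategy": {"type": "Recreate", "rollingUpdate": None}},
})

# Normal instances: "1 - 2 - 1" -- surge to an extra pod and never drop
# below the desired count, so requests keep being served during rollout.
apps.patch_namespaced_deployment("squidex", "default", {
    "spec": {"strategy": {
        "type": "RollingUpdate",
        "rollingUpdate": {"maxSurge": 1, "maxUnavailable": 0},
    }},
})
```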


Ah, so it would be similar to the problem we faced with Azure Service Bus before we utilised its sessions functionality: we had two instances of an application processing messages without any consideration of ordering, so some were processed before they should have been, causing errors or invalid data.

Thanks for clarifying that and the tip around deployments.
