Load balancer out of sync when new schemas are created

I have…

  • [x] Checked the logs and have provided the logs if I found something suspicious there

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [x] Bug report
  • [ ] Performance issue
  • [ ] Documentation issue or request

Current behavior

We currently run Squidex in an AWS Elastic Beanstalk environment with a multi-container Docker setup, with a minimum of 2 instances running concurrently. Our Squidex application is served through an nginx reverse proxy on each instance, with an AWS Application Load Balancer (ALB) in front of it that is responsible for distributing requests to each server.

For data storage we use a MongoDB database connected through an AWS VPC Peering connection. For file storage (images and other kinds of static assets) we have an AWS Elastic File System (EFS) setup that’s accessible from each Squidex instance. Currently there’s no Redis or any other kind of distributed caching setup for the Squidex instances.

When we create new schema fields we need to restart Squidex; otherwise the schema change only exists on one of the servers, which causes errors when you try to fetch content that does not “exist” on the other server.

Expected behavior

We should not need to restart Squidex for the schemas to be synced to the different servers.

Minimal reproduction of the problem

Environment

  • [x] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [ ] Cloud version

Version: 4.0.0-beta1, as fetched from Docker Hub.

Browser:

  • [x] Chrome (desktop)
  • [x] Chrome (Android)
  • [x] Chrome (iOS)
  • [x] Firefox
  • [x] Safari (desktop)
  • [x] Safari (iOS)
  • [x] IE
  • [x] Edge

Others:

Hi, you have to enable clustering:

Set the value to MongoDB
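
In a Docker setup that would typically be an environment variable along these lines (assuming the orleans:clustering key from appsettings.json; double-check the exact name against your version):

ORLEANS__CLUSTERING=MongoDB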

Thank you for responding so quickly, will try it out

Hello again. We enabled clustering, but it does not seem to work, and now we get a lot of these errors:

{
  "logLevel": "Error",
  "message": "Exception during Silo.Start",
  "eventId": {
    "id": 100439
  },
  "exception": {
    "type": "Orleans.Runtime.MembershipService.OrleansClusterConnectivityCheckFailedException",
    "message": "Failed to get ping responses from 1 of 1 active silos. Newly joining silos validate connectivity with all active silos that have recently updated their \u0027I Am Alive\u0027 value before joining the cluster. Successfully contacted: []. Failed to get response from: [S172.17.0.3:11111:319279972]",
    "stackTrace": "   at Orleans.Runtime.MembershipService.MembershipAgent.ValidateInitialConnectivity()\n   at Orleans.Runtime.MembershipService.MembershipAgent.BecomeActive()\n   at Orleans.Runtime.MembershipService.MembershipAgent.\u003C\u003Ec__DisplayClass26_0.\u003C\u003COrleans-ILifecycleParticipant\u003COrleans-Runtime-ISiloLifecycle\u003E-Participate\u003Eg__OnBecomeActiveStart|6\u003Ed.MoveNext()\n--- End of stack trace from previous location where exception was thrown ---\n   at Orleans.Runtime.SiloLifecycleSubject.MonitoredObserver.OnStart(CancellationToken ct)\n   at Orleans.LifecycleSubject.\u003COnStart\u003Eg__CallOnStart|7_0(Int32 stage, OrderedObserver observer, CancellationToken cancellationToken)\n   at Orleans.LifecycleSubject.OnStart(CancellationToken ct)\n   at Orleans.Runtime.Scheduler.AsyncClosureWorkItem.Execute()\n   at Orleans.Runtime.Silo.StartAsync(CancellationToken cancellationToken)"
  },
  "app": {
    "name": "Squidex",
    "version": "4.1.0.0",
    "sessionId": "a98bdefe-f87a-45bf-b5f5-c92160de621c"
  },
  "timestamp": "2020-02-13T08:58:46Z",
  "category": "Orleans.Runtime.Silo"
}

It seems that the members cannot communicate properly. I cannot say that much about your environment, though.

Hi again.

Looking at our logs it appears that Squidex tries to use the Docker container’s private IP address (172.17.0.3) when configuring Orleans. However, since our two Squidex instances are running on two separate EC2 instances (i.e. hosts) those IPs aren’t reachable from one another.

We’ve already verified that both of our EC2 instances can communicate with each other on their private IPs (10.x.x.x) on port 11111. So what we need to do is modify our Squidex configuration so that the instances use the hosts’ IP addresses instead of the containers’ IP addresses.

Looking through the source code for Squidex it appears that the IP address is set in https://github.com/Squidex/squidex/blob/7a31b132692cd9447f0093e70f15780292f3247a/backend/src/Squidex/Config/Orleans/OrleansServices.cs on line 79:


var address = Helper.ResolveIPAddressAsync(Dns.GetHostName(), AddressFamily.InterNetwork).Result;

Since we don’t actually want to use the container’s IP (i.e. what Dns.GetHostName() resolves to), this is the part that doesn’t work with our current setup.

Do you know of any way of manually setting the IP (e.g. passing it in as an environment variable or through appsettings.json), or would it be necessary to fork the Squidex source code and modify how it can set the IP address when configuring Orleans?
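
What we had in mind is roughly the following, as a drop-in replacement for that line (a sketch only: the orleans:hostAddress key and the ORLEANS__HOSTADDRESS environment variable are names we made up for illustration, and we assume an IConfiguration instance called config is available at that point in OrleansServices.cs):

// Prefer an explicitly configured host IP, e.g. passed in as ORLEANS__HOSTADDRESS.
var configuredAddress = config["orleans:hostAddress"];

IPAddress address;

if (!string.IsNullOrWhiteSpace(configuredAddress))
{
    address = IPAddress.Parse(configuredAddress);
}
else
{
    // Fall back to the current behavior: resolve the container's own host name.
    address = Helper.ResolveIPAddressAsync(Dns.GetHostName(), AddressFamily.InterNetwork).Result;
}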

You need to fork or provide a PR. This is a very uncommon scenario; usually everybody else uses Docker Swarm or Kubernetes for that.

I am not sure if there is an automated way to get the host IP address from within the container.

Something like this: https://github.com/Squidex/squidex/commit/8287340a84c99c18cef98af42bae128c56eeec6d

Have not tested it yet.

Hey, thanks again, that worked perfectly. You can mark this as solved 🙂
